Models for integrating data science teams within companies

At our inaugural DS Crit meeting.

The center-of-excellence model

We start with the most centralized of all the models. In the center-of-excellence (CoE) model, also known as the research model, the expectation is that the data science team works independently to identify big bets and build prototypes. Under this model, the data science team is considered to be the company’s innovation arm.

Some misconceptions

There are some misconceptions that lead companies to choose the CoE model for their data science team.


There are important drawbacks to having the data science team operate within the CoE model:

Benefits and success scenarios

It should be noted that the CoE model works for many types of teams. Centralization helps focus and agency. You should centralize that which you can clearly encapsulate from the rest of the organization. Centralization works when coupling is low and joint meetings are few and far between.

The accounting model

In the accounting model, also known as the BI model, the central data science team produces reports and presentations on a recurring basis (usually monthly and quarterly). The data science team would inform the company of notable movements in top-level metrics. Once the team identifies an interesting or worrying trend, they would work with product teams to investigate the root cause. Thus, quite frequently, playing detective becomes a main activity of the data science team under the accounting model.


There are three main drawbacks to this model:


Reporting on quarterly trends in company metrics is a valuable practice. The centralized nature of the BI team allows for a holistic view of the SBU, which leads to globally optimal decisions that can balance and correct local ones. This work is something that the data science team should be tackling as part of their charter, regardless of the model in practice.

The consultant model

In the consultant model, the central data science team is assigned tickets or emailed with questions. Data science managers then prioritize the tickets and questions and assign them to data scientists.


In this model, the data science manager overrides any existing data science roadmaps to prioritize the questions and needs of stakeholders. Due to the symmetrical treatment of all members of the team, this model makes managing a data science team easy and cheap.


There are many drawbacks to this model:

The embedded model

In this model, product teams hire their own data scientists. Each engineering manager is in charge of planning for data scientist headcount, hiring, and allocation. The data scientist within each product team has the engineering team members as their peers.


This model brings welcome independence to the teams and relieves the SBU of the management requirements of a fully centralized data science team. It solves problems with team sizing and communications by distributing responsibility. It also solves the ownership and motivation issues that exist in fully centralized models.


While there are reductions in data science management cost, this model has important drawbacks:

The democratic model

In this model, it is believed that easy and straightforward access to data by product managers, designers, engineering managers, and engineers would lessen or remove the need for a data science role. Many attribute the need for data scientists to the lack of proper infrastructure for fast and easy dashboard creation.


It is valuable to invest in data infrastructure and tooling that make data access, processing, and visualization simpler every day. This investment is particularly valuable to data scientists as it frees up time for proactive opportunity sizing, experiment design, metric design, model design, and general improvements in methodology.


While ensuring everyone has direct and easy access to data is a noble goal, there are some drawbacks to this model:

The product data science model

Between the extremes of the fully centralized model (the CoE model) and the fully decentralized model (the embedded model), there exists a spectrum of hybrid models that take characteristics from each. What makes hybrid models successful is that they take advantage of the strengths of both extremes while actively making up for their deficiencies.


a. Clear ownership, actionable insights, and speed. One important benefit of the PDS model is clear ownership of projects by the data scientists, due to their membership in the various product teams. Membership in each product team gives data scientists a thorough understanding of that product, its limits, and its potential. This in turn allows a straightforward mapping of analysis to proposals for action. It is difficult to move fast if newly available insight does not map into reasonable and informed actions.



No model is perfect, and each has its drawbacks. To quote Sinofsky,


Where an SBU is involved, I recommend the PDS model as the best in effectiveness and efficiency in leveraging data for the business.


[1] Functional versus Unit Organizations by Steven Sinofsky
[2] Building Data Science Teams by DJ Patil
[3] Where should you put your data scientists by Daniel Tunkelang
[4] How to play well with others by Josh Wills


Thanks to Raki Wane, Peter Skomoroch, Sayan Sanyal, Parham Noorzad, Josh Montague, Chris Albon, Josh Silverman, and Harish Krishnan for reviewing and providing valuable feedback.
