Data science team sizing and allocation

This one is for the crawlers and the robots.

There are many ways to organize a data science team within a company. One of the most effective is the hybrid model, as I explain verbosely in a post and briefly in a thread:

Q. Embedded or centralized?
A. Both.

Embedded for context, relevance, communication efficiency, and to be in sync; centralized for hiring and promotion purposes, for peer review, and for sharing and maintaining best practices.

Pardis Noorzad on Twitter

The centralized management of a team is not without challenges, the most prominent of which is sizing and allocation. In this post, I propose an easy-to-follow procedure as a solution. Below are the conditions that I assume hold for the organization under study.


Assumption 1. Long-term ownership is valuable.

Data scientists are engineers, and just like engineers, they can produce high-quality, impactful results only when they have long-term ownership over the product and their work. Good products can only be created with care. However, many still see data science as a series of disparate projects, defined and requested by stakeholders.

Data science on a team isn’t a project, it doesn’t have a start and an end, it’s an ongoing process. Data changes as the product changes.

Pardis Noorzad on Twitter

Long-term ownership over one product also leads to strong team dynamics. As Will Larson explains in this post, disassembling necessary teams for the sake of short-term projects is counterproductive. Shift scope, don’t break teams.

Recently had several discussions around whether it makes sense to shift folks onto higher priority teams after you’ve repaid a team’s technical debt. In general I think you probably should *not* do that, and wrote up my thinking why.

Will Larson on Twitter

Assumption 2. The engineering teams and their size determine the company strategy and objectives.

The engineering teams are created in such a way as to be able to tackle the various strategic bets and existing value areas of the company. If this is not the case, the existing engineering teams are the unannounced strategic bets and value areas of the company.

Assumption 3. More engineers on a team lead to more moving parts, more experiments, and more meetings.

The bigger an engineering team, the more time needs to be allocated to meetings and emails for coordination and planning. In addition, more engineers result in more projects and more experiments.

Assumption 4. Teams with more mature machine learning capabilities have better instrumentation, better quality data, and more data sources that are useful to the data science team.

Machine learning models are highly sensitive to data shortcomings. As a result, teams with mature ML models tend to have higher-quality data and aggregate data sets than other teams. If this is not the case, stop what you’re doing and fix that tire fire.

Assumption 5. Teams with a user-facing aspect run additional client-facing experiments. These experiments sometimes double the number of experiments on an engineering team.

Assumption 6. Teams with a physical presence, rather than just a virtual one, such as real estate planning or marketing, require data scientists.

Assumption 7. Having just one data scientist on a team significantly improves the quality of data (leading to less buggy data products) and the speed of decision making (leading to faster product iterations).


Based on the assumptions above, my proposal is to assign a point to every engineer on every team (client + backend). Each point contributes to a per-team sum that represents the relative amount of data science work the team requires.

Sum the points for each team and sort the teams in descending order. Break ties by a team’s data maturity: the higher the maturity, the lower the team sits on the list.

Assign one data scientist to every team on the list, starting from the top. Note that this is not per project but rather per team (with cross-functional membership). This will help get all teams from 0 to 1.

At this point, if you still have data scientists left to assign, take 3 points off of every team (as a starting point, we take one data scientist per three engineers to be the ideal ratio). Then assign another data scientist to each team, starting at the top, and repeat.
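
As a rough illustration, the rounds described above could be sketched in Python as follows. This is only one possible reading of the procedure, not a definitive implementation: the `Team` class, the numeric `maturity` score, and the `allocate` function are hypothetical names introduced for the example. Two details are assumptions the post does not spell out: teams are re-ranked after each deduction, and a team whose points have dropped to zero or below is skipped in later rounds.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    engineers: int  # one point per engineer (client + backend)
    maturity: int   # hypothetical data-maturity score; higher = more mature


def allocate(teams, num_data_scientists, points_per_round=3):
    """Return {team name: data scientists assigned} for the rounds sketched above."""
    points = {t.name: t.engineers for t in teams}
    maturity = {t.name: t.maturity for t in teams}
    assigned = {t.name: 0 for t in teams}
    remaining = num_data_scientists

    def ranked(names):
        # Most points first; on a tie, the less mature team ranks higher.
        return sorted(names, key=lambda n: (-points[n], maturity[n]))

    # The first round covers every team, to get all teams from 0 to 1.
    eligible = list(points)
    while remaining > 0 and eligible:
        for name in ranked(eligible):
            if remaining == 0:
                break
            assigned[name] += 1
            remaining -= 1
        # Before the next round, take 3 points off of every team.
        for name in points:
            points[name] -= points_per_round
        # Assumption: teams with no points left receive no further data scientists.
        eligible = [n for n in points if points[n] > 0]
    return assigned


# Made-up example data: four teams and six data scientists.
teams = [Team("A", 8, 2), Team("B", 8, 1), Team("C", 3, 0), Team("D", 5, 1)]
print(allocate(teams, 6))  # {'A': 2, 'B': 2, 'C': 1, 'D': 1}
```

Under these assumptions, a team stops receiving data scientists once it has roughly one per three engineers, and any data scientists left over after that point stay unassigned in this sketch.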


The approach presented above gives a clear, fair, safe, and effective strategy for allocating data scientists to product teams.


Please let me know your thoughts in the comments or the Tweets.



