Earlier this year I asked a question surveying everyone here about their company’s data stack, and I thought it was a fruitful conversation.
As my company, Payoff, has matured, I’ve come to realize that how the data team is structured across the company is just as important as the stack itself, possibly more so. By data team I mean everyone who works across the full data stack: the people building out the infrastructure, building data integrations and, of course, analyzing the data itself.
So how is your data team structured? Does it have a clear delineation between data engineers (who generate the data in the application, aka the website), DBAs (who work on the data pipes or ETL) and data analysts/scientists (who derive business value from the data)? Or is the line between DBAs and data scientists blurred, kind of like how Stitch Fix does it? Their motto is that engineers shouldn’t write ETL.
To kick things off I thought I’d share how my company is structured. We have:
- Data Engineers: Backend engineers who are creating production data
- DBAs: Those who are only responsible for building out data integrations and bringing first-party and third-party data into a centralized data repository
- Data Analysts/Scientists: End-users of the data who run analysis, build Looks/Dashboards, create markdown output, automate reporting and build models
Pros
- Clear delineation of responsibilities. People generally do what they are good at.
- Data scientists and analysts do not have to worry about ETL. They can focus their energies on analysis.
Cons
- When the ETL breaks or there is some issue with it, it’s hard for the data scientists to validate where the issue is coming from, since in this setup the ETL is essentially a black box.
- ETL can be somewhat dry work, as the DBAs are often not informed of the output or end result of the data they are working with. In this structure, they function kind of like a middle-man (albeit a very important one!), taking data and passing it along.
- The DBA often does not have enough context, either upstream about the production data or downstream about what data is important for the business.
I’m curious how other companies divvy up the data work across these different roles and manage the overlap between responsibilities.