The Data Engineer Shortage Nobody Talks About: Why It Pays More Than Data Science Now
Published on BirJob.com · March 2026 · by Ismat
The Moment I Realized the Market Had Flipped
In 2018, I remember reading articles about how "Data Scientist" was the sexiest job of the 21st century. Harvard Business Review had said so, and the internet agreed. Every bootcamp was churning out data scientists. Every tech conference had a keynote about machine learning. Every university was launching a Master's in Data Science. The gold rush was real, and everyone wanted in.
Fast forward to 2025. A friend of mine — a solid data scientist with a Master's degree, three years of experience, and genuine skills in Python, scikit-learn, and PyTorch — spent four months looking for a new role in Europe. Meanwhile, another friend with comparable years of experience but in data engineering had three offers within three weeks. The data engineer's starting salary was higher by $15,000.
That's the moment I started paying attention. Something had quietly shifted in the data job market, and most career advice hadn't caught up yet. Data engineering had overtaken data science in both demand and compensation. Not at every company, not in every geography, but in aggregate, across the industry, the crossover had happened. This article is about why, and what it means if you're deciding where to put your career chips.
The Numbers First
Let's start with evidence, not vibes.
- LinkedIn's 2025 Jobs on the Rise listed Data Engineer among the top 10 fastest-growing roles in the U.S. for the second consecutive year. Data Scientist was not on the list.
- The Bureau of Labor Statistics projects 8% growth for "Database Administrators and Architects" (the closest BLS category to data engineering) through 2032. Meanwhile, the BLS page for Data Scientists projects 36% growth, but that headline number needs context: the category is broad enough to include analytics roles that aren't "data science" in the industry sense.
- Dice's 2024 Tech Salary Report showed Data Engineer median salary at $130,000, compared to $120,000 for Data Scientist. The gap has been widening since 2022.
- Stack Overflow's 2024 Developer Survey found that "Database Administrator" and data engineering-adjacent roles reported higher median compensation than "Data Scientist" roles for the second consecutive year.
- The dbt Community Survey 2024 showed that the analytics engineering role (a subspecialty of data engineering) grew 40% year-over-year, with respondents reporting average salaries of $130,000–$155,000 in the U.S.
- Brent Ozar's 2024 Data Professional Salary Survey, one of the most detailed compensation studies in the data world, showed data engineers out-earning data scientists by a median of $8,000–$12,000 at equivalent experience levels.
- A Glassdoor search as of early 2026 shows approximately 3x more open "Data Engineer" positions than "Data Scientist" positions in the U.S. market.
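Most of these sources point the same way: data engineering is in higher demand, has more open positions, and now pays as much as or more than data science at the median. The BLS projection is the main exception, and its Data Scientists category is broader than the industry title. This is a structural shift, not a blip.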
How We Got Here: A Brief History of the Data Job Market
2012–2017: The Data Science Gold Rush
It started with the 2012 HBR article. "Data Scientist: The Sexiest Job of the 21st Century." The timing was perfect: big data (Hadoop, MapReduce) was exploding, machine learning was becoming accessible through scikit-learn and early TensorFlow, and companies were awash in data they didn't know how to use. Everyone wanted a data scientist. Salaries skyrocketed. Bootcamps proliferated. Universities scrambled to create programs.
The problem? Companies hired data scientists before they had the infrastructure to support them. A data scientist needs clean, reliable, well-structured data to build models. Most companies didn't have that. What they had was a mess: data in spreadsheets, legacy databases, SaaS tools, CSV exports, and someone's personal S3 bucket. By the popular estimate, data scientists spent up to 80% of their time cleaning and wrangling data and only the remainder actually doing data science. The New York Times reported on this "janitor work" problem as early as 2014, citing estimates of 50% to 80%, and it never really went away.
2018–2021: The Realization
Companies started figuring out that the bottleneck wasn't models — it was data infrastructure. You can't build a recommendation engine if your clickstream data is unreliable. You can't train a fraud detection model if your transaction data has gaps. The "last mile" of data science (building and deploying models) was well-served by tools and talent. The "first mile" (getting clean data into a usable state) was a disaster.
This realization led to a surge in data engineering hiring. Companies that had hired 10 data scientists and 1 data engineer started flipping that ratio. The a16z "Emerging Architectures for Modern Data Infrastructure" post (2020) captured the moment: the data stack was getting more complex, not less, and someone had to build and maintain all of it.
2022–Present: Data Engineering Ascendant
Three forces converged to push data engineering to the top:
- The Modern Data Stack matured. Tools like Snowflake, Databricks, dbt, Fivetran, and Airbyte created an entire ecosystem of data tools that required specialized knowledge to implement and maintain. Someone had to set this up, and that someone was a data engineer.
- Data regulation increased. GDPR, CCPA, and industry-specific regulations (HIPAA, SOX) meant that data pipelines weren't just about analytics anymore — they were about compliance. Data lineage, data quality, access control, and audit trails became first-class concerns. This is infrastructure work, not modeling work.
- AI/ML deployment got serious. As companies moved from "we have a model in a notebook" to "we have a model in production serving 10 million requests per day," the engineering challenge overwhelmed the data science challenge. Feature stores, model serving infrastructure, data versioning, and ML pipelines — all of this is data engineering work.
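One concrete example of why feature pipelines are engineering work is the point-in-time join: when building training data, each event may only see feature values that were already known at that moment. The sketch below shows the idea in plain Python under stated assumptions: `FeatureRow` and `point_in_time_join` are hypothetical names, and feature stores and warehouses do this at scale rather than in a Python loop.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRow:
    entity_id: str
    ts: int       # when this feature value became known (e.g., epoch seconds)
    value: float


def point_in_time_join(events, features):
    """For each (entity_id, event_ts), attach the latest feature value with ts <= event_ts.

    Returns None when no value existed yet, so training rows never see future data.
    """
    # Group feature rows by entity, sorted by time, and keep a parallel list of timestamps.
    rows_by_entity = {}
    for row in sorted(features, key=lambda r: r.ts):
        rows_by_entity.setdefault(row.entity_id, []).append(row)
    ts_by_entity = {k: [r.ts for r in v] for k, v in rows_by_entity.items()}

    joined = []
    for entity_id, event_ts in events:
        rows = rows_by_entity.get(entity_id, [])
        # bisect_right finds the first timestamp > event_ts; the row before it is the answer.
        idx = bisect_right(ts_by_entity.get(entity_id, []), event_ts) - 1
        joined.append((entity_id, event_ts, rows[idx].value if idx >= 0 else None))
    return joined


if __name__ == "__main__":
    features = [
        FeatureRow("u1", 100, 0.2),
        FeatureRow("u1", 200, 0.9),  # not known until t=200
        FeatureRow("u2", 150, 0.5),
    ]
    events = [("u1", 150), ("u1", 250), ("u2", 100)]
    print(point_in_time_join(events, features))
    # → [('u1', 150, 0.2), ('u1', 250, 0.9), ('u2', 100, None)]
```

A naive "latest value per user" join would have attached 0.9 to the event at t=150, leaking a value from the future into the training set. Nothing crashes when that happens; the model just looks better offline than it will in production.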
What Data Engineers Actually Do
Let me be specific about what a data engineer's week looks like at a mid-size tech company:
- Monday: A dbt model broke overnight because a source table added a new column. Debug the issue, update the model, add a data quality test (dbt tests), and re-run the pipeline. Takes 2 hours. Then review a PR from a junior engineer who's building a new pipeline for the marketing team.
- Tuesday: Design meeting for a new data pipeline. The product team wants real-time analytics for a feature launching next quarter. You propose a Kafka → Flink → Snowflake architecture. Write the technical design document. Estimate the work at 3 sprints.
- Wednesday: Cost optimization. The Snowflake bill spiked 30% last month because someone ran a full table scan on a 2TB table every hour. Find the offending query, optimize it, add a warehouse auto-suspend policy. Save $4,000/month.
- Thursday: Build a new Airflow DAG that ingests data from a third-party API (HubSpot CRM data), transforms it, and loads it into the warehouse. Write data quality checks. Set up alerts for failures.
- Friday: Documentation and onboarding. A new data analyst joined, and they need to understand the data warehouse schema, how to find tables, and how to request changes. Update the data catalog in Atlan. Help them write their first dbt model.
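Monday's failure mode, a source table whose columns changed underneath a model, usually comes down to comparing the columns a model expects against the columns the source actually has. Below is a minimal sketch of that check against SQLite. The table, the expected-column list, and the `detect_schema_drift` helper are hypothetical; in practice dbt source tests, data contracts, or queries against the warehouse's `information_schema` would play this role.

```python
import sqlite3


def detect_schema_drift(conn, table, expected_columns):
    """Compare a table's actual columns with the columns a downstream model expects."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated directly, so only use trusted identifiers here.
    actual = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    expected = set(expected_columns)
    return {
        "added": sorted(actual - expected),    # new upstream columns
        "missing": sorted(expected - actual),  # columns the model relies on that vanished
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount REAL, created_at TEXT)")
    # The upstream team adds a column overnight.
    conn.execute("ALTER TABLE raw_orders ADD COLUMN discount_code TEXT")
    print(detect_schema_drift(conn, "raw_orders", ["order_id", "amount", "created_at"]))
    # → {'added': ['discount_code'], 'missing': []}
```

A non-empty "missing" list is what typically breaks a model outright. An "added" column is often harmless, but it can break `select *` unions or models with strict schema contracts, which is why it is worth surfacing before the nightly run rather than after.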
Data engineering is fundamentally about building and maintaining the infrastructure that makes data usable. It's plumbing. It's not glamorous. But without it, nothing else in the data organization works.
The Modern Data Stack: What You Need to Know
The "modern data stack" is the collection of tools that has come to define how companies handle data in the 2020s. Understanding it is essential if you're considering a data engineering career.
| Layer | Function | Key Tools | What a Data Engineer Does Here |
|---|---|---|---|
| Ingestion | Move data from sources into a central repository | Fivetran, Airbyte, Stitch, custom Python scripts | Configure connectors, build custom extractors, handle schema changes, monitor reliability |
| Storage | Store raw and processed data | Snowflake, Databricks, BigQuery, Redshift, Delta Lake, Iceberg | Design schemas, manage partitioning, optimize costs, implement access controls |
| Transformation | Clean, model, and structure data for analysis | dbt, Spark, SQL | Write transformation logic, build data models, implement testing and documentation |
| Orchestration | Schedule and monitor data pipelines | Apache Airflow, Dagster, Prefect, Mage | Build DAGs, set up dependencies, handle failures and retries, manage scheduling |
| Data Quality | Ensure data is accurate, complete, and timely | Great Expectations, dbt tests, Monte Carlo, Soda | Define expectations, build monitors, set up alerting for data quality issues |
| Catalog & Governance | Document, discover, and control access to data | Atlan, DataHub, Alation, Collibra | Maintain metadata, define data ownership, implement access policies, support compliance |
| BI & Analytics | Enable analysts and stakeholders to consume data | Looker, Tableau, Power BI, Metabase, Superset | Build semantic layers, optimize query performance, create data extracts |
| Streaming | Process data in real-time | Kafka, Flink, Spark Streaming, Kinesis | Build real-time pipelines, manage event schemas, handle backpressure and failures |
Every one of these layers represents skills that a data engineer needs. The breadth is enormous. This is one reason the role pays well — you need to be competent across the entire stack, not just one tool.
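The Data Quality row in the table above is easier to picture with an example. dbt ships four built-in generic tests (`unique`, `not_null`, `accepted_values`, `relationships`), and each one compiles to a SQL query that selects the failing rows; the test passes when that query returns nothing. The sketch below imitates three of those checks as plain SQL run against SQLite from Python. The `orders` table, its columns, and the `run_checks` helper are hypothetical, and this illustrates the idea rather than dbt's implementation.

```python
import sqlite3

# Each check is a query that returns offending rows; zero rows means the check passes.
CHECKS = {
    "order_id is not null": "SELECT * FROM orders WHERE order_id IS NULL",
    "order_id is unique": (
        "SELECT order_id FROM orders GROUP BY order_id HAVING COUNT(*) > 1"
    ),
    "status has accepted values": (
        "SELECT * FROM orders WHERE status NOT IN ('placed', 'shipped', 'returned')"
    ),
}


def run_checks(conn, checks=CHECKS):
    """Return {check_name: number_of_failing_rows} for every check."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "placed"), (2, "shipped"), (2, "shipped"), (None, "lost")],
    )
    print(run_checks(conn))
    # → {'order_id is not null': 1, 'order_id is unique': 1, 'status has accepted values': 1}
```

Framing every check as "a query that should return zero rows" is what makes these tests cheap to write and easy to alert on: a scheduler only needs to know whether the count is zero.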
Tools and Skills Breakdown
Here's the honest skills hierarchy for a data engineer in 2026:
Non-Negotiable (must have)
- SQL: Not basic SQL. Advanced SQL: window functions, CTEs, recursive queries, query optimization, understanding query planners. SQL is 60% of the job. If you're weak at SQL, nothing else matters.
- Python: Scripting, data manipulation (pandas), API interactions, writing production-quality code (not notebook-quality code). Testing, packaging, error handling.
- Data modeling: Dimensional modeling (Kimball), data vault, wide tables. Understanding why you structure data a certain way, not just how.
- One cloud platform: AWS (S3, Glue, Redshift, Lambda), GCP (BigQuery, Cloud Storage, Dataflow), or Azure (Synapse, Data Factory, ADLS). Pick one, go deep.
- Git: Version control for pipelines and transformations. This should be obvious but it isn't for everyone coming from analyst backgrounds.
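To make "not basic SQL" concrete, here is the kind of query the SQL bullet is talking about: a CTE that aggregates daily revenue, followed by window functions for a running total and a day-over-day change. It runs against an in-memory SQLite database from Python so it is self-contained. It assumes a SQLite build with window-function support (3.25 or later, which recent Python releases bundle), and the `orders` table is hypothetical.

```python
import sqlite3

QUERY = """
WITH daily AS (                                   -- CTE: one row per day
    SELECT order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
)
SELECT
    order_date,
    revenue,
    SUM(revenue) OVER (ORDER BY order_date) AS running_total,           -- cumulative sum
    revenue - LAG(revenue) OVER (ORDER BY order_date) AS change_vs_prev -- NULL on day 1
FROM daily
ORDER BY order_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2026-03-01", 100.0), (2, "2026-03-01", 50.0),
     (3, "2026-03-02", 80.0), (4, "2026-03-03", 120.0)],
)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('2026-03-01', 150.0, 150.0, None)
# ('2026-03-02', 80.0, 230.0, -70.0)
# ('2026-03-03', 120.0, 350.0, 40.0)
```

The same logic without window functions needs a self-join or a correlated subquery per row, which is harder to read and usually slower on large tables.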
Expected (most mid-level roles require)
- dbt: The tool that defined analytics engineering. If you work with a modern data warehouse, you'll use dbt. Learn dbt Core and understand the project structure.
- Airflow or equivalent: Orchestration is core. Understand DAGs, task dependencies, retries, sensors, and how to debug failed runs.
- Snowflake or Databricks: One of the two dominant cloud data platforms. Both are everywhere. Snowflake for SQL-centric workloads, Databricks for Spark-centric workloads.
- Docker: Containerization for pipeline deployment and local development.
- Basic Spark: For large-scale data processing. You don't need to be an expert, but you need to understand partitioning, shuffles, and when Spark is the right tool.
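The Airflow bullet names the core ideas an orchestrator manages: tasks, dependencies, and retries. Stripped of any framework, those ideas fit in a few lines of standard-library Python (`graphlib` needs Python 3.9+). This is a toy, not Airflow's API: `run_dag`, the task names, and the retry policy are hypothetical, and a real orchestrator adds scheduling, persisted state, backoff between retries, and alerting.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_retries=2):
    """Run tasks in dependency order, retrying each failure up to max_retries times.

    tasks: {name: zero-argument callable}
    deps:  {name: set of upstream task names}
    Returns the execution order; re-raises the last error if a task exhausts its retries.
    """
    # Include every task, even ones with no upstream dependencies.
    graph = {name: set(deps.get(name, ())) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # success: move on to the next task
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: stop the run; downstream tasks never start
    return order


if __name__ == "__main__":
    calls = {"extract": 0}

    def flaky_extract():
        # Fails on the first attempt, like a transient API timeout.
        calls["extract"] += 1
        if calls["extract"] == 1:
            raise TimeoutError("API timed out")

    tasks = {"extract": flaky_extract, "transform": lambda: None, "load": lambda: None}
    deps = {"transform": {"extract"}, "load": {"transform"}}
    print(run_dag(tasks, deps))  # → ['extract', 'transform', 'load']
    print(calls["extract"])      # → 2 (one failure, then a successful retry)
```

Debugging a failed run in a real orchestrator is mostly answering the questions this toy makes explicit: which task failed, on which attempt, and which downstream tasks were therefore skipped.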
Differentiators (senior and specialized roles)
- Kafka / streaming: Real-time data processing. High-demand skill, limited supply.
- Terraform / IaC: Infrastructure-as-code for data infrastructure. Valuable for platform-oriented data engineers.
- Kubernetes: Running Spark, Airflow, or Flink on Kubernetes.
- Data governance and compliance: GDPR, HIPAA, SOX. Understanding data lineage, PII detection, and access controls.
- ML engineering overlap: Feature stores, model serving, MLOps. The boundary between data engineering and ML engineering is blurry.
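On the governance bullet: "PII detection" often starts as humble pattern matching on free-text columns before data lands in shared tables. The sketch below masks email addresses and phone-like numbers with regular expressions. The patterns and the `mask_pii` helper are illustrative assumptions, not a compliance-grade detector; production setups typically rely on dedicated classification tools or the warehouse's own tagging and masking features.

```python
import re

# Deliberately simple patterns for illustration; they miss many formats and
# produce false positives (an ISO date like 2026-03-01 also matches PHONE).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_pii(text):
    """Replace email addresses, then phone-like digit runs, with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


if __name__ == "__main__":
    print(mask_pii("Contact ali@example.com or +994 50 123 45 67"))
    # → Contact <EMAIL> or <PHONE>
```

The date false positive noted in the comment is exactly why regex-only detection is a starting point: governance work is as much about tuning, exceptions, and audit trails as about the initial patterns.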
The Salary Crossover: Data Engineering vs Data Science
Let's put the salary data in a direct comparison. These numbers are based on U.S. market data from multiple sources:
| Level | Data Engineer (total comp) | Data Scientist (total comp) | Difference |
|---|---|---|---|
| Entry-level (0–2 yrs) | $95,000–$120,000 | $85,000–$110,000 | DE +$10,000 |
| Mid-level (3–5 yrs) | $130,000–$165,000 | $115,000–$150,000 | DE +$15,000 |
| Senior (6–10 yrs) | $165,000–$220,000 | $150,000–$200,000 | DE +$15,000–$20,000 |
| Staff / Principal | $200,000–$300,000 | $180,000–$280,000 | DE +$20,000 |
| FAANG-level Senior | $250,000–$400,000+ | $230,000–$380,000+ | DE +$20,000 (varies) |
Sources: Glassdoor Data Engineer, Glassdoor Data Scientist, Levels.fyi Data Engineer, Levels.fyi Data Scientist, Brent Ozar 2024 Survey, Dice 2024 Tech Salary Report.
Important nuance: these ranges describe the broad middle of the market, not its top. At the very top of the data science pyramid (ML researchers at DeepMind, OpenAI, or Anthropic), compensation exceeds anything in data engineering. A senior research scientist at a frontier AI lab can earn $500,000–$1,000,000+. But that's a tiny fraction of people who hold the "Data Scientist" title. For the 90th percentile and below, data engineering pays more.
In emerging markets (Azerbaijan, Turkey, Eastern Europe, Southeast Asia), data engineering roles pay $12,000–$45,000 for local positions and $25,000–$70,000 for remote international roles. Data scientist roles in the same markets pay $10,000–$40,000 locally. The gap is smaller but follows the same pattern: data engineering commands a premium because the supply shortage is global.
Why Data Scientists Are Becoming Data Engineers
One of the most interesting trends I've observed is the migration of data scientists into data engineering. Here's why it's happening:
1. The "Data Scientist Doing Data Engineering" Problem
This is the root cause. Many data scientists were hired to build models but spend a large share of their time on data infrastructure work: building ETL pipelines, cleaning data, managing data quality, and fighting with Spark jobs. A 2022 Anaconda State of Data Science survey found that data scientists spent 39% of their time on data preparation alone, making it the single largest time sink. Many data scientists eventually realize: "If I'm spending this much of my time doing data engineering anyway, maybe I should just be a data engineer, and get paid more for it."
2. The Entry-Level Data Science Crunch
The data science bootcamp boom of 2016–2020 flooded the entry-level market. A junior data scientist position now receives hundreds of applications. The 365 Data Science analysis of the data job market found that entry-level data science roles receive 4–5x more applications than data engineering roles of similar seniority. Meanwhile, data engineering has a genuine shortage — there aren't enough people who know Airflow, dbt, and Spark to fill the open positions. The supply-demand dynamics are simply more favorable for data engineers.
3. The Skills Are Transferable
A data scientist who knows Python, SQL, and a bit of cloud already has 50% of the data engineering skillset. The leap isn't as large as going from, say, frontend development to data engineering. The main things to learn: proper software engineering practices (testing, CI/CD, error handling), orchestration tools (Airflow), data modeling, and cloud-native data services. A motivated data scientist can make this transition in 6–12 months of focused learning and practice.
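In practice, "proper software engineering practices" mostly means moving logic out of notebook cells into small functions with explicit failure behavior and tests. A minimal sketch, assuming a hypothetical `parse_amount` cleaning step for a currency column; the pytest-style test functions can be run with pytest or simply called directly.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(raw):
    """Parse a currency string such as '1,234.50' into a Decimal.

    Raises ValueError with a clear message instead of passing bad values through,
    so one malformed record fails loudly rather than corrupting downstream tables.
    """
    if raw is None or not str(raw).strip():
        raise ValueError("amount is empty")
    cleaned = str(raw).strip().replace(",", "")
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"amount is not numeric: {raw!r}") from None
    if not value.is_finite():  # Decimal also accepts 'NaN' and 'Infinity'; reject them
        raise ValueError(f"amount is not finite: {raw!r}")
    return value


# pytest collects functions named test_*; they also work as plain function calls.
def test_parses_thousands_separator():
    assert parse_amount(" 1,234.50 ") == Decimal("1234.50")


def test_rejects_non_numeric():
    try:
        parse_amount("12abc")
    except ValueError as exc:
        assert "not numeric" in str(exc)
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    test_parses_thousands_separator()
    test_rejects_non_numeric()
    print("all checks passed")
```

Using `Decimal` instead of `float` for money and raising on bad input, rather than silently coercing it, are the kinds of small decisions that separate pipeline code from exploratory code.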
4. The Career Ceiling Is Higher Than Expected
Data engineering has a clear progression path that many people don't see at first: Data Engineer → Senior Data Engineer → Staff Data Engineer → Principal Data Engineer or Data Architect or Engineering Manager. The staff and principal levels at top companies pay $250,000–$400,000+. The "Data Architect" path leads to VP-level roles at large organizations. It turns out that designing and maintaining the data infrastructure for a company with petabytes of data is a genuinely difficult engineering challenge that commands executive-level compensation.
The Analytics Engineer: The Role That Didn't Exist Five Years Ago
I need to talk about analytics engineering, because it's a critical piece of this story. The role was largely defined and popularized by dbt Labs (creators of dbt) and the dbt community.
An analytics engineer sits between the data engineer and the data analyst. They write the transformation logic (in SQL, using dbt) that turns raw data into clean, well-modeled, well-documented, well-tested datasets that analysts can use directly. They don't build the ingestion pipelines (that's the data engineer). They don't build dashboards (that's the analyst). They build the data models that connect the two.
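Here is a small illustration of what that transformation layer produces: raw customer and order rows are reshaped into a Kimball-style dimension table and fact table, and the analyst-facing query becomes a simple join. In a real project each `CREATE TABLE ... AS SELECT` would be its own dbt model (a dbt model is just the `SELECT`; dbt wraps it in the `CREATE`). Here they run directly in SQLite so the example is self-contained, and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw data as it arrives from ingestion: inconsistent casing, amounts in cents.
CREATE TABLE raw_customers (id INTEGER, email TEXT, country TEXT);
CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount_cents INTEGER, created_at TEXT);
INSERT INTO raw_customers VALUES (1, 'A@Example.com', 'az'), (2, 'b@example.com', 'TR');
INSERT INTO raw_orders VALUES (10, 1, 2500, '2026-03-01'), (11, 1, 1000, '2026-03-02'),
                              (12, 2, 4000, '2026-03-02');

-- Dimension: one clean row per customer.
CREATE TABLE dim_customers AS
SELECT id AS customer_id, LOWER(email) AS email, UPPER(country) AS country
FROM raw_customers;

-- Fact: one row per order, with amounts converted to currency units.
CREATE TABLE fct_orders AS
SELECT id AS order_id, customer_id, amount_cents / 100.0 AS amount,
       DATE(created_at) AS order_date
FROM raw_orders;
""")

# The analyst-facing query is now a simple star-schema join.
revenue_by_country = conn.execute("""
SELECT d.country, SUM(f.amount) AS revenue
FROM fct_orders AS f
JOIN dim_customers AS d USING (customer_id)
GROUP BY d.country
ORDER BY d.country
""").fetchall()
print(revenue_by_country)  # → [('AZ', 35.0), ('TR', 40.0)]
```

The cleanup rules (lowercase emails, uppercase country codes, cents to currency units) live in one tested place instead of being repeated in every analyst's dashboard query.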
Why does this matter? Because analytics engineering is where many data scientists are landing. It requires SQL proficiency (which data scientists have), data modeling knowledge (which data scientists can learn quickly), and a product mindset (treating the data model as a product consumed by analysts). It pays well ($120,000–$165,000 at mid-level, per the dbt survey), has strong demand, and doesn't require the deep systems engineering skills of a full data engineer.
If you're a data scientist who likes SQL, enjoys building well-structured data models, and wants a role with strong demand, analytics engineering is worth serious consideration.
Controversy: Is the Data Science Hype Over?
Let me address this directly, because it comes up in every conversation about data engineering vs data science.
The "Data Science Is Dead" Argument
- The bootcamp pipeline oversaturated the entry-level market.
- Many companies hired data scientists and then didn't know what to do with them.
- AutoML tools (H2O, DataRobot, Google AutoML) automated much of the classical ML work.
- The "80% data wrangling" problem was never solved; it was just rebranded as data engineering.
- LLMs and foundation models are shifting the AI landscape away from custom-trained models toward prompt engineering and fine-tuning, which require different skills.
The "Data Science Is Fine, Actually" Counterargument
- The BLS projects 36% growth for data scientists through 2032, placing it among the fastest-growing occupations in the economy.
- Machine learning is being deployed in more industries than ever: healthcare (diagnostics), finance (fraud), logistics (optimization), and manufacturing (predictive maintenance).
- The oversaturation is primarily at the entry level. Experienced data scientists (5+ years, with deployment experience) are still in high demand.
- AI/ML engineering — a more engineering-focused flavor of data science — is booming alongside data engineering.
- The Kaggle State of Machine Learning and Data Science Survey 2023 showed continued strong interest and career satisfaction among practitioners.
The Reality
Data science isn't dead. But it has bifurcated. The top end — people with deep ML expertise, research publications, and production deployment experience — is thriving. The bottom end — people with a bootcamp certificate and basic scikit-learn skills — is oversaturated. The middle is being squeezed by a combination of AutoML, pre-trained models, and the growing realization that most business problems don't actually require custom ML models; they require good data infrastructure and well-written SQL.
Data engineering, by contrast, has a healthy demand curve at every level. Junior data engineers are in demand. Mid-level data engineers are in demand. Senior data engineers are desperately sought. The skills are less glamorous but more consistently valuable. And the supply hasn't caught up because nobody made a Netflix documentary about the sexiness of writing Airflow DAGs.
Data Engineer vs Data Scientist: The Full Comparison
| Dimension | Data Engineer | Data Scientist |
|---|---|---|
| Core question | "How do we make this data reliable and available?" | "What can we learn from this data?" |
| Primary output | Pipelines, data models, infrastructure | Models, analyses, insights |
| Key language | SQL (primary), Python (secondary) | Python (primary), SQL (secondary), R (sometimes) |
| Math required | Minimal (set theory, basic statistics) | Significant (linear algebra, calculus, probability, statistics) |
| Engineering depth | High (distributed systems, databases, cloud infra) | Medium (software best practices, model deployment) |
| Day-to-day work | Building pipelines, debugging data issues, optimizing queries, managing infrastructure | EDA, feature engineering, model training, A/B test analysis, stakeholder presentations |
| Job market (2026) | Shortage at all levels | Oversupplied at entry level, strong at senior level |
| Median salary (U.S.) | $130,000 | $120,000 |
| Career ceiling | Staff/Principal DE, Data Architect, VP Data Engineering | Staff/Principal DS, ML Research Lead, VP Data Science, Chief Data Officer |
| Personality fit | Builders, systems thinkers, reliability-minded | Curious, analytical, research-oriented, comfortable with ambiguity |
What I Actually Think
Here's my unfiltered take after watching the data job market closely for years:
Data engineering is the better career bet for most people right now. Not because data science is bad, but because the risk-reward profile of data engineering is superior. Data engineering has more jobs, higher median pay, lower entry barriers (no PhD needed), and a clearer skill progression. You don't need to publish papers or win Kaggle competitions. You need to be good at SQL, Python, and cloud infrastructure, and those are learnable, testable skills.
Data science is the better bet if and only if you're genuinely passionate about statistics and modeling. If you love research, if you get excited about a new paper on arXiv, if you want to push the boundaries of what's possible with data, data science is where that happens. But if you're choosing data science because it sounds impressive or because of the 2012 HBR article, reconsider. The market has changed. The median data scientist does not work at OpenAI. The median data scientist builds dashboards and runs A/B tests at a mid-size company, and they could do that work more efficiently with better data infrastructure, which a data engineer provides.
The smartest move might be to be a data engineer who understands data science. Knowing how to build a pipeline AND understanding how a model will consume that data makes you extraordinarily valuable. You can design data models that serve both analytics and ML use cases. You can spot data quality issues that would silently corrupt a model. You can have credible conversations with data scientists about what's feasible and what's not. This "full-stack data" profile is rare and commands the highest premiums.
For people in emerging markets like Azerbaijan: data engineering is especially attractive because the skills are cloud-native and location-independent. You can work as a remote data engineer for a European or American company from Baku, building pipelines in Snowflake and dbt, and earn significantly more than local market rates. Data science, by contrast, often requires more domain context and closer collaboration with business stakeholders, which can be harder to do remotely across large timezone gaps. The practical dynamics favor data engineering for remote work.
If You're Choosing Right Now: A Decision Framework
- Do you prefer building systems or analyzing data? Building systems → Data Engineering. Analyzing → Data Science.
- How do you feel about math? Love it (linear algebra, probability) → Data Science. Tolerate it → Data Engineering.
- Do you want to write SQL most of the day? Yes → Data Engineering or Analytics Engineering. No → Data Science.
- Do you have a strong background in software engineering? Yes → Data Engineering (your skills transfer directly). No → Either, but Data Science entry is friendlier to non-engineering backgrounds.
- Are you optimizing for job availability or for passion? Job availability → Data Engineering (more openings, less competition). Passion → Whichever genuinely excites you.
- Do you want to work at a frontier AI company? Yes → Data Science / ML Engineering (that's where the research happens). No → Data Engineering is more broadly applicable.
If you answered 4+ toward data engineering, start learning SQL (really learning it — window functions, CTEs, optimization), pick up Airflow or Dagster, get hands-on with Snowflake or BigQuery, and build a portfolio project that demonstrates an end-to-end pipeline. The market is waiting.
Getting Started: A Practical Roadmap
If you've decided data engineering is the path, here's the order I'd learn things:
- SQL (4–6 weeks): Go beyond SELECT. Master window functions, CTEs, subqueries, JOINs (all types), aggregations, and query optimization. Use StrataScratch, LeetCode SQL, or Mode Analytics SQL tutorial.
- Python for data (4–6 weeks): pandas, file I/O, API requests, error handling, testing with pytest. Write production-quality scripts, not notebooks.
- Cloud fundamentals (2–4 weeks): Pick AWS, GCP, or Azure. Learn the core storage (S3/GCS/ADLS) and compute services. Get a free tier account and build things.
- dbt (2–3 weeks): Work through the dbt Learn courses (free). Build a project that transforms raw data into dimensional models with tests and documentation.
- Airflow (3–4 weeks): Install locally (or use Astronomer's free tier). Build a DAG that extracts from an API, transforms with Python/dbt, and loads into a database.
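- Snowflake or Databricks (2–3 weeks): Get hands-on with one cloud data platform. Understand how compute is provisioned (virtual warehouses in Snowflake, clusters or SQL warehouses in Databricks), how data gets loaded (for example, Snowflake stages), and access control.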
- Portfolio project (4–6 weeks): Build an end-to-end pipeline. Ingest data from a public API, transform it, load it into a warehouse, add data quality tests, orchestrate with Airflow, and document everything. Put it on GitHub. This is what gets you interviews.
Total: roughly 21–32 weeks, or about 5–7 months, of focused part-time learning. That's faster than most data science bootcamps, and the job market on the other side is more favorable.
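The portfolio project in the roadmap has the same skeleton regardless of tools: extract, transform, load, then verify. The sketch below shows that skeleton in plain Python under stated assumptions: a hard-coded JSON payload stands in for a public API response (so it runs offline), SQLite stands in for the warehouse, and every name is hypothetical. In the real project, these steps would likely become Airflow tasks, the transform would likely be dbt models, and the target would be Snowflake or BigQuery.

```python
import json
import sqlite3

# Stand-in for an API response; a real project would fetch this over HTTP.
API_PAYLOAD = json.dumps([
    {"id": 1, "city": "Baku", "temp_c": "21.5", "observed_at": "2026-03-01T12:00:00"},
    {"id": 2, "city": "Ganja", "temp_c": None, "observed_at": "2026-03-01T12:00:00"},
])


def extract(payload):
    return json.loads(payload)


def transform(records):
    """Cast types and drop records missing required fields."""
    return [
        (r["id"], r["city"], float(r["temp_c"]), r["observed_at"])
        for r in records
        if r.get("temp_c") is not None
    ]


def load(conn, rows):
    """Idempotent load: re-running the pipeline replaces rows by primary key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather ("
        "id INTEGER PRIMARY KEY, city TEXT, temp_c REAL, observed_at TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO weather VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def quality_gate(conn):
    """Fail loudly if the table is empty or has implausible temperatures."""
    count, bad = conn.execute(
        "SELECT COUNT(*), SUM(temp_c NOT BETWEEN -60 AND 60) FROM weather"
    ).fetchone()
    if count == 0 or bad:
        raise ValueError(f"quality gate failed: rows={count}, out_of_range={bad}")
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for _ in range(2):  # run twice to show the load is idempotent
        load(conn, transform(extract(API_PAYLOAD)))
    print(quality_gate(conn))  # → 1
```

Idempotent loads and an explicit quality gate are the two details reviewers tend to look for in a portfolio pipeline, because they show you've thought about what happens when the job is rerun or the source sends bad data.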
Sources
- Harvard Business Review — Data Scientist: The Sexiest Job of the 21st Century (2012)
- Bureau of Labor Statistics — Database Administrators and Architects
- Bureau of Labor Statistics — Data Scientists
- LinkedIn Jobs on the Rise 2025
- Dice 2024 Tech Salary Report
- Stack Overflow Developer Survey 2024
- dbt Community Survey 2024
- Brent Ozar 2024 Data Professional Salary Survey
- Glassdoor — Data Engineer Salaries
- Glassdoor — Data Scientist Salaries
- Levels.fyi — Data Engineer Compensation
- Levels.fyi — Data Scientist Compensation
- a16z — Emerging Architectures for Modern Data Infrastructure (2020)
- New York Times — For Big-Data Scientists, Hurdle to Insights Is Janitor Work (2014)
- Anaconda State of Data Science 2022
- 365 Data Science — Data Job Market Analysis
- Kaggle State of Machine Learning and Data Science Survey 2023
- Snowflake
- Databricks
- dbt Labs
- Apache Airflow
- Apache Kafka
- Apache Spark
- Great Expectations
- dbt Learn Courses
I'm Ismat, and I build BirJob — Azerbaijan's job aggregator. If you're looking for data engineering, data science, or analytics roles in the region or beyond, BirJob scrapes 91 job sources daily so you can find opportunities without checking every careers page. The data pipes itself.
You might also like
- The Analytics Role Confusion: Business Analyst, Data Analyst, BI Analyst — What's the Actual Difference?
- Data Analyst vs Data Scientist vs Data Engineer: The 2026 Decision Guide
- AI Engineer vs ML Engineer: What Actually Changed and Why It Matters
- DevOps vs SRE vs Platform Engineer: The Infrastructure Title Mess, Explained
