Chapter 79. The Two Types of Data Engineering and Data Engineers

Jesse Anderson

There are two types of data engineering.1 And there are two types of jobs with the title data engineer. This is especially confusing to organizations and individuals who are just starting to learn about data engineering. This confusion leads to the failure of many teams' big data projects.

Types of Data Engineering

The first type of data engineering is SQL-focused. The work is done, and the data is primarily stored, in relational databases. All of the data processing is done with SQL or a SQL-based language. Sometimes this data processing is done with an ETL tool.

The second type of data engineering is big data–focused. The work is done, and the data is primarily stored, in big data technologies like Apache Hadoop, Cassandra, and HBase. All of the data processing is done in big data frameworks like MapReduce, Spark, and Flink. While SQL is used, the primary processing is done with programming languages like Java, Scala, and Python.

Types of Data Engineers

The two types of data engineers closely match the types of data engineering.

The first type of data engineer does their data processing with SQL. They may use an ETL tool. Sometimes their title is database administrator (DBA), SQL developer, or ETL developer. These engineers have little to no programming experience.

The second type of data engineer is a software engineer who specializes in big data. They have extensive programming skills and can write SQL queries too. The major difference is that these data engineers have both programming and SQL skills, so they can choose whichever of the two approaches fits the problem.

Why These Differences Matter to You

It’s crucial that managers know the differences between these two types of data-engineering teams. Sometimes organizations will have a SQL-focused data-engineering team attempt a big data project. These sorts of efforts are rarely successful. For big data projects, you need the second type of data engineer and a data-engineering team that is big data–focused.

For individuals, it's important to understand the required starting skills for big data. While there are SQL interfaces for big data, you need programming skills to get the data into a state that's queryable. For people who have never programmed before, the learning curve is much steeper. I strongly recommend that SQL-focused people understand the amount of time it takes to learn how to program and the difficulty involved.
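To make "getting the data into a queryable state" concrete, here is a minimal sketch in Python. It assumes a hypothetical raw log format (timestamp followed by key=value fields) and uses the standard library's in-memory sqlite3 as a stand-in for a SQL interface; a real big data pipeline would run the parsing step in a framework like Spark or Flink, which this sketch does not attempt to show. The point is that the parsing and the handling of malformed records are programming work, and only after that step does SQL become useful.

```python
import sqlite3

# Hypothetical raw, semi-structured log lines; this format is an assumption for illustration.
RAW_LINES = [
    "2024-01-01T10:00:00Z user=alice action=click",
    "2024-01-01T10:00:05Z user=bob action=view",
    "garbage line that does not parse",
    "2024-01-01T10:01:00Z user=alice action=view",
]


def parse_line(line):
    """Turn one raw line into a (timestamp, user, action) tuple, or None if malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    timestamp, *fields = parts
    # Split each "key=value" field into a dict entry; fields without "=" are ignored.
    kv = dict(field.split("=", 1) for field in fields if "=" in field)
    if "user" not in kv or "action" not in kv:
        return None
    return (timestamp, kv["user"], kv["action"])


def load(lines):
    """Parse the raw lines, drop malformed ones, and load the rest into a SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, user_name TEXT, event_type TEXT)")
    rows = [row for row in (parse_line(line) for line in lines) if row is not None]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


conn = load(RAW_LINES)
# Only now can a SQL-focused user query the data.
counts = conn.execute(
    "SELECT user_name, COUNT(*) FROM events GROUP BY user_name ORDER BY user_name"
).fetchall()
print(counts)  # [('alice', 2), ('bob', 1)]
```

The SQL query at the end is the easy part; deciding how to parse the lines and what to do with records that don't fit the expected shape is where programming skill comes in.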

Only by knowing and understanding these two definitions can you be successful with big data projects. You absolutely have to have the right people for the job.

1 A version of this chapter was originally published at jesse-anderson.com.
