How to Become a Data Architect

Companies survive and thrive on data. The ability to store, secure, move, and analyze data is absolutely crucial to executives’ long-term strategies. Fortunately for tech professionals everywhere, that also means companies large and small will need huge numbers of data specialists for the foreseeable future.

One of those specialists, the data architect, is much in demand. According to Lightcast (formerly Emsi Burning Glass), the role is expected to grow 9.3 percent over the next decade, and the median salary is $115,379.

But becoming a data architect also requires mastering a key set of skills, as well as grasping some complicated concepts. If you’re interested in becoming a data architect, let’s look at what they do and what steps you must take to become one.

What is data architecture?

Data architecture deals with connecting data from many different sources, combining it, and building it into something usable.

Suppose you work for the corporate headquarters of a large restaurant chain. The company is building out an online-ordering feature, complete with delivery and takeout options. To make this work, your company wants to integrate with as many food ordering apps as possible, in addition to building its own customized app. If that weren’t complicated enough, the executives also want to build an ordering feature into the company website.

Aside from the technical aspects of building the software to accomplish these goals, this project requires multiple sources of data working together. Different food delivery apps have their own way of sending data to your company, each with their own data format. That data needs to be combined into a single dataset in your company’s chosen format.
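As a minimal sketch of that combining step, assume two hypothetical delivery apps (called app_a and app_b here) that describe the same kind of order in different shapes. The field names and the common Order record are invented for illustration and do not come from any real delivery API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Order:
    """The company's single, chosen order format (hypothetical)."""
    source: str
    order_id: str
    item: str
    quantity: int
    total_cents: int


def from_app_a(payload: dict) -> Order:
    # Hypothetical app A sends a flat record with the total in dollars.
    return Order(
        source="app_a",
        order_id=str(payload["id"]),
        item=payload["item_name"],
        quantity=int(payload["qty"]),
        total_cents=int(Decimal(payload["total"]) * 100),
    )


def from_app_b(payload: dict) -> Order:
    # Hypothetical app B nests the line item and already reports cents.
    line = payload["line"]
    return Order(
        source="app_b",
        order_id=payload["orderRef"],
        item=line["product"],
        quantity=int(line["count"]),
        total_cents=int(payload["amountCents"]),
    )


orders = [
    from_app_a({"id": 101, "item_name": "Veggie Burger", "qty": "2", "total": "17.98"}),
    from_app_b({"orderRef": "B-77", "line": {"product": "Fries", "count": 1}, "amountCents": 399}),
]

for order in orders:
    print(order)
```

Whatever the source, everything downstream only ever sees Order records, which is the point of agreeing on one format.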

Now think about what happens at that point: Does the data just get pushed to the restaurants and ignored from there? Probably not. Instead, your company wants to be able to save and track all that data, giving it a massive customer dataset it can analyze for crucial insights. The company will likely want to know, for example, overall trends such as what areas are seeing the most online orders, the foods ordered most often, and so on.
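Once the orders sit in one table, trend questions like these usually come down to a grouped query. The sketch below uses Python’s built-in sqlite3 module with a tiny in-memory table; the table name, columns, and sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (area TEXT, item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Downtown", "Veggie Burger", 2),
        ("Downtown", "Fries", 1),
        ("Uptown", "Veggie Burger", 1),
        ("Downtown", "Veggie Burger", 1),
    ],
)

# Which areas are seeing the most online orders?
orders_by_area = conn.execute(
    "SELECT area, COUNT(*) AS order_count FROM orders "
    "GROUP BY area ORDER BY order_count DESC"
).fetchall()

# Which foods are ordered most often?
items_by_quantity = conn.execute(
    "SELECT item, SUM(quantity) AS total_quantity FROM orders "
    "GROUP BY item ORDER BY total_quantity DESC"
).fetchall()

print(orders_by_area)      # [('Downtown', 3), ('Uptown', 1)]
print(items_by_quantity)   # [('Veggie Burger', 4), ('Fries', 1)]
```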

This is where you come in: As the data architect, you will be modeling and architecting the data aspects of the system. You won’t be writing code to read the data or send it to the various parties interested in analyzing it; that part is handled mostly by software developers and engineers. Instead, you focus on how the data is stored and managed, how it’s received from different apps and manipulated to fit into your own system, and how it’s manipulated and changed before being sent out again.

In various industries, data architecture is an absurdly complicated endeavor. Depending on the nature of the job, you’ll have to work with a variety of programming languages (including, but not limited to, SQL, Python and R), figure out how to integrate datasets from all kinds of external sources, collaborate with data scientists and analysts on data modeling and visualizations, and often deal with data-related challenges in real time.

When it comes to important skillsets, data architects “must be technically competent in data architecture modeling best practices, they must be technically competent in use of chosen data architecture modeling software, and they should have a better than average understanding of the business environment in which they’re working with,” Anne Marie Smith, PhD, vice president of education and chief methodologist for EWSolutions, recently told Dice.

What should you learn to become a data architect?

In order to accomplish all of the above, you need to fundamentally become an expert in how data is modeled. Data modeling refers to dividing up the data into different “types” of data, often called tables. For the restaurant, there would be a table representing a customer, plus a table for products/orders, and so on. Multiple tables might be used to represent the change of data over time (for example, a shift in menu prices).
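To make the restaurant example concrete, here is one possible way to sketch those tables, using Python’s built-in sqlite3 module. The table and column names are assumptions chosen for illustration; a real model would have many more fields. Note the separate product_prices table, which records a menu price change as a new row rather than overwriting the old price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- One row per price period, so history is kept when prices shift.
CREATE TABLE product_prices (
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    valid_from  TEXT NOT NULL,      -- ISO date the price took effect
    price_cents INTEGER NOT NULL,
    PRIMARY KEY (product_id, valid_from)
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    ordered_on  TEXT NOT NULL       -- ISO date of the order
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO products VALUES (1, 'Veggie Burger')")
conn.executemany(
    "INSERT INTO product_prices VALUES (?, ?, ?)",
    [(1, "2024-01-01", 899), (1, "2024-06-01", 949)],
)
conn.execute("INSERT INTO orders VALUES (1, 1, 1, '2024-03-15')")

# Look up the price that was in effect on the day of the order.
price_paid = conn.execute("""
    SELECT pp.price_cents
    FROM orders AS o
    JOIN product_prices AS pp ON pp.product_id = o.product_id
    WHERE o.order_id = 1 AND pp.valid_from <= o.ordered_on
    ORDER BY pp.valid_from DESC
    LIMIT 1
""").fetchone()[0]

print(price_paid)  # 899, the price before the June change
```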

At the core of data modeling is Structured Query Language, or SQL (often pronounced “sequel”). It’s critical to learn SQL if you’re working with data. Fortunately, there are lots of training options. If you’re a self-learner, there are lots of online materials available, including this helpful offering from w3schools, which breaks down the various elements of SQL into “chapters.”

Other useful tools include database server software such as MySQL, SQL Server, PostgreSQL, and Oracle Database. These tools all understand the SQL language and are considered relational databases. In addition to relational databases, there are also databases that use other ways to store data; these are called NoSQL databases, since they diverge from the traditional relational databases that use SQL.

Here, then, are topics to learn with tools you can install on your own computer:

  • SQL language
  • Database servers including MySQL, SQL Server, PostgreSQL, Oracle Database
  • NoSQL tools, including MongoDB and Couchbase
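To get a feel for how a NoSQL document store differs from relational tables, the sketch below reshapes the same order from flat, key-linked rows into a single nested document of the kind a document database stores. It uses only plain Python and the json module, with no database client involved, and the field names are assumptions for illustration.

```python
import json

# Relational shape: flat rows in separate tables, linked by keys.
customers = [{"customer_id": 1, "name": "Ada"}]
order_items = [
    {"order_id": 10, "customer_id": 1, "item": "Veggie Burger", "quantity": 2},
    {"order_id": 10, "customer_id": 1, "item": "Fries", "quantity": 1},
]


def to_document(order_id: int) -> dict:
    """Gather every row for one order into one nested document."""
    rows = [r for r in order_items if r["order_id"] == order_id]
    customer = next(
        c for c in customers if c["customer_id"] == rows[0]["customer_id"]
    )
    return {
        "order_id": order_id,
        "customer": {"name": customer["name"]},
        "items": [{"item": r["item"], "quantity": r["quantity"]} for r in rows],
    }


document = to_document(10)
print(json.dumps(document, indent=2))
```

The trade-off to notice: the document can be read back in one piece without a join, but the customer’s name is now copied into every order that mentions them.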

Today, data is often spread out among multiple servers in the cloud, meaning you need to also learn a set of cloud-based tools that go far beyond the smaller data systems we just mentioned. Data can be modeled to span massive systems that only clouds can handle, and with that comes different tools for managing the data.

There are three “big” cloud providers (Amazon Web Services, Microsoft Azure, and Google Cloud), each of which has different toolsets for managing large sets of data.

There are also sets of tools used across different cloud providers meant for processing large amounts of data (i.e., “Big Data”). One topic to study is data warehousing. Alongside it, you’ll want to learn these technologies:

  • Apache Spark: This is an open-source engine for processing large amounts of data spread across multiple computers.
  • Hive: A data warehousing tool for handling massive amounts of data.
  • Big Data storage formats: This is a long list, but includes names like ORC, Avro, and Parquet.

These cloud-based tools are evolving rapidly, so plan to be a lifelong learner. Tools that were popular ten years ago might not be used as much today, and the biggest cloud platforms are adding features on a rolling basis.

In addition to all of the above, you need to understand how data moves and flows between systems. This is a concept known as ETL, which stands for Extract, Transform, Load. Think of the example of the multiple food delivery apps sending data to the restaurant company. That company needs to extract the data from those sources, which will be in different formats, transform it into something their business can use, and then load it into their own database systems.
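The three ETL stages can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption for illustration: one hypothetical app exports CSV, another sends JSON, and the “company database” is an in-memory SQLite table.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw feeds from two delivery apps.
CSV_FEED = "order_id,item,qty\nA-1,Veggie Burger,2\nA-2,Fries,1\n"
JSON_FEED = '[{"ref": "B-1", "product": "veggie burger", "count": 3}]'


def extract():
    """Extract: read each source in its own format."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_FEED)))
    json_rows = json.loads(JSON_FEED)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: map both sources onto (order_id, item, quantity)."""
    unified = [(r["order_id"], r["item"].title(), int(r["qty"])) for r in csv_rows]
    unified += [(r["ref"], r["product"].title(), int(r["count"])) for r in json_rows]
    return unified


def load(conn, rows):
    """Load: write the unified rows into the company's own table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, item TEXT, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))

loaded = conn.execute(
    "SELECT order_id, item, quantity FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In production, dedicated tooling typically handles scheduling, retries, and scale, but the extract, transform, load shape stays the same.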

Finally, you’ll need to know some programming. You don’t need to be an expert software developer, but at the very least you’ll want to learn beginning to intermediate Python programming, because some of these tools require a bit of Python programming to manage them.

What about school?

To land a job as a data architect, you will definitely want to obtain at least a bachelor’s degree in computer science, math, or a data-related field. And if you’re willing, you should seriously consider a master’s degree program as well.

Conclusion

Data architecture is an advanced field that pays quite well, but you will need to build a long-term plan to get there. Many jobs require at least a master’s degree plus 8 to 10 years of experience.

How do you build up that experience? You can work in related fields such as data analyst, or any position that requires a good deal of data modeling. Many software developers who work with big datasets on a frequent basis also make a successful jump to data architect. The key part is to craft a long-term plan and stick to it, adjusting as circumstances arise.

 
