Data Analyst Primer: The Essential Guide

An essential guide for aspiring data analysts, covering the key skills you need to become one.

Nathan Cheng
The Startup


Data Analyst Summer Internship. Photo by Ethan Robertson on Unsplash.

Back in the summer of 2018, I was just starting my first internship as a Data Analyst.

Data science was all the rage back then, with the data scientist being heralded as the sexiest job of the 21st century. I remember reading articles featuring the famous Venn diagram that described what a data scientist was.

In hindsight, the Venn diagram wasn’t very descriptive, but it provided a starting point to pick up the tools and knowledge that would eventually help me launch my career as a data analyst.

Data Science Venn Diagram.

After fumbling around with a bunch of different tools and working multiple stints at MNCs and startups over the years, I’ve developed a pretty good idea of the key skills and competencies companies look for in data analysts. This guide is designed to prime you with the bare essentials so that you can get your foot in the door of the data world and land that first entry-level data analyst job.

Before that, let’s first cover some major shifts in computing that are important for an aspiring data analyst to know if they plan on jumping into the field.

The Rise of Cloud Services and Serverless Computing

With the advent of cloud services and the growth of new tooling and support for data practitioners, the barriers to accessing data have fallen significantly.

Collecting, storing, and disseminating data to others within the organization is now cheap and frictionless.

Big 3 Cloud Computing Providers.

Companies are hosting computing workloads on GCP, AWS and Azure, so their data engineers can focus on building data pipelines that pipe data into a centralized data warehouse, instead of worrying about maintaining servers and hardware upgrades.

Today, it is trivial to spin up a cluster of nodes in a remote data centre and let your cloud provider manage the allocation of machine resources.

This shift towards a new model of building applications is called serverless computing, and it makes the days of building an on-premises data centre feel like a distant memory.

Democratizing Data Access for Data Analysts

Frictionless data access means that now more than ever, organizations are finding it easy to build data science departments to take advantage of data assets, and are hiring analysts to crunch data, in the hopes that they will discover insights.

With this in mind, the goal of this article is to help you take advantage of this rising demand, and start your journey as an aspiring data analyst.

So what exactly do data analysts do? And what are the most important skills to start learning right now to get started on your journey?

What to Expect as a Data Analyst

Among the different roles in data science, the data analyst has by far the gentlest learning curve.

You don’t have to code as much as a data engineer, nor do you need to know statistics as deeply as a data scientist or machine learning engineer.

Referring to the data science hierarchy of needs below, you can see the data engineer is usually the person responsible for building data pipelines that move data from databases into a centralized data store called a data warehouse.

This is essentially the collect and move / store layers in the pyramid.

Data Science Hierarchy of Needs.

The data is then further transformed by data analysts to discover insights that are used by business users to influence decision-making.

Analysis can be presented in the form of a dashboard, a slide deck, or whatever tool is best suited to present insights and recommendations.

To illustrate this further, here are common requirements curated from data analyst job descriptions on LinkedIn:

  1. Strong analytical skills with the ability to collect, organize, analyze, and disseminate significant amounts of information with attention to detail and accuracy.
  2. Experience partnering with business and using data to influence stakeholders and provide actionable recommendations.
  3. Ability to conduct rigorous analysis and communicate conclusions to both technical and non-technical audiences.
  4. Proficiency in SQL with experience in querying large, complex data sets. Strong skills in Excel, Python, and R.
  5. Proficiency in Tableau, or similar data visualization tools is a plus.

In summary, data analysts work closely with business users to make sure they are satisfied with the insights generated from the data.

They are also the ones responsible for ensuring the analysis is accurate, communicated clearly, and stored in an accessible place business users can refer to.

Essential Skills and The Learning Journey

Now that you know what value a data analyst brings to an organization, let us move on to the skills required to become one.

I have condensed these down to 5 key points. I will not cover Microsoft Excel, as I am assuming anyone who is interested in becoming a data analyst already has the basic knowledge to crunch data using standard Excel functions.

I will also not be covering soft skills like stakeholder management and communication, although do note that the effectiveness of a data analyst’s output depends heavily on the stakeholders’ ability to understand it.

Therefore, data analysts have to be clear communicators, presenting their thoughts in a persuasive manner, in order to effectively influence stakeholders to take the correct course of action.

1. Master SQL

SQL will be the most helpful language you learn in your journey as a data analyst.

SQL is human-readable and declarative, meaning that you describe the data you want rather than the exact steps to retrieve it. The query engine has free rein to figure out the most efficient way to return the output to you.

Contrast this with an imperative programming language like Python, where you, the programmer, have to tell the program the order in which to execute the data transformation steps to get the output you want. This is why many people do not consider SQL a traditional programming language.
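To make the contrast concrete, here is a minimal sketch that computes the same per-customer totals in two ways: once with a SQL query and once with a hand-written Python loop. The `orders` table, its columns, and the sample rows are made up for illustration, and I am using Python's built-in `sqlite3` module only because it needs no setup.

```python
import sqlite3

# Hypothetical order data as (customer, amount) pairs.
orders = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 20.0),
    ("carol", 7.5),
    ("bob", 2.5),
]

# Declarative: describe the result you want and let the engine plan the work.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
sql_totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)
conn.close()

# Imperative: spell out each step of the aggregation yourself.
loop_totals = {}
for customer, amount in orders:
    loop_totals[customer] = loop_totals.get(customer, 0.0) + amount

# Both approaches should agree on the totals.
print(sql_totals == loop_totals)
```

The SQL version says nothing about how to scan or group the rows, which is exactly the part the engine is free to optimize.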

Aside from its declarative nature, there is also increasing support for SQL in big data tools. SQL abstraction layers such as HiveQL and Spark SQL have been built on top of big data processing frameworks like Hadoop and Spark.

SparkSQL.

This means that if you run into the limitations of your current data processing engine, you can reuse the same SQL knowledge to tap into powerful big data processing frameworks, without having to learn a new language from scratch.

The SQL language has been relevant for over 40 years, and will continue to be the primary way data analysts query data for the foreseeable future. In that regard, it provides the best return on investment in your career.

2. Pick up a programming language. (I recommend Python)

Although it is not compulsory for entry-level data analyst jobs, I highly recommend that data analysts pick up a programming language and learn the basics of data structures and algorithms.

At some point in your career, you will inevitably reach the limits of what you can do with SQL and need a programming language to help you interact with APIs to pull data, automate A/B tests, or conduct sentiment analysis.

A programming language adds an essential tool to your arsenal, giving you flexible options for manipulating and creating value with data.
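As a rough illustration of the API use case, the sketch below parses a nested JSON payload and totals revenue per country, the kind of reshaping that is awkward in plain SQL. The payload is a hard-coded stand-in for an API response, and the field names (`orders`, `customer`, `items`, and so on) and the `revenue_by_country` helper are hypothetical, not taken from any real service.

```python
import json
from collections import defaultdict

# Stand-in for the body of an API response. In practice this string would
# come from an HTTP call; the structure and field names are assumptions.
response_body = """
{
  "orders": [
    {"id": 1, "customer": {"country": "SG"},
     "items": [{"sku": "A", "price": 10.0, "qty": 2}]},
    {"id": 2, "customer": {"country": "US"},
     "items": [{"sku": "B", "price": 5.0, "qty": 1},
               {"sku": "A", "price": 10.0, "qty": 1}]}
  ]
}
"""


def revenue_by_country(body: str) -> dict:
    """Flatten nested order records and total revenue per customer country."""
    payload = json.loads(body)
    totals = defaultdict(float)
    for order in payload["orders"]:
        country = order["customer"]["country"]
        for item in order["items"]:
            totals[country] += item["price"] * item["qty"]
    return dict(totals)


print(revenue_by_country(response_body))
```

Once the data is flattened like this, you can load it into a table and go back to querying it with SQL.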

If you have no language preference, Python is a great place to start. Many popular data science libraries, such as NumPy and pandas, are built for Python, and big data processing frameworks increasingly support it; PySpark, for example, is the Python API for Apache Spark.

Python also has a syntax that is easy to comprehend for someone new to programming.

Python Data Science Libraries 2020.

3. Learn a tool for data visualization.

Visualization tools enable data analysts to share their findings with business users as automated dashboards, with an intuitive drag-and-drop interface that lets less technical folks slice and dice the data.

Tools like Tableau, Power BI, and Looker are ubiquitous in organizations.

While these tools provide broadly similar features, each has its own nuances. The good news is that what you learn with one tool transfers readily to the others.

Visualization tools are relatively intuitive to learn compared to SQL or Python, so pick one and roll with it. Tableau and Looker are both good choices with widespread adoption in many organizations.

Popular Data Visualization Tools.

4. Pick a problem space you are passionate about.

A lot of a data analyst’s day-to-day involves breaking down the business problem into a set of questions that can be answered using data.

The kinds of problems you face at an eCommerce company will be vastly different from those faced by a manufacturing company, for example. In other words, being a good data analyst at an eCommerce company does not mean you’ll be able to walk in and excel as a data analyst at a bank.

Domain knowledge and expertise matter. I would argue they are just as important as your technical abilities, because they allow you to narrow down the scope of data to analyze. Your experience will guide you to the best areas in the data to mine for insights, enabling you to be more efficient at your job.

If there is a certain problem space that compels you to explore the underlying dataset, go ahead and build an awesome personal project showing the insights and actionable recommendations you found for that particular domain.

You can then host your project on GitHub. Alternatively, Tableau also allows users to publish their own data visualization projects to its site gallery.

GitHub Logo.

This is the most effective way to land a job if you have no prior working experience, as it demonstrates initiative and skill to hiring managers and recruiters, and shows that you are passionate about making an impact in that particular industry or domain.

5. Revise basic statistical knowledge

Finally, a data analyst should have a basic understanding of statistics, A/B testing, and online experiments.

Almost any company with a web or mobile app will run experiments to validate whether new product features have improved key metrics, for example average revenue per user (ARPU).

Hypothesis Testing Visualization.

Often, product managers will look to data analysts to help design such experiments. To do so, it is essential to understand hypothesis testing, significance levels, p-values, sample size, Type I / Type II errors, and the various factors that could invalidate your test results.
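To show how these concepts fit together, here is a minimal sketch of a two-sided, two-proportion z-test on conversion rates, plus a common approximation for the sample size needed per variant. The function names, the example counts, and the choice of a z-test are my own assumptions for illustration; a real experiment may call for a different test depending on the metric.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z statistic, p-value). Uses the pooled standard error, which
    assumes the null hypothesis that both variants share the same rate.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect p_base -> p_target.

    alpha is the Type I error rate (significance level);
    1 - power is the Type II error rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


# Hypothetical experiment: 200 of 2,000 control users converted,
# versus 260 of 2,000 users who saw the new feature.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("significant at 5%:", p_value < 0.05)
print("users per variant:", sample_size_per_group(0.10, 0.13))
```

Deciding the sample size before the test starts, rather than stopping as soon as the p-value dips below the threshold, is one of the simplest ways to avoid invalidating your results.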

Peep Laja has done an excellent primer on A/B testing that covers these topics in depth.

Final Thoughts and Advice

Phew, and that’s it! That was a lot to take in. With some tenacity and a bit of luck, you will be well on your way to landing that first data analyst job.

It must be mentioned that the world of data changes rapidly, with new tools coming out every few months. If you relish a challenge, data is an extremely dynamic field in which to build your career, and I’m sure you won’t regret your decision.

Just remember, no one had it easy, and learning these things takes time, so don’t try to learn everything at once, otherwise you’ll find yourself getting overwhelmed. Instead, focus on reaching proficiency with one or two tools first and build an awesome project with what you’ve learned.

Practice makes perfect, so it is crucial to apply the skills you have learned to real-world problems to internalize how they work. Once you’ve reached proficiency with a programming language or a visualization tool, the next one becomes much easier to learn.

A piece of parting advice is to be flexible, keep an open mind, and experiment with new tools often to keep up with the pace of change in the data world. With that, I wish you all the best in your journey!

Nathan Cheng
The Startup

🛠 Data Analytics | Machine Learning Engineer @ Rackspace Technology. 📚 Writer for Towards Data Science. linkedin.com/in/nathancheng-data