A Data Scientist’s Journey From Novice to Decent — Part 1

Mike Chung
Published in Analytics Vidhya
6 min read · Mar 9, 2022

Most Data Scientists I meet have one thing in common: a ton of professional courses and certifications under their belt.

Data science is an ever-growing field, and there are breakthroughs seemingly every few months. Here are some highlights from the past couple of years to illustrate my point:

  • OpenAI released GPT-3 in June 2020, a new language model with 175 billion parameters that can generate impressively fluid text
  • Facebook created SEER in March 2021, a billion-parameter self-supervised computer vision model that can learn from random groups of images that do not need to be labelled
  • TensorFlow and PyTorch, the two main deep learning frameworks, both had new versions released in 2021

Data science is also very broad and it’s impossible to be an expert in all the areas. This was something I struggled with and still struggle with as a Data Scientist. Many times, when I’m put on a new project, there are new libraries, packages, and models I need to learn, which gives me a serious case of imposter syndrome.

After working through different projects in natural language processing and computer vision, and building a variety of machine learning models, I feel like I’ve finally reached a point where I consider myself a decent Data Scientist. I’m currently working with a company that provides AI solutions for a large sportswear brand. Of course, I still acknowledge that I have much to learn.

Words from one of the greats.

For aspiring Data Scientists in 2022, I want to share my experience and honest opinion of the courses and programs that I’ve taken, as well as how my career progressed to where I am now. I’ll also try to include the costs of the courses and programs (in CAD), since that’s an important consideration as well. This isn’t meant to be prescriptive, but rather to give you an idea of what it takes to become a Data Scientist.

There’s no single path to becoming a Data Scientist. Instead, do your research, make sure you have a solid foundation, fill in any skills gaps as you need to, and explore the areas that interest you. And above all, make it a habit to learn something new every day.

My Academic Background

I got my bachelor’s degree in Chemistry in 2015 and my master’s degree in Chemical Engineering in 2018. Looking back, these were the courses that were helpful in my data science career:

  • One term of statistics
  • One term of linear algebra
  • Calculus up to ordinary differential equations
  • Physics courses in classical mechanics, electricity and magnetism — they taught basic Python and we had to use it in our assignments
  • Graduate chemical engineering mathematics course — we used MATLAB and learned about topics such as numerical approximation and solving systems of equations

During my time in school, I also did research projects in analytical chemistry, physical chemistry, and water desalination. This required some data analysis work that was done in Excel.

Beginner Programming Courses

I’ve always liked learning new things, so I also fiddled with some online programming courses after I graduated with my bachelor’s degree. Here are some of the more notable ones:

  • Codecademy: I went through their free Python, HTML, CSS, JavaScript, and PHP lessons
  • University of Waterloo professional development: Short courses on C++ and Java ($190 per course)

I liked that these courses presented the topics in digestible chunks of reading, followed by problems that you can only solve by getting your hands dirty and writing code. If you are new to programming, I’d definitely recommend Codecademy, although some of their courses now require a subscription to access. However, if you learn better by watching videos, you may be better off finding introduction to programming lectures on YouTube or offered by a university.

Udacity’s Data Analyst Nanodegree

This was my first concrete step into learning about data. At the time, I was working in a non-technical role at a non-profit organization that offered funding for research projects. I had a super supportive manager who encouraged me to pursue my interests, so I did some data analysis work in Excel and automated some data entry tasks by writing Python scripts. There was also a data specialist on the same team who gave me some technical tasks when I told him I liked programming and working with data. That was how I was introduced to VBA and had the opportunity to debug some Excel macros. Honestly, I found VBA pretty painful, although it does have its uses if you’re working with non-technical people who do most of their work in Microsoft Office.

I had a yearly allowance for professional development, so I decided to look for data analysis online programs. I came across Udacity’s Data Analyst Nanodegree, which was around $900 with the discount, and decided to go for it. I enjoyed the program — it went over some of the major Python packages that data professionals use, including numpy, pandas, matplotlib, and seaborn. It also covered SQL, Git, data wrangling, and some fundamental statistics such as A/B testing.
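To give a flavour of the A/B testing material, here is a minimal sketch of a two-proportion z-test using only the Python standard library. It is not taken from the Nanodegree itself: the function name `two_proportion_z_test` and the visitor and conversion counts are hypothetical, chosen purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 2,000 visitors per variant (made-up numbers)
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone, assuming the samples are large and independent.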

The program was structured so that you watched short videos with quiz questions mixed in. After finishing the videos, you were given a project to work on. Most of the data was provided in a .csv file, but there was one project where you had to go through the Twitter API to get the data. The program also taught you how to install Anaconda and set up the Jupyter notebook environment, and assignments were submitted as a .html export of your notebook.

Each assignment was reviewed by a real person, which is great because you get feedback on your work. After you graduate, they also provide career services, which include a video chat where they give guidance on your resumé and LinkedIn profile.

The whole program is self-paced and can be finished in 3–4 months if you study part-time.

Microsoft Certified Solutions Associate (MCSA): BI Reporting

This was a professional certification consisting of 2 exams:

  • Analyzing and Visualizing Data with Microsoft Excel
  • Analyzing and Visualizing Data with Microsoft Power BI

I pursued this certification because the non-profit organization I was working at started a Power BI working group, which I was a part of. You just needed to pass the 2 exams to get the certification. To prepare for these exams, I audited the prep courses on edX and also went through the tutorials on Microsoft Learn.

Honestly, I would consider these courses non-essential for data scientists, since most of your work will be done in a Python environment. Nevertheless, it was interesting to fully understand the capabilities of Excel, and I can understand why many analysts can live on Excel as their staple tool. Power BI is Microsoft’s version of Tableau, and the value in learning it depends on the situation: basically, whether the company you’re working for loves Microsoft Office and has a license for it.

Dataquest Career Paths

Dataquest had a very similar layout to Codecademy in that it provides a coding environment for you. It’s also structured the same way: you are given a short, digestible chunk of reading and are then immediately put to work writing code.

I got a one-year subscription for around $100 and completed the Data Scientist in Python career path. A lot of it reviewed the material in Udacity’s Data Analyst Nanodegree, but it also included lessons on the command line and machine learning. This was my first touch point with the sklearn package and building models that could actually predict stuff.
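For readers who have not seen sklearn before, here is a minimal sketch of the kind of "model that predicts stuff" workflow described above. It is not from the Dataquest lessons: the choice of the built-in iris dataset, logistic regression, and the split settings are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Built-in toy dataset: flower measurements (X) and species labels (y)
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on data it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a simple classifier; max_iter is raised so the solver converges
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Fraction of held-out samples classified correctly
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same fit, predict, and score pattern carries over to most other sklearn estimators, which is a large part of why the package is so approachable for beginners.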

If you want to get straight into coding and skip the setup of a Python environment, then go for this option.

This is Part 1 of a two-part series detailing my data science journey. Most of the courses described in this article are more in the area of data analysis, which is how I started my journey. In Part 2, I will go over my adventures into the world of deep learning, neural networks, and data engineering. You can find the article for Part 2 here.

Mike Chung

Data Scientist and Lifelong Learner. Currently working at Launchpad.AI and Nike.