A Data Scientist’s Journey From Novice to Decent — Part 2

Mike Chung
7 min read · Mar 27, 2022

Welcome to Part 2 of a two-part series describing my data science journey! If you haven’t read Part 1 of this series yet, you can find it here.

In this article, I’ll be focusing on the courses that I took to transition from data analyst to data scientist. The divide between a data analyst and data scientist is not always clear-cut, but in my experience, the major difference is that data scientists do more modeling and machine learning to predict things, while data analysts tend to do more dashboarding, reporting, and data visualization to find insights.

Here is an article I found helpful to distinguish between data analysts and data scientists.

Without further ado, let’s dive into my journey.

Working as an Analyst

After my stint at the non-profit organization, I worked as an analyst at a manufacturing company. My main duties involved forecasting and market research, as well as some miscellaneous tasks helping business people with Excel. A lot of the work was wrangling data, so I automated the tasks using Python whenever possible. Much of the tooling was in Microsoft Office and VBA, so I also had many opportunities to apply the skills I learned from my MCSA certification.

For my job, I was presented with open-ended problems of improving forecast accuracy and gathering market intelligence, so I had numerous discussions with my manager about different techniques we could apply. I read a ton of articles on time series forecasting methods such as naive forecasting, ARIMA, and other statistical approaches. I also tried some machine learning methods such as decision trees just for fun and to see how they compared to the classical methods. For market intelligence, I searched for open-source data and built automated reports and dashboards that used web APIs.
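To make the "naive" baseline concrete, here is a minimal sketch of how such a comparison might look. The demand figures are made up for illustration, and the mean forecast is just a second simple baseline chosen for contrast; neither reflects the actual data or code from that job.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Naive forecast: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Mean forecast: repeat the historical average for every future step."""
    return [mean(history)] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical monthly demand figures, invented for this example.
demand = [100, 104, 109, 115, 120, 126, 131, 137]
train, test = demand[:6], demand[6:]  # hold out the last two months

naive_mae = mean_absolute_error(test, naive_forecast(train, len(test)))
mean_mae = mean_absolute_error(test, mean_forecast(train, len(test)))

print(f"Naive forecast MAE: {naive_mae:.2f}")
print(f"Mean forecast MAE:  {mean_mae:.2f}")
```

On a trending series like this one, the naive forecast beats the historical mean, which is why it makes a useful yardstick: a more elaborate model such as ARIMA or a decision tree should at least outperform it on the held-out data.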

I eventually resigned from my analyst job to study data science full-time.

Coursera’s Deep Learning Specialization

This is a course I recommend to all data scientists who want to build deep learning models. It provides a gentle introduction without getting bogged down by all the gory math details. The instructor, Andrew Ng, is a professor at Stanford and has a knack for explaining complex subjects in an easy-to-understand way. His catchphrase is “Don’t worry about it if you don’t understand”.

“Silent Protector” Andrew Ng meme.

The specialization brings you through 5 courses with a natural progression from the foundation of neural networks, forward and back propagation, to convolutional neural networks (CNNs) and sequence models. The lecture videos are an easy watch and every course has a challenging assignment where you implement the principles you just learned into Python code in a notebook environment. You are also introduced to TensorFlow, one of the most popular frameworks for deep learning.
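For readers who have not met forward and back propagation yet, here is a minimal sketch of the idea on a single sigmoid neuron in plain Python. It is not taken from the course assignments; the toy dataset, the learning rate of 0.5, and the function names are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Forward pass: weighted sum followed by a sigmoid activation."""
    return sigmoid(w * x + b)


def loss(w, b, data):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = forward(w, b, x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)


def backward(w, b, data):
    """Backward pass: gradients of the mean loss with respect to w and b."""
    dw = db = 0.0
    for x, y in data:
        error = forward(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
        dw += error * x
        db += error
    n = len(data)
    return dw / n, db / n


# Toy dataset (invented): negative inputs are class 0, positive are class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0
initial_loss = loss(w, b, data)

# Gradient descent: repeat forward and backward passes, nudging w and b.
for _ in range(100):
    dw, db = backward(w, b, data)
    w -= 0.5 * dw
    b -= 0.5 * db

final_loss = loss(w, b, data)
print(f"Loss before training: {initial_loss:.4f}")
print(f"Loss after training:  {final_loss:.4f}")
```

A full neural network repeats the same two steps across many neurons and layers, and frameworks like TensorFlow compute the backward pass automatically.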

This course is great because it empowers you with the fundamentals and also exposes you to the wide variety of algorithms and models used in machine learning. To this day, whenever I am put on a new data science project, I still refer back to the content I learned in this course when exploring what approach to use.

A month of access to the Deep Learning Specialization costs $62 and it took me two months to finish the course studying full-time.

Coursera’s Generative Adversarial Networks (GANs) Specialization

After being equipped with the basics from the Deep Learning Specialization, I wanted to differentiate myself from the crowd by focusing on an area I was passionate about. At that time, I came to learn about Deepfakes. Deepfakes use deep learning to synthesize media, with one of the most common uses being to replace a person’s face in an image or video with another person’s. The ramifications of this frightened me and I wanted to do something about malicious actors abusing this technology. I learned that Deepfakes were powered by GANs and decided to take the GANs Specialization, which was offered by the same organization that offered the Deep Learning Specialization.

The GANs Specialization is an advanced specialization with 3 courses. The format was similar to the Deep Learning Specialization. The instructor, Sharon Zhou, is a graduate student at Stanford, and provided links to many academic papers if you wanted to get into the mathematical details. The assignments are done in Python notebooks and use PyTorch, another popular deep learning framework. The assignments were quite challenging, but there was a channel where you could ask questions when you got stuck. The content is a nice mix of the concepts and algorithms behind GANs and applications of GANs in image synthesis.

For those interested in breaking into the world of GANs, this is the only online course I found besides some one-off courses in Udemy. GANs are still pretty new, so I’m hoping they release more courses on GANs in the future.

The cost of the GANs Specialization was the same as the Deep Learning Specialization at $62 per month and it also took me two months to complete.

LinkedIn Learning Courses

I purchased a LinkedIn Learning yearly subscription for $320 and took some grab bag courses to improve my programming skills and familiarize myself with data engineering. The courses are mainly videos with short multiple choice quizzes after each major section. There’s a lack of actual coding in these courses, although some courses do have exercises that you can do on your local machine. For the most part, I just watch these videos when I have downtime at work (basically, when I’m waiting for my code to run) and want to do something fruitful.

Here’s the list of the courses that I took:

  • Using Python for Automation
  • Data Engineering Foundations
  • Apache Spark Essential Training: Big Data Engineering
  • Learning Hadoop
  • Big Data Analytics with Hadoop and Apache Spark

Free Udacity Courses

For those who want to try out Udacity’s platform or don’t want to commit a few months for a Nanodegree, there are many free courses offered by Udacity. These courses offer the same environment as a Nanodegree with short videos and quizzes, but you don’t get the reviewed assignments or human feedback.

These are the free courses I took from Udacity:

  • Intro to Data Analysis
  • Time Series Forecasting
  • Problem Solving with Advanced Analytics
  • Data Visualization with Tableau
  • Machine Learning

Free Kaggle Courses

Kaggle hosts data science and machine learning competitions with prize money for the best-performing models. It also offers a bunch of courses that give you a starting point for entering its competitions.

The courses are bite-sized and well-sequenced: short readings followed by coding in a notebook environment. They were a nice and practical review of many of the techniques I learned in my previous courses. The courses are suitable for beginners, and even though I had already gone through a data science program when I took them, I still learned many useful functions and tricks to deal with different kinds of data.

Here’s the complete list of courses I took from Kaggle:

  • Python
  • Intro to Machine Learning
  • Pandas
  • Intermediate Machine Learning
  • Data Visualization
  • Feature Engineering
  • Intro to SQL
  • Advanced SQL
  • Intro to Deep Learning
  • Computer Vision
  • Time Series
  • Data Cleaning
  • Intro to AI Ethics
  • Geospatial Analysis
  • Machine Learning Explainability

Working as a Data Scientist and Continued Learning

I found a job as a Data Scientist about a year ago and have been put on a variety of projects in computer vision, natural language processing, price modeling, and MLOps initiatives for digital commerce.

Even though I have a job, I’m still on my data science journey and will be for many years to come. I try to spend roughly 10% of my time taking online courses, attending conferences, watching presentations, and reading articles and papers.

These are some of the things I’m currently doing to stay up-to-date on the field of data science and machine learning:

They say that data science + field you’re passionate about = success, so I’m also on the lookout for courses and materials in the following fields:

  • Price modeling
  • Computer vision
  • Audio engineering
  • Anomaly detection and fraud detection
  • Applications in cybersecurity and digital forensics
  • Machine learning engineering and MLOps

To get better at data science, you need to practice it, similar to how you need to exercise to get fit. One of my favorite fitness quotes is:

Something is better than nothing.

If you don’t think you can do a program, do a course. If you don’t think you can do a course, watch a video or read an article. Do a little each day and soon you’ll find that you have all these skills in your arsenal and are ready to tackle any data science problem out there.

Thanks for reading! See you next time.


Mike Chung

Data Scientist and Lifelong Learner. Currently working at Launchpad.AI and Nike.