Starter Resources

These are some starter resources to give beginners a base for learning Machine Learning.

If you find any of these links or concepts difficult, feel free to email us at and we will try to provide resources better suited to you.

General Resources

A great beginner course that introduces Machine Learning through self-paced programming exercises (about 3 hours).

A great video explaining the important concept of Bias and Variance:
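To make bias and variance concrete, here is a small sketch in plain Python on made-up data. The noisy sine curve, the choice of k-nearest-neighbours regression and the values of `k` are all illustrative assumptions. With `k=1` the model memorises the training set, so its training error is zero but its test error is higher (high variance). With `k` equal to the whole training set it predicts the same average everywhere (high bias). A value in between usually does best on the test set.

```python
import math
import random

# Hypothetical toy data: y = sin(x) plus Gaussian noise (not from any real dataset).
random.seed(0)

def make_data(n):
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = make_data(50)
test_x, test_y = make_data(200)

def knn_predict(x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(xs, ys, k):
    """Mean squared error of the k-NN model on the given points."""
    return sum((knn_predict(x, k) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# k=1: low bias, high variance. k=50 (all points): high bias, low variance.
for k in (1, 5, 50):
    print(f"k={k:2d}  train MSE={mse(train_x, train_y, k):.3f}  test MSE={mse(test_x, test_y, k):.3f}")
```

The gap between training and test error is the practical signal to watch: a large gap suggests high variance (overfitting), while both errors being high suggests high bias (underfitting).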

Resources for Problem Statement 1

Can you predict the number of shares an article will get?

Here is a competition about predicting the price of a house from features such as its size and location. This is similar to what you have to do with the number of shares: in both cases you predict a numeric value from a set of features.

This is roughly what your final work in the ML-thon would look like, though in less depth.
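Predicting a house price and predicting the number of shares are both regression problems: the model outputs a number. As a minimal sketch, here is linear regression trained with batch gradient descent in plain Python and scored with RMSE on a held-out validation split. The features (`size`, `location_score`), their coefficients and the noise are hypothetical; real competition data would need cleaning and feature engineering first, and a library such as scikit-learn would normally do the fitting.

```python
import random

# Hypothetical synthetic data: price = 3*size + 2*location_score + 1 + noise.
# Feature names and coefficients are made up for illustration (both features in [0, 1]).
random.seed(1)
X = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(200)]
y = [3 * size + 2 * loc + 1 + random.gauss(0, 0.1) for size, loc in X]

# Hold out the last 40 rows as a validation set.
train_X, val_X = X[:160], X[160:]
train_y, val_y = y[:160], y[160:]

def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fit_linear_regression(X, y, lr=0.3, epochs=1000):
    """Batch gradient descent on the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        errors = [predict(w, b, x) - t for x, t in zip(X, y)]
        # dMSE/dw_j = (2/n) * sum(error * x_j); dMSE/db = (2/n) * sum(error)
        w = [w[j] - lr * 2 / n * sum(e * x[j] for e, x in zip(errors, X)) for j in range(d)]
        b -= lr * 2 / n * sum(errors)
    return w, b

w, b = fit_linear_regression(train_X, train_y)
rmse = (sum((predict(w, b, x) - t) ** 2 for x, t in zip(val_X, val_y)) / len(val_y)) ** 0.5
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("validation RMSE:", round(rmse, 3))
```

Scoring on rows the model never trained on is what tells you whether it will generalise to new articles.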

Great Videos for Learning:

StatQuest by Josh Starmer is an amazing YouTube channel for learning ML algorithms.

Here are some important concepts. If you come across an algorithm that is not listed here, try searching the channel; there is a good chance it is explained well there.

Resources for Problem Statement 2

Can you predict the face attributes of a celebrity?

You will need to be familiar with Neural Networks!

Here is a good series by 3Blue1Brown to understand how they work:

Part 1

Part 2

Part 3
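The series explains how a network computes its output and how it learns with gradient descent and backpropagation. Here is a minimal sketch of those ideas in plain Python: a network with one hidden layer learning XOR, a tiny problem that a single linear layer cannot solve. The hidden size, learning rate, activations (tanh, then sigmoid) and cross-entropy loss are arbitrary choices for illustration; frameworks such as PyTorch or TensorFlow compute these gradients for you automatically.

```python
import math
import random

random.seed(42)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# XOR truth table: not linearly separable, so a hidden layer is required.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4  # number of hidden units (arbitrary choice)
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

def forward(x):
    """Forward pass: tanh hidden layer, then a sigmoid output in (0, 1)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(H)]
    out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, out

lr = 0.5
for epoch in range(5000):
    for x, target in data:
        h, out = forward(x)
        # Backpropagation. With a sigmoid output and cross-entropy loss,
        # the gradient w.r.t. the output pre-activation is simply (out - target).
        d_out = out - target
        # Chain rule through W2 and tanh (derivative 1 - h^2), using W2 before the update.
        d_h = [d_out * W2[j] * (1 - h[j] ** 2) for j in range(H)]
        # Gradient descent step on every weight and bias.
        for j in range(H):
            W2[j] -= lr * d_out * h[j]
            b1[j] -= lr * d_h[j]
            for i in range(2):
                W1[j][i] -= lr * d_h[j] * x[i]
        b2 -= lr * d_out

for x, target in data:
    print(x, "target:", target, "prediction:", round(forward(x)[1], 3))
```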

You should also learn about CNNs (Convolutional Neural Networks), which are the most important architecture for image tasks. You could look them up online, but the video below summarizes them well.
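The core operation in a CNN layer is sliding a small grid of weights (a kernel) across the image and taking a weighted sum at each position, which produces a feature map. Here is a minimal plain-Python sketch with no padding and stride 1. The 5x5 image and the hand-picked vertical-edge kernel are made up for illustration; a real CNN learns its kernel values during training. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped), which is what CNN layers conventionally call convolution.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1), as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise ReLU, the usual non-linearity after a convolution."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical 5x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge detector; a CNN would learn these weights instead.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d(image, kernel))
for row in feature_map:
    print(row)  # each row prints [3, 3, 0]: strong response only where the edge is
```

A CNN stacks many such kernels, so early layers detect simple patterns like edges and later layers combine them into more complex shapes.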

Practice Challenges

MNIST is the standard dataset for getting started with Computer Vision and for benchmarking models. The task is to predict which digit (0 to 9) is written in an image of a handwritten digit.

This shows how you would go about training a model on this dataset.
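Since every MNIST image shows one of 10 digits, a classifier typically ends with 10 output scores, one per digit. Softmax turns those scores into probabilities, the prediction is the most probable digit, and training minimises the cross-entropy loss. A minimal plain-Python sketch; the scores below are made-up values standing in for a trained network's output.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Loss for one example: negative log-probability assigned to the correct digit."""
    return -math.log(probs[label])

# Hypothetical scores a network might output for one image, one per digit 0-9.
logits = [0.1, 0.3, 2.5, 0.2, -1.0, 0.0, 0.4, 1.1, 0.3, -0.5]
probs = softmax(logits)
predicted_digit = max(range(10), key=lambda d: probs[d])
print("predicted digit:", predicted_digit)  # → 2
print("loss if the true digit is 2:", round(cross_entropy(probs, 2), 3))
```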

If you are up for a challenge, look at the "Fashion MNIST" or "CIFAR-10" datasets. I would recommend CIFAR-10, since its images are in color, just like the images in the ML-thon dataset will be.

There is also the concept of transfer learning, where you take ML models already trained by others and adapt them to your own task. The CIFAR-10 dataset is a good place to explore it, and you will need it for the ML-thon.
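Here is a toy illustration of the idea in plain Python: the "pretrained" part is frozen and only a new final layer (the head) is trained for the new task. Everything here is hypothetical: `FROZEN_W` is a hand-written stand-in for pretrained weights and the data is synthetic. In practice you would load a real model pretrained on a large image dataset (for example from torchvision or Keras), freeze or fine-tune its layers, and replace its final layer so that it outputs CIFAR-10's 10 classes.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for layers pretrained on another task. In real transfer
# learning these would be, e.g., the convolutional layers of an ImageNet model.
FROZEN_W = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

def frozen_features(x):
    """Pretrained feature extractor: its weights are never updated below."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical data for the new task: the label is 1 when x0 + x1 > 0.
points = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
labels = [1 if x0 + x1 > 0 else 0 for x0, x1 in points]
features = [frozen_features(p) for p in points]  # computed once, since they never change

# Train only a new logistic-regression head on top of the frozen features.
head_w, head_b, lr = [0.0, 0.0, 0.0], 0.0, 0.5
for _ in range(500):
    for f, t in zip(features, labels):
        p = sigmoid(sum(w * fi for w, fi in zip(head_w, f)) + head_b)
        head_w = [w - lr * (p - t) * fi for w, fi in zip(head_w, f)]
        head_b -= lr * (p - t)

def predict(x):
    f = frozen_features(x)
    return 1 if sum(w * fi for w, fi in zip(head_w, f)) + head_b > 0 else 0

accuracy = sum(predict(p) == t for p, t in zip(points, labels)) / len(points)
print("training accuracy of the new head:", accuracy)
```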