DataCamp

I started learning ML. Here's a resource I found helpful.

I've recently developed an interest in Machine Learning, and I decided to start getting my feet wet. It didn't take me long to figure out that most of it boils down to Data Science and statistical analysis. At least, most of the words I had to look up led me to those subjects.

It's been a few years since I toyed around with Python, but it's pretty much the de facto language for Data Science and Machine Learning right now, if not for its capabilities, then for the sheer number of libraries available. There's NumPy, Pandas, and others to help you manage, analyze, and visualize large datasets, and Keras and TensorFlow's Python API are both wonderful for developing neural networks.

So I started seeking out resources for learning a bit more about using Python for Data Science applications.

Dataquest

I stumbled upon Dataquest, whose homepage offers helpful testimonials such as "Dataquest helped me get a job at SpaceX" or "I'm a Data Scientist because of Dataquest". Despite the dubious implication that the site hands out science degrees, I decided to have a look. It was alright: the curriculum seemed solid, but I kept running into bugs. Their editor was cumbersome and had issues pretty regularly (I'm used to vim, which didn't help). Eventually I just lost interest.

DataCamp

Next I tried DataCamp. These guys had a more beginner-friendly focus, which honestly was a bit off-putting to me. I already knew the basics of Python (or could look them up); I really just wanted to get my hands dirty. But I liked how they walked me through lists first, which let them later show me how NumPy arrays hold several advantages over the built-in data types. If I'd had less programming experience, it would have been perfect.

```python
# heights and positions are provided by the exercise as plain Python lists

# Import numpy
import numpy as np

# Convert positions and heights to numpy arrays: np_positions, np_heights
np_positions = np.array(positions)
np_heights = np.array(heights)

# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']

# Heights of the other players: other_heights
other_heights = np_heights[np_positions != 'GK']

# Print out the median height of goalkeepers
print(f"Median height of goalkeepers: {np.median(gk_heights)}")

# Print out the median height of other players
print(f"Median height of other players: {np.median(other_heights)}")
```
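The advantage over built-in lists that those early lessons build toward is easy to see side by side. Here's a minimal sketch of my own (not a DataCamp exercise), using made-up heights rather than the course's dataset:

```python
import numpy as np

# Hypothetical sample heights in centimetres (made-up data for illustration)
heights = [191, 184, 185, 180, 181]

# Multiplying a Python list repeats it instead of scaling each element
repeated = heights * 2
print(len(repeated))  # 10: the five heights, twice over

# A NumPy array applies arithmetic element-wise
np_heights = np.array(heights)
heights_m = np_heights / 100
print(heights_m)

# Comparisons are element-wise too, producing a boolean mask...
tall = np_heights > 183
print(tall)

# ...which can be used to select matching elements
print(np_heights[tall])
```

That element-wise comparison is the same mechanism behind `np_positions == 'GK'` in the exercise: the comparison builds a boolean array, and indexing with it keeps only the matching heights.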

I quickly ran through the free lessons and decided to buy the subscription, which is about $30/month. Since then I've been digging into Pandas DataFrames, and I'm really enjoying it!

```python
# Import pandas and numpy
import numpy as np
import pandas as pd

# Import cars data; index_col=0 uses the first column as row labels
cars = pd.read_csv('cars.csv', index_col=0)

# Create medium: observations with cars_per_cap strictly between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]

# Print medium
print(medium)
```
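The same filter can also be written without NumPy, since pandas Series support element-wise comparisons and boolean operators directly. A minimal sketch, assuming a tiny made-up CSV in place of the course's cars.csv (the labels c1 to c5 and their values are hypothetical):

```python
import io

import pandas as pd

# Hypothetical stand-in for cars.csv: made-up values, first column = row labels
csv_text = """label,cars_per_cap
c1,50
c2,150
c3,320
c4,500
c5,750
"""
cars = pd.read_csv(io.StringIO(csv_text), index_col=0)
cpc = cars["cars_per_cap"]

# & is an element-wise AND on boolean Series; the parentheses are required
# because & binds more tightly than > and < in Python
medium = cars[(cpc > 100) & (cpc < 500)]
print(medium)

# Series.between performs the same range check; inclusive="neither" keeps
# both bounds exclusive, matching the strict > and < above (pandas 1.3+)
medium_alt = cars[cpc.between(100, 500, inclusive="neither")]
print(medium_alt.equals(medium))
```

Both versions keep only c2 and c3; c4 is dropped because 500 is excluded by the strict upper bound, just as in `np.logical_and(cpc > 100, cpc < 500)`.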

The videos are great, and the practice sessions are nice little bite-sized chunks that build upon each other. Each section in every chapter has a more challenging project where you have to synthesize what you've learned.

What I'd really like to see is more challenging projects with little to no guidance. A few of those at the end of each chapter might help the knowledge "stick". For now, I'm happy to have fun learning, and I'll look elsewhere for applications.

If you're getting started with Data Science or Machine Learning and have resources or insights to share, send me a message at nathandcornell -at- gmail -dot- com. I love learning and want to keep making connections with others who feel the same.