Online Resources for Learning Data Science | CORPORATE ETHOS

April 16, 2018

Thanks to developments in e-commerce and the IoT (Internet of Things), billions of devices now collect all kinds of information. What can one do with the data that leaves these devices and gets stored somewhere in the cloud? This is where the discipline of data science, which strives to find ways to make all this data useful, comes in.

Data science involves collecting data, analysing it and drawing conclusions from it. Modern data science combines disciplines such as computing and statistics. At its core, it involves three activities: exploration (finding the patterns that exist in the data, using statistics, data visualisation, etc.), inference (determining whether the patterns we found are reliable) and prediction (making informed guesses).
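The three activities can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the data set (hours studied versus exam score) and the helper name `fit_line` are made up for the example, and the bootstrap shown is just one common way of checking whether a pattern is reliable.

```python
import random
import statistics

# Hypothetical data: hours studied and exam score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 73, 79, 82]

# Exploration: summarise the data.
mean_score = statistics.mean(scores)
spread = statistics.stdev(scores)


def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(hours, scores)

# Inference: bootstrap. Resample the students with replacement and refit,
# to see how much the slope changes from sample to sample.
rng = random.Random(0)
pairs = list(zip(hours, scores))
slopes = []
for _ in range(1000):
    sample = [rng.choice(pairs) for _ in pairs]
    xs, ys = zip(*sample)
    if len(set(xs)) > 1:  # a line needs at least two distinct x values
        slopes.append(fit_line(xs, ys)[0])
slopes.sort()
low = slopes[int(0.025 * len(slopes))]
high = slopes[int(0.975 * len(slopes))]

# Prediction: an informed guess for a student who studies nine hours.
predicted = intercept + slope * 9

print("mean score:", mean_score, "spread:", round(spread, 2))
print("slope:", round(slope, 2), "interval:", round(low, 2), "to", round(high, 2))
print("predicted score for 9 hours:", round(predicted, 1))
```

If the interval for the slope stays well away from zero, the upward pattern is unlikely to be an accident of this particular sample.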

Organisations across the world are becoming more and more data-driven, and there is a huge demand for people with data science skills. Data science is not rocket science: anyone with enough passion and basic arithmetic and algebra skills can pick it up. Moreover, plenty of resources are available for learning and getting started with data science. A couple of them are discussed here.

DataCamp

DataCamp is an online data science school that teaches data science skills (such as data visualisation, data exploration and machine learning). It gives you hands-on experience through video lessons and interactive coding assignments. You can also earn a certificate for each course you finish. Every course on the site can be started for free; you simply need to sign up with the service.

The upside of DataCamp is that it has an embedded IDE (integrated development environment) for every programming language it supports. Currently, DataCamp supports Python and R. The highlight of DataCamp's courses is the interactive, step-by-step method of teaching. As of this writing, the site offers more than 114 courses.

[Screenshot: DataCamp's interactive learning environment]

As soon as you start a course in DataCamp, you are in its interactive environment (screenshot above). As mentioned earlier, the approach is highly hands-on: from the very beginning, the service gives you tasks with a little bit of starter code. You have to add some code, run it without any errors and then move to the next stage. Another interesting aspect of DataCamp is its instructional videos, which explain the concepts in a very comprehensible fashion.

A free Data Science course from UC Berkeley

If you wish to go through a serious course on data science, take a look at the free course ‘Foundations of Data Science’ from UC Berkeley. The course is currently available on edX, a MOOC platform.

The course comes in three parts. The first part focuses on programming and data visualisation, so if you are new to programming, this is an opportunity to pick up the skill. The programming language used in the course is Python. The second part focuses on statistical inference, and the third part deals with prediction and machine learning. No prior programming or statistics experience is required.

A hallmark of this course is its interactive nature. For example, if you come across a programming example during a video lecture, the service lets you access that example and play around with it. You can access these examples using the button labelled ‘Launch Examples’ (screenshot below).

[Screenshot: the ‘Launch Examples’ button]

When you click on this button, it opens a browser tab with the code that is being discussed in the video.

[Screenshot: the example code opened in a browser tab]

This means you can try out the example discussed in the video right away, without moving away from the current learning context. You don’t need to launch another programming environment or type out the code. You can run the code without any hassle, modify it and see how the code responds to those changes. This will certainly make the learning process very effective.

The programming environment used is Jupyter Notebook, which has recently become the de facto environment for teaching data science and programming. Jupyter Notebook is an interactive programming environment with three parts: the web-based notebook application (which runs as a web server), the backend kernel that executes the code and returns the output, and the notebook itself, where you write and annotate the code.

Basically, a notebook contains a set of cells. It supports three types of cells: code cells, where you write code; markdown cells, for describing the code or its output; and raw cells, for raw text. As mentioned in an earlier column, “This ability to weave data/code/explanations/graphs into an interactive document has made Jupyter Notebook the darling of academics and professionals.”
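To make the cell structure concrete, here is a rough sketch of what a notebook looks like on disk. A notebook file (.ipynb) is a JSON document with a list of cells, each tagged with its type. The three cells and their contents below are hypothetical, and the layout assumes version 4 of the notebook format.

```python
import json

# A hypothetical three-cell notebook in the version 4 JSON layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",  # explanation shown as formatted text
            "metadata": {},
            "source": ["# Average score\n", "The cell below computes a mean."],
        },
        {
            "cell_type": "code",  # code that the kernel executes
            "metadata": {},
            "execution_count": None,  # None until the cell has been run
            "outputs": [],  # results are stored here after a run
            "source": ["scores = [52, 61, 70]\n", "sum(scores) / len(scores)"],
        },
        {
            "cell_type": "raw",  # text that is left uninterpreted
            "metadata": {},
            "source": ["Plain text that is neither run nor rendered."],
        },
    ],
}

# Serialise to the text an .ipynb file would hold, then read the types back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)  # ['markdown', 'code', 'raw']
```

Saving `text` to a file with the .ipynb extension should give a document that the notebook application can open, with the kernel filling in `outputs` once the code cell is run.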

In the context of this edX course, we don’t need to worry about installing the kernel or the notebook application server. When you click the ‘Launch Examples’ button, you are taken to the Notebook front end, where you can immediately start playing with the code (screenshot above).

Tech Tip

You may have a site that offers excellent content or service. But if it takes too long to load, visitors will get bored or distracted and move on to a different site. This means you have lost a customer. Moreover, your bounce rate will climb, and search engines may conclude that your site is not so great and lower its rank.

What causes a site to load slowly, and how can one fix it? Several factors control the load time of a website: server speed, the size of the images, the number of plug-ins and so on. Naturally, identifying the factors that affect the loading time is an important task, especially if you are running a commercial site. This is where website speed analysis tools (like PageSpeed Insights) gain significance.

PageSpeed Insights (from Google) is a good tool for finding issues that affect a site’s load time. Visit the service and paste your site’s URL, and it will give you a score out of 100 and highlight the site’s problem areas. The purpose of the tool is to point out the issues which, if fixed, would improve the site’s overall load time.

[Screenshot: PageSpeed Insights report]

For example, one problem area is image optimisation. When you run PageSpeed Insights, it checks whether any of the images on your site are larger than they need to be and guides you on what to do with each of them.
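The score can also be requested from a script rather than through the web page. The sketch below assumes version 5 of the PageSpeed Insights API and its response layout (a performance score between 0 and 1 under `lighthouseResult`); the function names and the trimmed sample response are made up for illustration, and heavy use of the API may require an API key.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the PageSpeed Insights API (version 5 assumed).
API_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def build_request_url(site_url, strategy="mobile"):
    """Return the API URL that asks for an analysis of site_url."""
    query = urllib.parse.urlencode({"url": site_url, "strategy": strategy})
    return f"{API_ENDPOINT}?{query}"


def performance_score(report):
    """Convert the 0-1 performance score in a report to a score out of 100."""
    score = report["lighthouseResult"]["categories"]["performance"]["score"]
    return round(score * 100)


def fetch_report(site_url):
    """Call the API over the network (defined here but not run in this sketch)."""
    with urllib.request.urlopen(build_request_url(site_url)) as response:
        return json.load(response)


# Hypothetical, heavily trimmed response so the sketch runs offline.
sample_report = {
    "lighthouseResult": {"categories": {"performance": {"score": 0.87}}}
}

print(build_request_url("https://example.com"))
print(performance_score(sample_report))
```

For a real site, `performance_score(fetch_report("https://example.com"))` would replace the sample report, with the site's own address in place of example.com.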