The popularity of Jupyter Notebook, the technology that helps us combine programming code and explanatory text in a browser-based interface, is exploding and is creating waves across different domains. Jupyter Notebook is the emerging standard for publishing reproducible documents/computer programming scripts. Many educational establishments have adopted Jupyter as a preferred technology for delivering content.
As mentioned in an earlier column (http://corporateethos.com/opinion/reproducible-research-with-the-jupyter-notebook/), Jupyter Notebook is a web application that lets you run live code interactively. In addition, you can embed explanatory text, graphs, data tables, etc., all in one place. Basically, a Jupyter notebook is a sequence of cells that contain information. These cells are of different types. An important type is the ‘code’ cell, in which you enter programming code. Another is the ‘markdown’ cell, where you type your text. So, you can write your explanatory text and embed the code and its output in the same report.
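To make the idea of cells concrete, here is a minimal sketch in Python that builds a two-cell notebook by hand using only the standard library. It assumes version 4 of the notebook file format, in which a notebook is a JSON document holding a list of cells. The helper names and the sample cell contents are made up for illustration.

```python
import json


def markdown_cell(text):
    """Return a dict describing a 'markdown' cell (explanatory text)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return a dict describing a 'code' cell that has not been run yet."""
    return {
        "cell_type": "code",
        "execution_count": None,  # stays None until the cell is run
        "metadata": {},
        "outputs": [],            # filled in when the cell is run
        "source": source,
    }


def build_notebook(cells):
    """Wrap a list of cells in the top-level notebook structure (format v4)."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical content: one text cell followed by one code cell.
notebook = build_notebook([
    markdown_cell("# My analysis\nA short note explaining the next step."),
    code_cell("print(1 + 1)"),
])

# This is the text that would be stored in a .ipynb file.
notebook_json = json.dumps(notebook, indent=1)

if __name__ == "__main__":
    for cell in notebook["cells"]:
        print(cell["cell_type"])
```

Saving `notebook_json` to a file with the `.ipynb` extension should give a notebook that the Jupyter interface can open, though in practice you would normally create cells through the browser rather than by hand.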
One of the domains in which this technology is heavily used is data science. When a data analyst reports or presents findings based on a data set, she needs to explain how she collected the data, how she cleaned it, the methods and techniques used in the analysis, and how she arrived at her conclusions. Ideally, if the researcher or analyst explains all her steps in a live, interactive way, her audience will follow them without much effort. This is where Jupyter Notebook excels as a content delivery tool.
Assume you have stumbled on a data set and wish to run an ad hoc data exploration without relying on IT support. In addition, once done with the initial data exploration, you may wish to continue the analysis along with your team members. So, you want to do an initial analysis of your data in the Jupyter Notebook environment, then continue the analysis with your colleagues in an interactive fashion. This is where the cloud computing offering from Jupyter assumes significance.
Jupyter cloud offering
The Jupyter service offers a cloud version of its product for free. The cloud version supports several programming languages, including Python, Ruby and Scala. So, if you are learning any of these programming languages and want to experience the benefits of the Jupyter Notebook environment, jump over to the service (https://try.jupyter.org/).
Now, let us come back to the data analyst discussed above. To do any kind of data analysis, we need one or more data analytics applications (like R, Python, Julia etc.) at our disposal. Along with the general purpose programming languages mentioned above, the Jupyter cloud service provides a few statistical programming languages (like R, Julia etc.) as well. Therefore, if you use the Jupyter cloud offering, you don’t need to worry about installing special statistical software, unless your favourite statistical tool is not supported by Jupyter Notebook.
To access the Jupyter cloud service, jump over to https://try.jupyter.org/ . When you invoke the service, you will find different notebooks. These are notebook templates for the different kernels (programming languages or data analysis packages) supported by the cloud application. Depending on your expertise or preference, you can choose the one that suits you. For example, if your preferred statistical tool is ‘R’ and you wish to perform data analysis with it, select the notebook ‘Welcome R ….’. When you access a template or create a new notebook, you will get into the usual notebook interface.
Here, you should remember one thing: this is a cloud service and you are accessing it without any account. This means it will not save anything in your current notebook on the server. However, to help you save the content of your session, Jupyter allows you to download the notebook to your local machine (an option under the ‘File’ menu). You can download the notebook either in notebook format or in other common file formats such as HTML, PDF and Markdown. To resume the session at a later stage, you can upload the saved notebook (using the ‘Upload’ option in the right corner) and continue the data analysis task.
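If you download in notebook format, the saved file is plain JSON, so you can check what it contains before uploading it again. The following is a minimal sketch using only the Python standard library. It assumes the version 4 notebook layout, where the kernel is recorded under `metadata.kernelspec`; the function name and the file name in the usage note are hypothetical.

```python
import json
from collections import Counter


def summarise_notebook(path):
    """Return (kernel name, cell-type counts) for a downloaded notebook file."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    # 'kernelspec' is optional metadata, so fall back to 'unknown' if absent.
    kernel = notebook.get("metadata", {}).get("kernelspec", {}).get("name", "unknown")
    # Count how many cells of each type (code, markdown, ...) the file holds.
    counts = Counter(cell["cell_type"] for cell in notebook.get("cells", []))
    return kernel, dict(counts)
```

For example, `summarise_notebook("analysis.ipynb")` would report which kernel the notebook was written for and how many code and markdown cells were saved.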
Unlike a programming project, a data analysis project generally involves data files. This means you need to upload the data to an appropriate location on the cloud server. So, before moving further, let us see how to upload the data. When you access the service, in its initial interface, you will find a folder icon with the name ‘datasets’ (see the screenshot below).
Now, click on this folder and you will obtain a list of folders and files. Here, you will find another folder, which is also labelled ‘datasets’; click on this. Upload your data to this location, using the ‘Upload’ button.
At this point, go back to the home page of Jupyter cloud, start your notebook and access the data file with the appropriate commands. When you enter the name of the source data, make sure to provide its full path. For example, if the name of the file is ‘life.csv’, prepend the folder path to it: “datasets/datasets/life.csv”.
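As an illustration of the full-path rule, here is a minimal sketch for a Python notebook using only the standard library. It assumes the file was uploaded to the nested ‘datasets’ folder described above and that it is a CSV file with a header row; the helper names are hypothetical, and in an R notebook you would use the equivalent R command instead.

```python
import csv
from pathlib import PurePosixPath

# Folder the data was uploaded to, relative to the notebook home page.
DATA_DIR = PurePosixPath("datasets") / "datasets"


def full_path(filename, base=DATA_DIR):
    """Prepend the upload folder to a bare file name."""
    return str(PurePosixPath(base) / filename)


def read_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling `read_rows(full_path("life.csv"))` would then load the uploaded file, with every value read as text.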
So, if you wish to analyse a data set and demonstrate/discuss the computational process and findings without worrying about the statistical tools available locally, you would certainly love the Jupyter cloud offering.
A tech tip: PDF to Word converter
Many documents come in PDF format, and you may often need to convert some of them into another format (Word, Excel etc). Though free utilities like pdf2txt (http://www.pdf2text.com/) exist, such solutions may not always produce proper output, especially if the document contains lots of data tables. If you are looking for a better, practical and free method to turn a PDF into an editable Word document, take a look at the online PDF to Word conversion service, PDF2DOCX (http://pdf2docx.com/). Access the service and upload the PDF document to it; within a few seconds the service will convert it into the specified format and let you collect the result via the ‘Download’ button. You can submit more than one file (up to 20 files in a batch), and the service will convert each of them and pack them all into a single zip archive.