Deep Learning

Google Drive + Google Colab + GitHub; Don’t Just Read, Do It!

Written by Vortana Say · 7 min read

How GitHub, Google Colab, and Google Drive work together; How to deal with custom files, and push Jupyter notebook changes to GitHub Repo.

Interaction Between The Three Components

I was recently accepted into Phase I of the Bertelsmann Tech Scholarship Challenge Course (AI Track Nanodegree Program). In the lessons, Jupyter notebook exercises are provided by Udacity in this GitHub repo. I had been using Anaconda Jupyter and JupyterLab locally to run the exercises, but later lessons contain tasks that require more extensive computing, and running times kept getting longer. That's too much work for my old laptop's CPU, so I had to find an alternative.

Google Colab was suggested in the Slack community. It provides a pre-configured Jupyter notebook environment that runs entirely in the cloud (my laptop is very grateful 🐌). Most importantly, it provides free GPUs (Graphics Processing Units).

The goal of this article is to understand:

  • How to integrate a GitHub repo with Google Colab (with steps provided)
  • How to deal with custom Python files
  • How to save your changes on Jupyter notebooks to different branches and keep the master branch clean

Oleg Żero’s article and nataliasverchkova’s GitHub repo provided very good guidance and helped me start the setup. However, I ran into several challenges while going through the implementation. I spent hours looking for additional resources to resolve the issues I encountered, but there was no straightforward guide to achieve what I wanted, so I hope this article can help others who face the same problems.

Out-of-the-Box Solution: Google Colab and GitHub

In this notebook, there are several ways of loading/browsing notebook files directly from GitHub. You can access notebooks in your private repositories (GitHub authentication required) or in public repositories.

Colab-Github-demo Jupyter notebook

If you click on http://colab.research.google.com/github, it will direct you to a pop-up where you can search by GitHub URL, organization, or user.

After I make changes, I can save those files back to a GitHub branch. Pretty easy, right? Not so fast! This approach works well only if your notebook doesn’t import any custom files. Since mine does, I needed to find another solution.

Let’s Get Started With Google Colab + Google Drive + GitHub

Before we discuss the details, let’s look at the role of each component (Google Drive, Google Colab, GitHub) and how they interact.

  • Google Colab: All the operations reside here. It is used as a shell to run bash and git commands, and of course we use it to run our Udacity Jupyter notebooks.
  • Google Drive: When we use Google Colab, our work is stored temporarily on a virtual machine for around 8 to 12 hours. That is not ideal, since processing tasks can go on for days, months, or years depending on the project. One solution is to store your work in cloud storage. Google Drive is Google’s cloud storage service; it provides 15 GB of free storage and is easy to integrate with Google Colab. In this case, we use it as a permanent location for the cloned GitHub repo that we work on.
  • GitHub: A code hosting platform for version control and collaboration. It’s a good practice to use version control and a branching strategy. I forked the Udacity deep-learning-v2-pytorch repository.

1. Mount Google Drive to Google Colab

In Google Colab, we are going to create a notebook to execute our commands. If you are logged in, once you create the notebook, your file will be stored in a folder called Colab Notebooks.

Colab Notebooks Folder in Google Drive

I divided the code provided in Oleg Żero’s article into different cells in the notebook to better understand the process. To mount Google Drive to Colab, we can use:
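The mount cell is only a couple of lines; the try/except guard below is just so this sketch also runs outside Colab:

```python
# Mount Google Drive at /content/drive. The mount point must not
# contain spaces (see Issue #3 at the end of this article).
MOUNT_POINT = "/content/drive"

try:
    from google.colab import drive  # only available inside the Colab runtime
    drive.mount(MOUNT_POINT)
except ImportError:
    print("Not running inside Colab -- nothing to mount.")
```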

This is very straightforward thanks to the library provided by Google. Follow the instructions to authenticate with Google Drive.

Running commands to mount Google Drive

The result after authenticating and mounting successfully

If you see “Mounted at /content/drive”, it means that Google Drive was mounted successfully. 🎉

If you are accustomed to terminal commands, you can double-check the locations and directories:
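For example (each of these would be a Colab cell prefixed with `!`; the `|| true` is only there so the sketch is harmless outside Colab):

```shell
# current working directory -- /content in a fresh Colab runtime
pwd
# contents of the runtime's working directory
ls
# root of the mounted Drive -- 'Colab Notebooks' and 'MyDrive' appear here
ls "/content/drive/My Drive" || true
```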

Please refer to the comment I added for each command. Notice the output of the last command above: we have the ‘Colab Notebooks’ and ‘MyDrive’ folders, which are stored in the root of my Google Drive.

2. Clone GitHub repository to Google Drive

Now we are ready to clone our GitHub repository project and store it in Google Drive via Google Colab. I am going to clone the forked repository deep-learning-v2-pytorch.

  • In my case, I am going to store the cloned GitHub repository in this directory: “/content/drive/My Drive/MyDrive/Udacity/deep-learning-v2-pytorch”

Google Drive Directories

We need to define a few variables that will be used in the script:

  • MY_GOOGLE_DRIVE_PATH
  • GIT_USERNAME
  • GIT_TOKEN (GitHub Access Token)
  • GIT_REPOSITORY

How to generate GitHub Access Token:

Go to your user profile at the top right corner → click on Settings → then choose Developer settings.

GitHub personal access tokens

Select scopes for your access token

In this case, the repo scope is enough. To learn more about the scopes that define access for personal tokens, see: https://developer.github.com/apps/building-oauth-apps/understanding-scopes-for-oauth-apps/

Note: Never share your access token publicly.

  • In a new cell of our notebook, we are going to set up the required information:
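All of the values below are placeholders you should substitute with your own; the Drive path follows the layout shown earlier in this article:

```python
# Placeholders -- substitute your own values.
MY_GOOGLE_DRIVE_PATH = "/content/drive/My Drive/MyDrive/Udacity"
GIT_USERNAME = "your-github-username"
GIT_TOKEN = "your-personal-access-token"  # never share or commit this!
GIT_REPOSITORY = "deep-learning-v2-pytorch"

# Authenticated clone URL and the target folder on Google Drive
GIT_PATH = f"https://{GIT_TOKEN}@github.com/{GIT_USERNAME}/{GIT_REPOSITORY}.git"
PROJECT_PATH = f"{MY_GOOGLE_DRIVE_PATH}/{GIT_REPOSITORY}"
print(PROJECT_PATH)
```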

  • After setting up the required information, let’s execute the cloning.

We have two options:

Option 1
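In the notebook this is a single `git clone` cell using the authenticated URL (https://&lt;GIT_TOKEN&gt;@github.com/&lt;GIT_USERNAME&gt;/deep-learning-v2-pytorch.git). Since a real clone needs your own token, the sketch below demonstrates it against a throwaway local "origin" instead of GitHub:

```shell
# Throwaway stand-in for your GitHub fork; in Colab, use your
# authenticated GitHub URL here instead.
ORIGIN="$(mktemp -d)"
git init -q "$ORIGIN"
git -C "$ORIGIN" -c user.email=me@example.com -c user.name=Me \
    commit -q --allow-empty -m "init"

# Clone straight into the Drive folder so the repo survives runtime resets
PROJECT_PATH="$(mktemp -d)/deep-learning-v2-pytorch"
git clone -q "$ORIGIN" "$PROJECT_PATH"
ls -a "$PROJECT_PATH"   # note the .git folder
```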

Finally, we should see the contents of the GitHub repository saved in the directory we specified. Note the .git folder, which indicates that the directory is a Git repository.

Remember that we want to make changes on the develop branch, not the master branch, so let’s verify a few things. Here is a good resource if you are not familiar with Git or need a refresher.

In the output of !git branch, there are two branches: develop and master. The “*” marks the branch you are currently on. By default, you start on the master branch unless you cloned a specific branch. You can check out the develop branch using the command below:
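Demonstrated on a throwaway repo (in Colab you would run only the last three commands, inside the project folder, each prefixed with `!`):

```shell
# throwaway repo with a develop branch, for demonstration only
cd "$(mktemp -d)"
git init -q
git -c user.email=me@example.com -c user.name=Me \
    commit -q --allow-empty -m "init"
git branch develop

git branch            # '*' marks the branch you are on
git checkout develop  # switch to develop
git branch            # develop is now current
```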

Option 2

If you want to copy all the files/folders from your cloned repository on Google Drive to the Google Colab local runtime, you can follow Oleg Żero’s article.

Note: data/ is a folder that contains large data that I want to exclude. Feel free to change this to a different folder path/name that you want to exclude.
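Oleg Żero’s full snippet mounts Drive, pulls from GitHub, and then rsyncs everything into the local runtime; the sketch below demonstrates only the rsync step, with throwaway directories standing in for the Drive folder and the runtime folder:

```shell
# Throwaway stand-ins: SRC is the repo on Drive, DST the local runtime dir
SRC="$(mktemp -d)"; DST="$(mktemp -d)"
mkdir -p "$SRC/data"
echo "notebook" > "$SRC/Part1.ipynb"
echo "big"      > "$SRC/data/train.csv"

# -a: archive mode, -P: show progress; --exclude skips the heavy data/ dir.
# Re-running is safe: rsync only copies what changed.
rsync -aP --exclude=data/ "$SRC/" "$DST/"
ls "$DST"   # Part1.ipynb copied, data/ skipped
```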

As explained in Oleg Żero’s article:

The above snippet mounts the Google Drive at /content/drive and creates our project’s directory. It then pulls all the files from Github and copies them over to that directory. Finally, it collects everything that belongs to the Drive directory and copies it over to our local runtime.

A nice thing about this solution is that it won’t crash if executed multiple times. Whenever executed, it will only update what is new and that’s it. Also, with rsync we have the option to exclude some of the content, which may take too long to copy (…data?).

3. Make changes / Working on your notebooks

So far, here is what we have done:

  • Mounted Google Drive to Google Colab
  • Cloned GitHub repository to Google Drive

This is the fun part 😃. Now we are ready to make changes and work on our notebooks.

4. Dealing with the custom file issue

In Part 2 — Neural Networks in PyTorch (Exercises).ipynb notebook, we need to import a custom file, helper.py, to access the helper functions.
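One way to make helper.py importable is to append its folder to sys.path. The sketch below demonstrates the technique with a throwaway helper.py; in Colab, the appended path would be the notebook’s own folder on Drive (the exact path depends on your layout):

```python
import pathlib
import sys
import tempfile

# Stand-in for the notebook's folder on Drive, which contains helper.py.
project_path = tempfile.mkdtemp()
pathlib.Path(project_path, "helper.py").write_text(
    "def view_classify(img, ps):\n"
    "    print('would plot', img, 'with probabilities', ps)\n"
)

sys.path.append(project_path)  # make helper.py importable
import helper

helper.view_classify("img-tensor", "ps-tensor")
```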

As can be seen, helper.view_classify(img.view(1, 28, 28), ps) works.

5. Save changes to GitHub

If you choose option 2 above, then please follow Oleg Żero’s article section “Saving, calling it a day” to save your changes.

For Option 1, now that you’re done with your changes, it’s time to save and push the changes to the GitHub repository in your desired branch. Here is a git cheat sheet.

I would suggest running these commands in different cells to make sure all the file changes are properly added to the commit.
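To stage everything, the first cell is simply `git add .` (in Colab, run it inside the project folder with a leading `!`; demonstrated here on a throwaway repo):

```shell
# throwaway repo for demonstration; in Colab you'd already be in the project
cd "$(mktemp -d)"
git init -q
git -c user.email=me@example.com -c user.name=Me \
    commit -q --allow-empty -m "init"
echo "solved" > Part1.ipynb

git add .             # stage all new and modified files
git status --short    # 'A  Part1.ipynb' -- staged for commit
```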

The command above adds all files; if you want to stage only modified files, you can use:
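`git add -u` stages only files Git already tracks, skipping brand-new files (demonstrated on a throwaway repo):

```shell
cd "$(mktemp -d)"
git init -q
echo "v1" > tracked.txt
git add tracked.txt
git -c user.email=me@example.com -c user.name=Me commit -qm "init"

echo "v2" > tracked.txt   # modify a tracked file
echo "new" > extra.txt    # brand-new, untracked file

git add -u                # stages tracked.txt but NOT extra.txt
git status --short
```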

Remember to also change the commit message, as well as user.email and user.name.
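The commit and push cells look roughly like this; the identity and message are placeholders, and the sketch pushes to a throwaway local "remote" standing in for your GitHub fork:

```shell
# set up: a bare "remote" plus a working repo on a develop branch
REMOTE="$(mktemp -d)"
git init -q --bare "$REMOTE"
cd "$(mktemp -d)"
git init -q
git remote add origin "$REMOTE"
git checkout -q -b develop
echo "solved" > Part1.ipynb
git add .

# use your own name/email and a meaningful commit message
git config user.email "you@example.com"
git config user.name "Your Name"
git commit -qm "Solve exercise notebook"

git push -q origin develop   # push the changes to the develop branch
git log --oneline -1
```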

Here I can verify that the changes are successfully pushed:

We are done! I hope you enjoyed following along with this article.


I will keep adding any issues related to Google Colab that I find while working through the course:

Issue #1

Google Colab script throws “OSError: Transport endpoint is not connected”

A solution that works for me:

  • Unmount Google Drive and remount it.
  • Runtime → Manage sessions → terminate the notebook
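A sketch of the remount approach (flush_and_unmount and the force_remount flag are part of google.colab.drive; the guard just lets the snippet run outside Colab):

```python
# Workaround sketch for the transport-endpoint error: unmount, then remount.
in_colab = True
try:
    from google.colab import drive
except ImportError:
    in_colab = False  # not running inside Colab

if in_colab:
    drive.flush_and_unmount()                          # clean unmount
    drive.mount("/content/drive", force_remount=True)  # fresh mount
```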

Issue #2

I encountered this issue even though I had already imported the custom file.

A solution that works for me:

When mounting Google Drive in Google Colab, I need to make sure that the current path is the project folder.
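For example (the path is an assumption; adjust it to your own Drive layout, and the guard only keeps the sketch harmless outside Colab):

```python
import os

# Example path -- adjust to your own Drive layout.
PROJECT_PATH = "/content/drive/My Drive/MyDrive/Udacity/deep-learning-v2-pytorch"

# Guard so the snippet also runs outside Colab, where the path won't exist.
if os.path.isdir(PROJECT_PATH):
    os.chdir(PROJECT_PATH)  # imports of helper.py etc. now resolve

print(os.getcwd())
```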

Issue #3

Error: Mount point must not contain a space.

A solution that works for me:

  • Avoid using spaces in the mount path

Issue #4

  • Images in the asset folder are not loaded in the Jupyter notebook.

A solution that works for me:

  • Get a shareable Google Drive link for the image.
  • Use the image’s id from the shareable Google Drive link in this URL: https://docs.google.com/uc?id=[image_id] [1]

For example:

https://docs.google.com/uc?id=1HfsWHk0UiWleEiuz_bwb1dxNoQRApZo-

Issue #5

Referencing an image on Google Drive from code in Jupyter. In the custom_filters.ipynb notebook, we need to refer to an image in the data folder, “data/curved_lane.jpg”.

A solution that works for me:

  1. Mount Google Drive
  2. You then have two options:
  • Reference the full path to the image in the code. For instance, “./drive/My Drive/MyDrive/Udacity/deep-learning-v2-pytorch/convolutional-neural-networks/conv-visualization/data/curved_lane.jpg”
  • Change directory to the parent folder and access the data folder, so the path to the image can be shortened to “./data/curved_lane.jpg”
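The second option can be sketched like this, with throwaway files standing in for the conv-visualization folder (in Colab you would chdir to the notebook’s folder on Drive instead):

```python
import os
import tempfile

# Throwaway stand-in for the notebook's folder and its data/ subfolder
parent = tempfile.mkdtemp()
os.makedirs(os.path.join(parent, "data"))
open(os.path.join(parent, "data", "curved_lane.jpg"), "wb").close()

os.chdir(parent)  # in Colab: the conv-visualization folder on Drive
print(os.path.exists("./data/curved_lane.jpg"))  # True
```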

Issue #6

I encountered the following error in conv_visualization.ipynb notebook.

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-1-9775b898dc2d> in <module>()
     10 bgr_img = cv2.imread(img_path)
     11 # convert to grayscale
---> 12 gray_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2GRAY)
     13
     14 # normalize, rescale entries to lie in [0,1]

error: OpenCV(4.1.2) /io/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function ‘cvtColor’

A solution that works for me:

This issue happens when the path of the input image is incorrect or there is a special character in the input image name. After I corrected the input image path, it worked fine.