Published: January 29, 2022

Download Folders from a GitHub Repo using Python (… files, too)

In this short post I want to share a cool trick: how to download a single folder from a GitHub repository. Cloning the entire repo or downloading a single file is easy, but how do you download just one specific folder?

At some point in your life as a programmer, you will need to download a single folder from a GitHub repository. For me, it was because I had an auxiliary repo containing files needed by some unit tests. I found a very elegant (and super easy) solution, which I want to share here. In other words, this is a short how-to on downloading/copying files and folders from GitHub using Python.

The solution uses the awesome fsspec library. fsspec is a pythonic approach to filesystem management that lets you use Python to access data in many kinds of locations: on your local machine, on all major cloud providers, and, most importantly for this post, on GitHub. It supports many more backends, so the library is worth checking out if you have the time.

Installation

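To get started, install fsspec. (Chances are you already have it, because a growing part of the PyData ecosystem uses it internally.) The GitHub backend used below also relies on the requests package, so the command installs it as well.

pip install fsspec requests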

Copy Folders

import fsspec
from pathlib import Path

# flat copy
destination = Path.home() / "test_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix())

# recursive copy
destination = Path.home() / "test_recursive_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix(), recursive=True)

The snippet above does the following: we first declare a destination (where to store the folder's content) and create it if it does not exist yet. Then we use fsspec to turn the repo into a pythonic filesystem. Finally, we list everything in the target folder of the repo (fs.ls(…)) and download it with fs.get; passing recursive=True also descends into subfolders. Simple, elegant, and convenient. I love it!
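
If you need these steps in more than one place, they can be wrapped in a small helper. The following is only a minimal sketch: download_folder and its parameters are hypothetical names, not part of fsspec, and the function assumes nothing more than that fs is an fsspec-style filesystem offering ls and get as used above.

```python
from pathlib import Path


def download_folder(fs, remote_folder, destination, recursive=False):
    """Copy the contents of remote_folder on fs into a local folder.

    Hypothetical helper: fs is assumed to be an fsspec filesystem,
    for example the one returned by fsspec.filesystem("github", ...).
    """
    destination = Path(destination)
    # Create the local target folder (and any missing parents).
    destination.mkdir(parents=True, exist_ok=True)
    # detail=False asks for plain path strings instead of info dicts.
    entries = fs.ls(remote_folder, detail=False)
    # Download all listed entries into the target folder.
    fs.get(entries, destination.as_posix(), recursive=recursive)
    return destination


# Example call (needs network access, so it is left commented out):
# import fsspec
# fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
# download_folder(fs, "src/", Path.home() / "test_folder_copy")
```

Because the filesystem object is passed in, the same helper should work with other fsspec backends as well, not just GitHub.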

Copy Files

Copying/downloading individual files works the same way; however, this time the destination has to be a file path rather than a folder.

import fsspec
from pathlib import Path

# download a single file
destination = Path.home() / "downloaded_readme.txt"
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get("README.txt", destination.as_posix())
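
The single-file case can be wrapped the same way. Again, this is a hedged sketch rather than anything fsspec provides: download_file is a hypothetical name, and the check for an existing folder simply enforces the rule stated above that the destination has to be a file.

```python
from pathlib import Path


def download_file(fs, remote_path, destination):
    """Download one remote file to the local file path destination.

    Hypothetical helper: fs is assumed to be an fsspec filesystem.
    """
    destination = Path(destination)
    # The destination must name a file, so an existing folder is rejected.
    if destination.is_dir():
        raise IsADirectoryError(f"{destination} is a folder, expected a file path")
    # Make sure the folder that will contain the file exists.
    destination.parent.mkdir(parents=True, exist_ok=True)
    fs.get(remote_path, destination.as_posix())
    return destination


# Example call (needs network access, so it is left commented out):
# import fsspec
# fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
# download_file(fs, "README.txt", Path.home() / "downloaded_readme.txt")
```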

And that is all there is to it. Happy coding!