At some point in your life as a programmer, you will need to download a single folder from a GitHub repository. For me, it was because I had an auxiliary repo containing files relevant to some unit tests. I found a very elegant (and super easy) solution to this problem, which I want to share here today. In other words, this is a short how-to on downloading/copying files and folders from GitHub using Python.
The solution uses the awesome fsspec library. fsspec is a Pythonic approach to filesystem management that lets you use Python to access data in various kinds of locations: on your local machine, on all major cloud providers, and – most importantly – on GitHub. It supports many more locations, so the library is worth checking out if you have the time.
Installation
To get started, install fsspec. (Chances are you already have it, because a growing share of the PyData ecosystem uses it internally.)
pip install fsspec
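If you are not sure whether fsspec is already present, a quick check from Python can tell you. The snippet below is a minimal sketch using only the standard library; missing_packages is a hypothetical helper name of my own. It also checks for requests, because as far as I know the GitHub backend of fsspec relies on it for HTTP calls, so treat that part as an assumption about your fsspec version.

```python
import importlib.util


def missing_packages(names):
    """Return the package names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# fsspec is the library itself; requests is assumed to be needed by its
# GitHub backend. An empty list means both packages were found.
print(missing_packages(["fsspec", "requests"]))
```

Anything that shows up in the printed list can be installed with pip in the same way as above.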
Copy Folders
import fsspec
from pathlib import Path
# flat copy
destination = Path.home() / "test_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix())
# recursive copy
destination = Path.home() / "test_recursive_folder_copy"
destination.mkdir(exist_ok=True, parents=True)
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get(fs.ls("src/"), destination.as_posix(), recursive=True)
The above snippet does the following: We first declare a destination (where to store the folder’s content). Then we use fsspec to turn the repo into a Pythonic filesystem. Finally, we list all the files in the target folder of the repo (fs.ls(…)) and download them all using fs.get. The recursive variant differs only in passing recursive=True, which also copies the contents of subfolders. Simple, elegant, and convenient. I love it!
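If you need this in more than one place, the steps above can be wrapped in a small function. The following is a minimal sketch, not something fsspec ships: download_folder is a hypothetical helper name, and it assumes only that the filesystem object offers ls and get as used in the snippet above.

```python
from pathlib import Path


def download_folder(fs, remote_dir, destination, recursive=False):
    """Copy the contents of `remote_dir` on `fs` into a local folder.

    Hypothetical helper: `fs` is any fsspec-style filesystem object that
    provides ls() and get(), such as the GitHub filesystem used above.
    """
    destination = Path(destination)
    # Create the local target folder (and any missing parents) first.
    destination.mkdir(parents=True, exist_ok=True)
    # detail=False asks for plain path strings instead of info dictionaries.
    remote_paths = fs.ls(remote_dir, detail=False)
    # Fetch all listed entries in one call; recursive=True also descends
    # into subfolders.
    fs.get(remote_paths, destination.as_posix(), recursive=recursive)
    return destination
```

Calling download_folder(fs, "src/", Path.home() / "test_folder_copy") with the fs object from the snippet above should reproduce the flat copy, and adding recursive=True should reproduce the recursive one.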
Copy Files
Copying/downloading individual files works the same way; however, this time the destination has to be a file path rather than a folder.
import fsspec
from pathlib import Path
# download a single file
destination = Path.home() / "downloaded_readme.txt"
fs = fsspec.filesystem("github", org="githubtraining", repo="hellogitworld")
fs.get("README.txt", destination.as_posix())
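The single-file case can be wrapped in the same way. This is again a sketch under my own assumptions rather than an fsspec feature: download_file is a hypothetical helper name. It creates the parent folder up front, so it does not depend on whether your fsspec version does that for you, and it refuses an existing directory as the destination so the file always lands at exactly the path you name.

```python
from pathlib import Path


def download_file(fs, remote_path, destination):
    """Copy one remote file to the local file path `destination`.

    Hypothetical helper: `fs` is any fsspec-style filesystem with get().
    """
    destination = Path(destination)
    # Enforce the rule from the text: the destination must be a file path.
    if destination.is_dir():
        raise IsADirectoryError(f"{destination} is a directory, expected a file path")
    # Only the parent folder has to exist before the download starts.
    destination.parent.mkdir(parents=True, exist_ok=True)
    fs.get(remote_path, destination.as_posix())
    return destination
```

With the fs object from above, download_file(fs, "README.txt", Path.home() / "downloaded_readme.txt") should behave like the snippet in this section.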
And that is all there is to it. Happy coding!