Switching to the Python Pathlib Module


It’s been about a year or so since we officially upgraded all of our tooling at my job at FarSounder from Python 2.7 to Python 3 (3.6 at the moment). Aside from the syntactic changes, there have been a handful of updates in Python 3 that I’ve found to really increase the readability of our scripts. One of those updates (from back in Python 3.4) has been the introduction of the pathlib module.

The pathlib module contains high-level classes that represent file system paths, and I’ve mainly been using it in cases where I would have previously used the os module. Here are some examples of the most common uses that I’ve seen in the wild, and how they can be implemented using both modules.

Path Construction

In this case, the goal is to store a path to the test_dir directory, which lives at an arbitrary hard-coded location on my system. Here is how to do this using both modules:

import os
import pathlib

# Construct the paths
# 1. Using the os module and os.path.join
ospath = os.path.join("c:\\", "dev", "sandbox-heath", "pathlib_test", "test_dir")
# 2. Using the pathlib module and pathlib.Path (and the overloaded "/")
plpath = pathlib.Path("c:\\") / "dev" / "sandbox-heath" / "pathlib_test" / "test_dir"

Many of the scripts I work with day-to-day deal with opening, manipulating and writing other files, so the os module is used a lot. In particular, os.path.join is used to build a valid path from a variable number of path components (given as one or more arguments to the function). It works great, but it’s a little verbose, especially as the number of path components grows.

The / operator is implemented for Path objects, so, as you might have noticed above, path components can be joined using a / in the pathlib.Path version. I personally really like this syntax because of its similarity to an actual file system path - mentally it feels easier to parse. I would be interested to hear what you think, especially if you haven’t seen it before.
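
As a rough sketch of how the two styles line up, the snippet below builds the same path three ways. It is not from my original scripts: it assumes pathlib.PureWindowsPath and the ntpath module (the Windows flavour of os.path) so that the result is the same on any operating system, and the parts list is just a hypothetical name for the components used above.

```python
import ntpath  # Windows implementation of os.path, importable on any OS
from pathlib import PureWindowsPath

# Hypothetical list holding the components from the example above
parts = ["dev", "sandbox-heath", "pathlib_test", "test_dir"]

# os.path.join style: every component is passed as a separate argument
joined = ntpath.join("c:\\", *parts)

# pathlib style 1: chain components with the overloaded "/" operator
slashed = PureWindowsPath("c:\\") / "dev" / "sandbox-heath" / "pathlib_test" / "test_dir"

# pathlib style 2: joinpath() also takes a variable number of components,
# which is handy when they are already in a list
from_list = PureWindowsPath("c:\\").joinpath(*parts)

print(joined)
print(slashed)
print(slashed == from_list)
```

All three spell out the same location; the / form is mostly a readability choice, while joinpath() is probably the closer match to os.path.join when the components arrive as a sequence.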

Iterating Over Directory Contents

Another super common pattern when working with the file system is iterating over files in a given directory. Here’s an example of how to do this using both modules:

# Loop over files in a directory
# 1. Using os module
for f in os.listdir(ospath):
    full_path = os.path.join(ospath, f)
    print(f"Filename: {f}")
    print(f"Fullpath: {full_path}")

# 2. Using pathlib module
for f in plpath.iterdir():
    print(f"Filename: {f.name}")
    print(f"Fullpath: {f}")

The example above iterates over files in the directory, and prints the filename, along with the full path to the file - so both loops above output:

c:\dev\heath\pathlib_test>py -3.6 pathlib_test.py
Filename: a.dat
Fullpath: c:\dev\heath\pathlib_test\test_dir\a.dat
Filename: a.py
Fullpath: c:\dev\heath\pathlib_test\test_dir\a.py
Filename: b.dat
Fullpath: c:\dev\heath\pathlib_test\test_dir\b.dat
Filename: b.py
Fullpath: c:\dev\heath\pathlib_test\test_dir\b.py

Looping over the files in the directory is simple in both cases, but some of the utility of the Path object is starting to show through. First, we can iterate over the directory using the built-in iterdir() method. Second, because it yields Path objects itself, we can use each object’s .name to print the name of the file, and its str representation to print the full path.
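
Because iterdir() hands back Path objects, each entry can also be asked about itself. The sketch below is an illustration under my own assumptions rather than part of the original test: it builds a throwaway directory with tempfile, and the file and directory names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp)
    # Create two empty files and one subdirectory to iterate over
    (test_dir / "a.dat").touch()
    (test_dir / "a.py").touch()
    (test_dir / "subdir").mkdir()

    # iterdir() yields entries in arbitrary order, so sort for stable output
    entries = sorted(test_dir.iterdir())

    # Each entry is a Path, so is_file() / is_dir() are available directly
    file_names = [p.name for p in entries if p.is_file()]
    dir_names = [p.name for p in entries if p.is_dir()]

print(file_names)
print(dir_names)
```

With os.listdir the same check would need os.path.join followed by os.path.isfile on every entry, since listdir only returns bare names.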

Often, there is a requirement to iterate only over a specific file type in a directory. One way to do this with each module is given below:

# Looping over specific files (.py for example)
# 1. Using os module
for f in os.listdir(ospath):
    full_path = os.path.join(ospath, f)
    name, ext = os.path.splitext(f)
    if f.endswith(".py"):
        print(f"Found a python file: {f}")
        print(f"Name: {name}, Ext: {ext}")
        print(f"Fullpath: {full_path}")

# 2. Using pathlib module
for f in plpath.glob("*.py"):
    print(f"Found a python file: {f.name}")
    print(f"Name: {f.stem}, Ext: {f.suffix}")
    print(f"Fullpath: {f}")

This results in the same output from each loop, and depending on the contents of your test_dir, looks something like:

Found a python file: a.py
Name: a, Ext: .py
Fullpath: c:\dev\heath\pathlib_test\test_dir\a.py
Found a python file: b.py
Name: b, Ext: .py
Fullpath: c:\dev\heath\pathlib_test\test_dir\b.py

Again, both methods are pretty similar - but in the pathlib version, it’s convenient to take advantage of pathlib.Path’s glob() method to iterate over only the files we want. Further, when we need to work with the file, pathlib.Path offers a ton of useful properties (.name, .stem and .suffix above, for example) that are a lot simpler than the os equivalents.
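
To give a feel for those properties, here is a minimal sketch that globs a temporary directory and pulls a few attributes off the results. The directory and file names are assumptions made up for the illustration; .parent and with_suffix() are standard pathlib members that were not used in the loops above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    test_dir = Path(tmp)
    for name in ("a.dat", "a.py", "b.dat", "b.py"):
        (test_dir / name).touch()

    # glob() only yields entries matching the pattern; sort for a stable order
    py_files = sorted(test_dir.glob("*.py"))

    stems = [p.stem for p in py_files]        # file name without the extension
    suffixes = {p.suffix for p in py_files}   # extension, including the dot
    # with_suffix() returns a new path; it does not rename anything on disk
    renamed = [p.with_suffix(".txt").name for p in py_files]
    # .parent is the containing directory, again as a Path
    same_parent = all(p.parent == test_dir for p in py_files)

print(stems)
print(suffixes)
print(renamed)
print(same_parent)
```

The os version of the same thing would lean on os.path.splitext, os.path.basename and os.path.dirname, each taking and returning plain strings.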

Conclusion

There is a lot more you can do with this module, and you can read more about it in the Python docs. There are a ton of examples and a lot of useful information there.

I don’t think that there would be a huge upside to rewriting / refactoring existing scripts to use pathlib instead of os, and I’m certainly not advocating that. However, I know that due to the ease of manipulating file system paths using pathlib, I am definitely going to default to using it whenever I’m implementing a new script or adding a significant chunk of functionality to an existing one.

What do you think? Is it really cleaner and easier to read? Or is it just me? Can you think of cases where you would still prefer to use the underlying os module instead?