PEP 622 new intro

Abstract

This PEP proposes to add a pattern matching statement to Python, inspired by similar syntax found in Scala and many other languages.

The pattern syntax builds on Python’s existing syntax for sequence unpacking (e.g., a, b = value), but is wrapped in a match statement which compares its subject to several different “shapes” until one is found that fits. In addition to specifying the shape of a sequence to be unpacked, patterns can also specify the shape to be a mapping with specific keys, an instance of a given class with specific attributes, a specific value, or a wildcard. Patterns can be composed in several ways.

Syntactically, a match statement contains a subject expression and one or more case clauses, where each case clause specifies a pattern (the overall shape to be matched), an optional “guard” (a condition to be checked if the pattern matches), and a code block to be executed if the case clause is selected.

The rest of the PEP motivates why we believe pattern matching makes a good addition to Python, explains our design choices, and contains a precise syntactic and runtime specification. We also give guidance for static type checkers (and one small addition to the typing module) and discuss the main objections and alternatives that have been brought up during extensive discussion of the proposal, both within the group of authors and in the python-dev community. Finally, we discuss some possible extensions that might be considered in the future, once the community has ample experience with the currently proposed syntax and semantics.

Overview

Patterns are a new syntactic category with their own rules and special cases, and they mix input (given values) and output (captured variables) in novel ways, so they take some getting used to. In the authors' experience, this happens quickly once the basic concepts have been introduced briefly, as in the overview that follows. Note that this section is not intended to be complete or perfectly accurate.

A new syntactic construct called a pattern is introduced. Syntactically, patterns look like a subset of expressions; the following are all patterns:

  • [first, second, *rest]
  • Point2d(x, 0)
  • {"name": "Bruce", "age": age}
  • 42

The above look like examples of object construction. A constructor takes some values as parameters and builds an object from those components. As patterns, however, they denote the inverse operation of construction, which we call destructuring: a pattern takes a subject object and extracts its components. The syntactic similarity between construction and destructuring is intentional and follows the existing Pythonic style, which makes assignment targets (write contexts) look like expressions (read contexts). Pattern matching never creates objects, in the same way that [a, b] = my_list neither creates a new [a, b] list nor reads the values of a and b.

The intuition we are trying to build in users as they learn this is that matching a pattern to a subject binds the free variables (if any) to subject components in such a way that reading the pattern as an expression would reproduce the original subject. During this process, the structure of the pattern may not fit the subject, in which case the match fails. For example, matching the pattern Point2d(x, 0) to the subject Point2d(3, 0) succeeds and binds x to 3. However, if the subject is [3, 0], the match fails because a list is not a Point2d. And if the subject is Point2d(3, 3), the match fails because its second coordinate is not 0.

The match statement tries to match each of the patterns in its case clauses with a single subject. At the first successful match, the variables in the pattern are assigned and a corresponding block is executed. Each of the multiple branches of this conditional statement can also have a boolean condition as a guard.

Here's an example of a match statement, used to define a function building 3D points that can accept as input either tuples of size 2 or 3, or existing (2D or 3D) points:

def make_point_3d(pt):
    match pt:
        case (x, y): return Point3d(x, y, 0)
        case (x, y, z): return Point3d(x, y, z)
        case Point2d(x, y): return Point3d(x, y, 0)
        case Point3d(_, _, _): return pt
        case _: raise TypeError("not a point we support")

Writing this function in the traditional fashion would require several isinstance() checks, one or two len() calls, and a more convoluted control flow. While the match version translates into similar code under the hood, to a reader familiar with patterns it is much clearer.
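
For comparison, here is one possible way to write the function in the traditional fashion. This is a sketch, not a precise translation: it assumes Point2d and Point3d are simple dataclasses, and it checks for tuples only, as the prose describes, whereas a sequence pattern would also accept other sequences such as lists.

```python
from dataclasses import dataclass


@dataclass
class Point2d:
    # Assumed definition, for illustration only.
    x: float
    y: float


@dataclass
class Point3d:
    # Assumed definition, for illustration only.
    x: float
    y: float
    z: float


def make_point_3d(pt):
    """Build a Point3d using isinstance() and len() instead of match."""
    if isinstance(pt, tuple):
        if len(pt) == 2:
            x, y = pt
            return Point3d(x, y, 0)
        if len(pt) == 3:
            x, y, z = pt
            return Point3d(x, y, z)
        # Tuples of any other size are not supported.
        raise TypeError("not a point we support")
    if isinstance(pt, Point2d):
        return Point3d(pt.x, pt.y, 0)
    if isinstance(pt, Point3d):
        return pt
    raise TypeError("not a point we support")


print(make_point_3d((1, 2)))         # Point3d(x=1, y=2, z=0)
print(make_point_3d(Point2d(1, 2)))  # Point3d(x=1, y=2, z=0)
```

The type checks, the length checks, and the unpacking are spread over separate statements here, while each case clause in the match version expresses all three at once.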

Rationale and Goals

Python programs frequently need to handle data which varies in type, presence of attributes/keys, or number of elements. Typical examples are operating on nodes of a mixed structure like an AST, handling UI events of different types, processing structured input (like structured files or network messages), or “parsing” arguments for a function that can accept different combinations of types and numbers of parameters.

Much of the code to do so tends to consist of complex chains of nested if/elif statements, including multiple calls to len(), isinstance() and index/key/attribute access. Inside those branches, we sometimes need to destructure the data further to extract the required component values, which may be nested behind several layers of objects.

Pattern matching, as present in many other languages, provides an elegant solution to this problem. These languages range from statically compiled functional languages like F# and Haskell, via mixed-paradigm languages like Scala and Rust, to dynamic languages like Elixir and Ruby; pattern matching is also under consideration for JavaScript. We are indebted to these languages for guiding the way to Pythonic pattern matching, as Python is indebted to so many other languages for many of its features: many basic syntactic features were inherited from C, exceptions from Modula-3, classes were inspired by C++, slicing came from Icon, regular expressions from Perl, decorators resemble Java annotations, and so on.

The usual logic for operating on heterogeneous data can be summarized in the following way:

  • Some analysis is done on the shape (type and components) of the data: This could involve isinstance() calls and/or extracting components (via indexing or attribute access) which are checked for specific values or conditions.
  • If the shape is as expected, some more components are possibly extracted and some operation is done using the extracted values.
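
The two steps above can be sketched for a mapping-shaped input as follows. The handle_event() function, the "type" and "position" keys, and the returned strings are all hypothetical and serve only to illustrate the usual logic.

```python
def handle_event(event):
    """Hypothetical handler showing shape analysis followed by extraction."""
    # Step 1: analyze the shape (type, presence of keys, specific values).
    if isinstance(event, dict) and event.get("type") == "click":
        if "position" in event:
            position = event["position"]
            if isinstance(position, tuple) and len(position) == 2:
                # Step 2: the shape is as expected, so extract the
                # components and operate on them.
                x, y = position
                return f"click at {x}, {y}"
    # Any other shape falls through to the default.
    return "ignored"


print(handle_event({"type": "click", "position": (10, 20)}))  # click at 10, 20
print(handle_event({"type": "keypress", "key": "a"}))         # ignored
```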