carry on with incremental improvements / bug fixes / performance improvements
Overview
A
Matplotlib is the de facto standard plotting library in the scientific Python ecosystem. Over the last 16 years the library has grown organically to cover a wide range of use cases. Over the past decade there have also been significant advances in data structures and data access that Matplotlib does not currently exploit. We propose building on the established code base in three ways: codifying conventions for Matplotlib extensions, so that domain-specific plotting tools compose well with core Matplotlib and with one another; unifying the internal data abstractions, both to better support modern structured data and to ease development of responsive, interactive, and streaming plots; and unifying the way plot properties are encoded internally, to enable exporting to other plotting tools such as JavaScript libraries or OpenGL. With this work Matplotlib will continue to be an integral component of the SciPy stack for the next 15 years.
B
Matplotlib is a mature, widely used, and highly impactful library used across a wide range of science [LIGO, EVHT, NSLS-II, CellProfiler, cartopy, XX% of arXiv] and industry [used at Google, Bloomberg, MSFT, …]. It has developed organically over the last 16 years; however, to remain impactful for the next 16 years it needs to adapt.
One of the biggest changes since Matplotlib started is the rise of structured and streaming data. Although Matplotlib has some support [data kwarg], the fundamental primitives of the API are fixed-size array-like data structures that primarily hold numeric data. The need for structured-data-aware primitives is shown by data structures that ship their own built-in plotting (xarray and pandas) and by third-party libraries (e.g. seaborn) that take structured data as their primitive inputs.
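As a small illustration of the existing data kwarg support mentioned above, the following sketch passes column labels instead of arrays. A plain dict stands in for a structured container; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Any object that can be indexed by string works with `data=`;
# a plain dict is used here in place of a structured container.
table = {"time": [0, 1, 2, 3], "signal": [0.0, 0.8, 0.9, 0.1]}

fig, ax = plt.subplots()
(line,) = ax.plot("time", "signal", data=table)

# The labels are looked up when plot() is called, so the resulting
# Artist holds the numeric values rather than the labelled source.
xdata, ydata = line.get_data()
```

The structured input is unpacked at the API boundary, so the Artist still works with plain numeric arrays.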
The end goal is to make it easy to build highly tuned, domain-specific plotting tools.
Architecture
The architecture of Matplotlib [AAOSP link] can be thought of as having three layers:
1. The user-facing API (either pyplot or the OO API)
2. The Artist representation
3. The backends
Roughly speaking, users and library authors use (1) to tell Matplotlib what data they have and how they want it plotted, which generates the Python objects of (2). To render the figure, either to a GUI or to a file, the Artists from (2) are passed to layer (3), which uses them to produce the final output.
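The three layers can be seen in a few lines of ordinary Matplotlib usage. This is only a sketch of the flow described above; the Agg backend and the in-memory PNG target are arbitrary choices made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # pick a non-GUI backend (layer 3) explicitly
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D

# Layer 1: the user-facing API says what to plot.
fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [0, 1, 4])

# Layer 2: that call produced Artist objects describing the plot.
is_artist = isinstance(line, Artist) and isinstance(line, Line2D)

# Layer 3: the backend walks the Artists and renders the output.
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
```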
We propose:
adding a new Data layer to abstract over data storage and access
extending the Artist layer to have richer semantic artists
extending the backend layer to provide an export entrypoint
Data Model (data structures)
Currently, when users pass in data it is (sometimes) transformed and then stored across one or more Artists as NumPy array-likes. While this is easy to implement and has been very successful, it has several drawbacks:
each Artist stores the data in a slightly different place
common processing is done in many places (e.g. unit handling, masked data, scalar / vector processing)
if multiple Artists are involved, their data can become decoupled
raw data is not kept, which prevents re-processing (e.g. hist, contour)
To that end we propose to develop a Matplotlib data layer that will handle these details.
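To make the drawbacks listed above concrete, the sketch below plots the same values four ways and then reads the data back from each result. The random input and the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = rng.normal(size=200)

fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4)

(line,) = ax1.plot(x, y)
points = ax2.scatter(x, y)
image = ax3.imshow(np.outer(x[:10], y[:10]))
counts, edges, bars = ax4.hist(x, bins=10)

# Each Artist exposes its data through a different accessor and shape.
line_xy = line.get_data()          # tuple of two 1D arrays
scatter_xy = points.get_offsets()  # single (N, 2) array
image_values = image.get_array()   # 2D array of pixel values

# hist() hands back the binned result and one rectangle per bin;
# the 200 raw samples are not attached to the returned bars.
n_bars = len(bars)
```

Re-binning the histogram therefore requires the caller to have kept `x` around, which is the kind of bookkeeping a shared data layer would take over.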
Requirements
be able to be shared across Artists
handle units
handle smart down/upsampling of data
handle updates to data
handle streaming data
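One way to picture the sharing and update requirements above is a small container that several Artists subscribe to. `SharedSeries`, `subscribe`, and `append` are hypothetical names invented for this sketch and are not part of Matplotlib; only `Line2D.set_data` is existing API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


class SharedSeries:
    """Hypothetical x/y source that pushes updates to its subscribers."""

    def __init__(self, x, y):
        self._x = np.asarray(x, dtype=float)
        self._y = np.asarray(y, dtype=float)
        self._listeners = []

    def subscribe(self, callback):
        # Register a callable taking (x, y) and send it the current data.
        self._listeners.append(callback)
        callback(self._x, self._y)

    def append(self, x_new, y_new):
        # Streaming-style update: extend the data, then notify everyone.
        self._x = np.concatenate([self._x, np.atleast_1d(x_new)])
        self._y = np.concatenate([self._y, np.atleast_1d(y_new)])
        for callback in self._listeners:
            callback(self._x, self._y)


fig, (ax_a, ax_b) = plt.subplots(1, 2)
(line_a,) = ax_a.plot([], [])
(line_b,) = ax_b.plot([], [], marker="o")

source = SharedSeries([0, 1, 2], [0.0, 1.0, 4.0])
source.subscribe(line_a.set_data)  # both Artists read from one source
source.subscribe(line_b.set_data)

source.append(3, 9.0)  # a single update reaches both Artists
```

Because both lines draw from the same object, they cannot drift apart the way independently stored copies can.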
use cases
data-shader style binning for large data sets
‘smart’ resampling based on data limits of lines
native support for non-numeric data types (like pandas extension types)
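The resampling use case above can be approximated today with a callback on the x-limits. In this sketch `resample_for_view` is a hypothetical helper that crops to the visible range and applies naive stride decimation; a real data layer would likely use something smarter (such as min/max binning). The `xlim_changed` callback is existing Matplotlib API.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def resample_for_view(x, y, xlim, max_points):
    """Hypothetical helper: keep only visible samples, thinned by a stride."""
    lo, hi = xlim
    visible = (x >= lo) & (x <= hi)
    xv, yv = x[visible], y[visible]
    if len(xv) > max_points:
        stride = math.ceil(len(xv) / max_points)
        xv, yv = xv[::stride], yv[::stride]
    return xv, yv


# The full-resolution data stays outside the Artist.
x = np.linspace(0.0, 10.0, 10001)
y = np.sin(x)

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)


def on_xlim_changed(axes):
    # Give the line only what the current view needs.
    xs, ys = resample_for_view(x, y, axes.get_xlim(), max_points=500)
    line.set_data(xs, ys)


ax.callbacks.connect("xlim_changed", on_xlim_changed)
ax.set_xlim(2.0, 4.0)  # triggers the callback

shown_x = line.get_xdata()
```

Here the user code has to hold the raw arrays and wire up the callback by hand; the proposed data layer would own both.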
THIS IS STILL A WORK IN PROGRESS