Pangeo ML WG Blog Post
Newer outline
Title: ML Patterns for Geo-sciences
Author:
Introduction
Why patterns vs libraries?
flow-chart of typical ML process?
Data preparation (one article)
Depends on where the data is produced
Chunking depends on sampling strategy
Chunking good for ML training is not necessarily good for evaluation
column-based
image-like
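To make the column-based vs image-like distinction concrete, here is a minimal sketch using plain NumPy. The array layout `(time, level, lat, lon)` and the helper names `as_columns` and `as_images` are assumptions made for illustration, not part of any Pangeo library.

```python
import numpy as np


def as_columns(field):
    """Column-based sampling: one sample per (time, lat, lon) point.

    Assumes `field` has shape (time, level, lat, lon) and returns an array
    of shape (time * lat * lon, level), one vertical profile per row.
    """
    t, z, y, x = field.shape
    # Move the level axis last so each row holds a full vertical column.
    return np.moveaxis(field, 1, -1).reshape(t * y * x, z)


def as_images(field):
    """Image-like sampling: one sample per time step.

    Each sample is a (level, lat, lon) array, with levels acting as channels.
    """
    return [field[i] for i in range(field.shape[0])]


# Small synthetic field: 2 time steps, 3 levels, 4 latitudes, 5 longitudes.
field = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

columns = as_columns(field)
images = as_images(field)
```

Under these assumptions, column sampling favors chunks that span the full level dimension, while image-like sampling favors chunks that span the full lat/lon plane, which is one reason a single chunking scheme rarely suits both.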
Evaluation is more complicated (another article)
multivariate (temperature, humidity, etc.)
spatial structure (even if training is column-based)
ML metrics are not necessarily easy to compute on the training data
training data is good for ML training!
Sometimes need to couple ML to
Machine workflows (another article)
Outline
Data Loading
Potential bottlenecks (need data locality for each step to work well)
HD to main memory
main memory to CPU caches
main memory to GPU memory
Static model outputs
open_mfdataset is still slow
Streaming processing
Storage formats:
row-based vs column based
How much/how to preserve metadata (e.g. variable names, coordinate info, units) as data flows to ML algorithms
Shuffling
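One possible way to reconcile shuffling with data locality is a chunk-aware shuffle: randomize the order in which chunks are visited, then randomize samples within each chunk, so each chunk only needs to be loaded once per epoch. The sketch below is an illustration under that assumption. The function name `chunked_shuffle_indices` is hypothetical, and the result is less random than a full global shuffle.

```python
import numpy as np


def chunked_shuffle_indices(n_samples, chunk_size, seed=0):
    """Yield index arrays, one per chunk, in a locality-preserving random order."""
    rng = np.random.default_rng(seed)
    # Shuffle the order in which chunks are visited.
    starts = np.arange(0, n_samples, chunk_size)
    rng.shuffle(starts)
    for start in starts:
        # Shuffle the samples inside the chunk currently held in memory.
        idx = np.arange(start, min(start + chunk_size, n_samples))
        rng.shuffle(idx)
        yield idx


# 10 samples stored in chunks of 4 (the last chunk is shorter).
batches = list(chunked_shuffle_indices(10, 4))
order = np.concatenate(batches)
```

Every sample is visited exactly once, and each yielded batch touches a single chunk.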
Coupling to climate models
copy-free semantics needed
ease of use/flexibility is key
can Python be used efficiently?
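As a small illustration of what copy-free semantics can look like from the Python side, the sketch below wraps an existing memory buffer in a NumPy array without copying it. The `array.array` object is only a stand-in for memory owned by a climate model. How a real model exposes its state is outside the scope of this example.

```python
import array

import numpy as np

# Stand-in for a state buffer owned by the host model (an assumption).
model_state = array.array("d", [1.0, 2.0, 3.0])

# np.frombuffer creates a view onto the same memory; no data is copied.
view = np.frombuffer(model_state, dtype=np.float64)

# An in-place update through the view is visible to the buffer's owner.
view += 1.0

updated = model_state.tolist()
```

Because the update is in place, the owner of the buffer sees the new values without any transfer step.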