Meeting notes:
ML Working Group
Jul 1, 2019
Attendees
Noah B
Jim Bednar / Anaconda / @jbednar
Stephan Rasp
Tom Augspurger
David
Agenda
What have we been up to in the last month?
Noah:
New job at Vulcan
Reworked pre-processing pipeline to use zarr; each chunk is a sample
Wrote a PyTorch loader
Jim
pyviz-examples infrastructure
Writing new ML examples next
Stephan
Back to the Lorenz model; online learning
Tried Binder for the first time (mybinder.org)
Tom
PR for shuffling dask arrays:
https://github.com/dask/dask/pull/3901
Hyperband optimization (not Bayesian) in dask: https://ml.dask.org/modules/generated/dask_ml.model_selection.HyperbandSearchCV.html#dask_ml.model_selection.HyperbandSearchCV
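For context, the core of Hyperband is successive halving: train many configurations on a small budget, keep the best fraction, and repeat with a larger budget. A toy pure-Python sketch of one successive-halving bracket (this is the idea only, not the dask-ml implementation, which additionally runs the evaluations in parallel on a dask cluster):

```python
import math

def successive_halving(configs, score, min_budget=1, eta=3):
    """Run one successive-halving bracket.

    configs: list of hyperparameter settings.
    score(config, budget): validation score after training `config`
        for `budget` units (higher is better).
    eta: keep the top 1/eta of configs each round.
    """
    budget = min_budget
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: score(c, budget), reverse=True)
        configs = ranked[: max(1, len(ranked) // eta)]  # keep best 1/eta
        budget *= eta  # survivors get eta times more budget
    return configs[0]

# Toy example: "configs" are learning rates, the score peaks at lr=0.1
# (the budget argument is ignored by this synthetic score function).
best = successive_halving(
    [0.001, 0.01, 0.1, 1.0, 10.0, 0.05, 0.2, 0.5, 2.0],
    score=lambda lr, budget: -abs(math.log10(lr) - math.log10(0.1)),
)
```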
David
Some difficulties
First: hurricane intensity (~4TB raw). Hurricane-centered; winds, temp, dew point, multiple levels. 50,000-75,000 timesteps.
Suggestion: use one big zarr file
Wants to train DL model using full dataset
Distributed training slow (IO bottleneck). Seems to affect both NetCDF and Zarr; metadata reading is slow (milliseconds per sample)? https://github.com/pydata/xarray/issues/2501
Using horovod for distributed training. Works well, but CNN error isn’t decreasing during training.
A 64 x 64 (smaller) grid isn’t an IO problem; depends on size
Data-parallel partial dependence plots (using dask)
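Partial dependence is embarrassingly parallel over grid values: for each candidate value of a feature, pin that column, predict, and average. A toy numpy sketch of the serial computation (names are illustrative; in the data-parallel version each loop iteration becomes one dask task, which sidesteps nothing about the TF-across-processes problem but shows where the parallelism lives):

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average model response as `feature` sweeps over `grid`.

    predict: callable mapping an (n, d) array to n predictions.
    Each grid value is independent of the others, so each iteration
    of this loop can be submitted as its own task to a cluster.
    """
    values = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v            # pin the feature to the grid value
        values.append(predict(Xv).mean())
    return np.array(values)

# Toy model: prediction equals feature 0, so the PD curve is the identity.
X = np.random.default_rng(0).normal(size=(100, 3))
grid = np.array([-1.0, 0.0, 1.0])
curve = partial_dependence(lambda A: A[:, 0], X, feature=0, grid=grid)
```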
Ran into bottleneck with TF. Hard to distribute TF models across processes.
Maybe check whether dask workers can be started without a nanny
Maybe try --nthreads=1 for each dask worker
threads_per_worker, n_workers, and nanny settings for LocalCluster
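A minimal sketch of the in-process equivalent of those suggestions, assuming `dask.distributed` is installed: `processes=False` runs workers as threads in the current process (so no nanny supervisor processes are spawned), and `threads_per_worker=1` gives each worker a single thread, mirroring `--nthreads 1` on the `dask-worker` CLI:

```python
from dask.distributed import Client, LocalCluster

# One single-threaded worker, no separate nanny processes.
# Rough CLI analogue: dask-worker --nthreads 1 --no-nanny <scheduler-address>
cluster = LocalCluster(n_workers=1, threads_per_worker=1, processes=False)
client = Client(cluster)
```

Running workers in-process like this is mainly useful for debugging whether the nanny or thread count is the source of trouble; for real training you would keep separate worker processes.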
Cloud environments?
Gotchas for Docker + deep learning:
use NVIDIA Docker images
k8s: install the NVIDIA CUDA driver (see GCP documentation)
Issues with the firewall at NOAA
Blog post
Next meeting time?
August 5 9AM
Action items
Still write blog post (@Noah B, @Tom)
Pangeo ML WG Blog Post