🕘 Meeting notes: Machine Learning WG
July 12?
Rescheduled from US Independence Day holiday on Monday, July 5?
(However, the Pangeo Cloud monthly meeting is at 11:30 ET on Monday, July 12.
Would 12:30 ET work for Pangeo ML?)

No meeting today (July 12)?

June 7

  • Attendees
  • Joe Hamman
  • Tom Augspurger
  • Cindy Chiao
  • Tom Nicholas
  • Stephen Haddad (UK Met Office Informatics Labs)
  • Douglas Rao (CISESS/NOAA NCEI)
  • Ray Bell
  • Alec Engell (NextGen Federal Systems)
  • Valentina Staneva (University of Washington)
  • Updates
  • Joe: Working on global biomass mapping
  • Tom: looking for thoughts on geospatial ML frameworks
  • Cindy: working on global biomass with Joe; currently training an XGBoost model with an input pipeline built on the Pangeo suite. Looking to expand xbatcher to complete the transition from xarray/dask → ML model.
  • Tom: possibly an ML project with Ryan on parameterization of ocean turbulence; interested in the state of Pangeo ML tools
  • Stephen: from the Informatics Lab; about to submit a paper on ML estimation of metadata for ocean temperature; QA for ML projects at the Met Office
  • Douglas: developing training materials for ML applications within NOAA, also focusing on AI-ready data; the 3rd NOAA AI workshop (09/13 – 09/17) is accepting abstracts until June 18 (https://2021noaaaiworkshop.sched.com).
  • Alec: works for a DoD contractor on Air Force weather; previously SM and space weather ML.
  • Martin’s email: mdurant@anaconda.com  
  • Valentina: from UW; doing new projects with NetCDF / Zarr for geospatial and time series ML
  • Agenda
  • Probably depends on the target audience: prototyping / development may want flexibility, while newcomers / established production processes might benefit from a “framework”-style approach.
  • Also concerns about the longevity of new projects (for use and teaching).
  • “Standard” scikit-learn vs. deep learning. For DL models, the input format might change (it’s not just observations x features). Mask handling may change too.
  • Nice to be able to swap out components.
  • Anything unique about high-dimensional data?
  • The rows x features layout loses the fact that values inside an n-d grid are related.
  • Stephen - Machine Learning QA processes
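
To make the rows x features point above concrete, here is a minimal NumPy sketch. The toy grid, its shape, and the helper names are made up for illustration and were not part of the discussion. It flattens a gridded array into the observations x features table that scikit-learn-style models expect, and shows that grid neighbours end up far apart in the table unless the grid shape is carried along separately.

```python
import numpy as np

# Hypothetical toy data: 2 variables on a (time=3, lat=4, lon=5) grid.
GRID_SHAPE = (3, 4, 5)
rng = np.random.default_rng(0)
grid = rng.normal(size=(2, *GRID_SHAPE))  # (variable, time, lat, lon)


def to_rows_by_features(arr):
    """Flatten (variable, *grid_dims) into (n_samples, n_features)."""
    n_features = arr.shape[0]
    return arr.reshape(n_features, -1).T


def from_rows_by_features(table, grid_shape):
    """Rebuild (variable, *grid_dims); needs grid_shape, which the table lacks."""
    return table.T.reshape((table.shape[1], *grid_shape))


table = to_rows_by_features(grid)
print(table.shape)  # (60, 2)

# Row index of one grid cell and of its neighbours in latitude and time.
cell = np.ravel_multi_index((0, 1, 2), GRID_SHAPE)
lat_neighbour = np.ravel_multi_index((0, 2, 2), GRID_SHAPE)
time_neighbour = np.ravel_multi_index((1, 1, 2), GRID_SHAPE)
lat_step = int(lat_neighbour - cell)    # rows between latitude neighbours
time_step = int(time_neighbour - cell)  # rows between time neighbours
print(lat_step, time_step)  # 5 20

# The round trip only works because GRID_SHAPE was kept outside the table.
restored = from_rows_by_features(table, GRID_SHAPE)
```

The table itself carries no record of which rows were adjacent on the grid, so a model that treats rows as independent samples cannot use that structure unless it is added back, for example as extra features or by feeding patches instead of single cells.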

May 3

  • Attendees
  • Noah Brenowitz
  • Cindy Chiao
  • Joe Hamman
  • Jeff Sadler (USGS)
  • Tom Nicholas
  • Ryan Abernathey