🕘 Meeting notes: Machine Learning WG

February 1

  • Attendees:
  • David John Gagne
  • Noah Brenowitz
  • Charlie Becker
  • Sam Adams (Met Office)
  • Daniel Rothenberg
  • Zac Flamig (AWS)
  • Douglas Rao
  • Joe Hamman
  • Stephen Haddad (Met Office)
  • Tom Augspurger
  • John Schreck
  • Cindy Chiao (CarbonPlan)
  • Chris Slocum
  • Ray Bell (DTN)
  • Agenda:
  • mldata.pangeo.io
  • The chunking you read with needs to match the order of the COG: read horizontally, not vertically.
  • Q: What ML libraries are people using?
  • Some mix of TensorFlow, PyTorch, and scikit-learn. Their APIs are similar but not the same, which requires extra code to handle.
  • Daniel: a team member wanted to use fastai and had to spend two weeks writing code to adapt Zarr data loading to it.
  • Issue: discovering GPL-licensed dependencies in the open-source toolchain. This can cause lots of legal issues.
  • Number of collaborations per group:
  • Sam: The Met Office does not have many external collaborations, but internally there is more demand for collaborations than there are data scientists available. Two issues for science teams:
  • They have an idea but no ML expertise, e.g. knowing the right algorithms for the problem, data wrangling, and pipelines.
  • They do not have the expertise to carry on afterwards.
  • Scientists are not used to working in the cloud.
  • Community of practice
  • Noah: a fixed collaborator with a close relationship
  • Doug: R2O (research to operations) is really challenging, and learning how to accomplish that transition is important.
  • Chris: close the knowledge gap in getting people up to speed on ML. Develop domain-specific training materials.
  • Noah: Are we at peak funding for ML?
  • Government agencies still ramping up funding
  • Potentially past peak in industry
  • A flood of people are learning data science/ML. It may eventually become a standard part of the toolbox.
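
On the chunking point raised under mldata.pangeo.io: a minimal sketch of why read windows should follow the file's layout. It assumes the image is stored in horizontal strips; the notes do not say how the COG in question is organized. The image size, the strip height, and the helper name `strips_touched` are all hypothetical and chosen only for illustration.

```python
def strips_touched(row_start, row_stop, rows_per_strip):
    """Count the row strips that overlap rows [row_start, row_stop)."""
    first = row_start // rows_per_strip
    last = (row_stop - 1) // rows_per_strip
    return last - first + 1


# Hypothetical layout: a 4096 x 4096 image stored in strips of 16 rows each.
HEIGHT, WIDTH, ROWS_PER_STRIP = 4096, 4096, 16

# Two read windows covering the same number of pixels (16 * 4096 each).
horizontal = strips_touched(0, 16, ROWS_PER_STRIP)    # 16 rows x 4096 columns
vertical = strips_touched(0, HEIGHT, ROWS_PER_STRIP)  # 4096 rows x 16 columns

print(f"horizontal window: {horizontal} strip(s) read")
print(f"vertical window:   {vertical} strip(s) read")
```

Under these assumptions the horizontal window is served by a single strip, while the vertical window of the same size touches every strip in the file. A file organized in square tiles would behave differently, so the layout should be checked before choosing chunk shapes.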

January 4

  • Attendees:
  • Joe Hamman
  • Noah Brenowitz
  • DG
  • Charlie Becker
  • John Schreck