Breakout Session: Tech Logistics

This breakout will focus on the datasets, infrastructure, and tools needed to make a hackathon successful.


Participants 

(Please add yourself to the table!)

Name | Organization | Email | Twitter
Dave Goodsmith | DataScience | dave@datascience.com | thegoodsmith, datascienceinc
Facilitators or a volunteer from the group will report out at 4:00 PM (Challenges/Lessons Learned) and 4:45 PM (Curating and Building Solutions).


Team countries: France, Kenya, USA
Team sectors: universities, research centers (non-profit), industry

Challenges and Lessons Learned:

  • Challenge: the tools we planned to use were not adequate for the problems we were trying to address.
  • Challenge: last-minute glitches, such as unexpected changes to the network. Lesson learned: have failover plans.
  • Challenge: finding datasets (data scientist = data hunter); it took one team a week to find the right data to use. Lessons learned: don't assume that open datasets are freely available, and public datasets are not always robust enough.
  • Challenge: establishing ground truth for the analysis; a domain expert is needed to validate the research.
  • Challenge: organizations collect datasets without realizing the data is meaningless.
  • Challenge: how to get companies to share data, e.g., network data or intellectual property/patents.
  • Lesson learned: have really good administrative staff on hand to solve issues, e.g., network problems.
  • Lesson learned: use a virtual, collaborative document tool, e.g., Etherpad. It gives the group a common space and structure.
  • Lesson learned: best practices from a Dept. of Commerce hackathon: give clear guidelines for the expected output, and have staff who vet the projects. A hackathon can also be a way to demonstrate what is cool about an API. See commerce.gov/datausability.
  • Challenge: you can't get the type of data you want. Lesson learned: create a synthetic dataset and provide an end-to-end set of instructions.
  • Lesson learned: a hackathon is a way to recruit/select data scientists (highervue).
  • Lesson learned: make predictions using fun, pop-culture data, e.g., predict which Taylor Swift song will go viral.
  • Lesson learned: participants can sometimes cheat without breaking the rules you've defined.
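The synthetic-dataset lesson above can be illustrated with a minimal Python sketch. It assumes a toy two-feature classification task; the feature names, the labeling rule, and the function names are hypothetical and are not something the group described using. Fixing the random seed means every team gets identical data, and because the rule that generates the labels is known, organizers also have ground truth to score submissions against.

```python
import csv
import random


def make_synthetic_dataset(n_rows: int, seed: int = 42) -> list[dict]:
    """Generate rows whose labels follow a known (hypothetical) rule."""
    rng = random.Random(seed)  # fixed seed: every participant gets identical data
    rows = []
    for i in range(n_rows):
        x1 = rng.uniform(0, 10)
        x2 = rng.uniform(0, 10)
        noise = rng.gauss(0, 0.5)  # small noise so the task is not trivially separable
        # Hypothetical ground-truth rule: label is 1 when x1 + x2 (plus noise) exceeds 10
        label = 1 if x1 + x2 + noise > 10 else 0
        rows.append({"id": i, "x1": round(x1, 3), "x2": round(x2, 3), "label": label})
    return rows


def write_csv(rows: list[dict], path: str) -> None:
    """Write rows to a CSV file so participants can load them with any tool."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "x1", "x2", "label"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_csv(make_synthetic_dataset(1000), "synthetic_hackathon_data.csv")` would produce a file that could ship alongside the end-to-end instructions; the file name is only an illustration.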

Resources:

RESOURCE | TESTIMONIALS/COMMENTS (including "how might we build / scale this?")
Cloudera |
DataRobot |
Tips, how-to guides, or other solutions and approaches to the challenges mentioned above:

  • insert here