[Workshop] How to Build Great Data Science Projects - Part 1
  • Use this document as a guide and a template for conceptualizing, planning and working on your data science project. Use the “Create Doc” button above to start with a fresh copy of this doc. Feel free to delete any sections or explanations to streamline your doc.

What is a Project?
For the purpose of this workshop, we’ll use the word project to refer to an original piece of work that requires at least 20-30 hours of effort and produces a substantial output in one or more of these forms: Jupyter notebook, code repository, project report, blog post, web application, presentation or video walkthrough.

What are the steps involved in building a great project?
  1. Finding a project topic/idea/domain
  2. Finding a dataset for your project
  3. Preparing an outline and setting deadlines
  4. Executing and iterating on your project
  5. Project Documentation & Presentation
  6. Showcasing your Project on your Resume/LinkedIn
  7. Maximizing your reach and improving your project

In Part 1 of this workshop, we’ll cover steps 1 to 4. Let’s get started!

Step 1: Finding a Project Topic/Idea/Domain

It’s OK to feel lost when you’re trying to figure out what topic you should build your next/first project on. Generally speaking, it should be approachable while being just a bit challenging, so that you can both complete the project and learn something.

What should be the topic of your project?

  • Something you have learned and want to practice e.g. Data Analysis with Python
    • For every course you take, you should have at least one project, ideally 2-3.
  • Something you are interested in learning e.g. scikit-learn, plotly etc.
    • It’s OK if you don’t already know the topic; you can learn while doing the project.
  • Something that will fill a gap in your portfolio/Resume e.g. model deployment
    • Go through your Resume, or check out some job listings on LinkedIn, to spot skills you haven’t demonstrated yet.
  • Something interesting you came across and want to replicate with a different dataset

Topics for Projects in Data Science & Machine Learning

  • Exploratory Data Analysis of a dataset
  • Data Visualization with Seaborn, Plotly, Folium (maps)
  • Supervised machine learning (regression/classification) on tabular data 
  • Unsupervised machine learning (clustering/recommendations) on tabular data
  • Deep Learning on images (Computer vision) 
  • Deep Learning for natural language processing
  • Data cleaning and feature engineering
  • Reinforcement learning
  • Dataset creation using web scraping or official APIs
  • Model deployment using Flask & Heroku
  • Creating a web/mobile application powered by machine learning
  • Participating in an active/completed data science competition on Kaggle etc.
  • A practical tutorial on any topic related to machine learning
  • Implementing a paper on a different dataset
  • Create and publish a Python library with utilities, models, functions etc.
  • Apply any of the above to a specific domain e.g. energy, astronomy, Covid-19

Over the course of your learning journey, try to cover most, if not all, of the above topics (a single project can cover multiple topics).
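
If you’d like to keep track of which topics your projects have covered so far, a few lines of Python can do it. This is only a sketch: the topic labels are shortened versions of the list above, and the project names are made-up placeholders, so replace both with your own.

```python
# Shortened, hypothetical labels for the topics listed above.
TOPICS = {
    "eda",
    "visualization",
    "supervised-ml",
    "unsupervised-ml",
    "deep-learning-images",
    "deep-learning-nlp",
    "cleaning-feature-engineering",
    "reinforcement-learning",
    "dataset-creation",
    "model-deployment",
    "ml-powered-app",
    "competition",
    "tutorial",
    "paper-implementation",
    "python-library",
}

# Placeholder portfolio: project name -> set of topics it covers.
# A single project can cover multiple topics.
portfolio = {
    "Example survey analysis": {"eda", "visualization"},
    "Example price prediction": {"supervised-ml", "cleaning-feature-engineering"},
}


def uncovered_topics(projects, topics=TOPICS):
    """Return the topics (sorted) that no project covers yet."""
    covered = set().union(*projects.values())
    return sorted(topics - covered)


if __name__ == "__main__":
    for topic in uncovered_topics(portfolio):
        print("Not covered yet:", topic)
```

Running this prints the topics you haven’t touched yet, which can be a handy starting point when picking your next project.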

Inspiration for interesting projects

Whatever topic you have in mind, it’s quite likely that you can find projects done by others on the same topic. Look through 10-12 projects for inspiration before you finalize your topic.