Lab 11: IMDB as an Undirected Graph
Bard College – Computer Science – Data Structures

In this lab, we will explore undirected graphs using the IMDB actor-movie dataset. Specifically, we will investigate degrees of separation (pp. 548-555) between actors and movies.

Updated (2017), larger (~50K and 300K) versions of movies.txt can be found on Google Classroom. Those files use == as the separator marker.
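
Because the separator differs between versions, it can help to check the file format before writing any parsing code. The following is only a sketch: it assumes each line holds a movie title followed by its cast, all joined by the separator, and the file sample2017.txt with its single line is a hypothetical stand-in for the real data.

```shell
# Hypothetical one-line sample using the == separator (stand-in for the 2017 files).
printf '%s\n' 'Footloose (1984)==Bacon, Kevin==Singer, Lori==Lithgow, John' > sample2017.txt

# Print each field of the first line on its own line, splitting on "==".
head -n 1 sample2017.txt | awk -F'==' '{ for (i = 1; i <= NF; i++) print $i }'

# Count the cast members on each line: number of fields minus one for the title
# (assumes the title is always the first field).
awk -F'==' '{ print NF - 1 }' sample2017.txt
```

To try this on the real data, replace sample2017.txt with the file downloaded from Google Classroom.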

Questions

Supply an answer along with an explanation of how you arrived at that answer.

  • Co-Stars
      • How many actors have co-starred with Kevin Bacon?
      • Find all the actors who have co-starred with Kevin Bacon at least twice.
      • Which pair of actors has co-starred most often?
  • Bacon numbers
      • Find the Bacon # of all the actors in the graph.
      • Who has the largest and second largest Bacon #s?
  • Center of the Hollywood universe
      • We can measure the centrality of an actor by computing their Hollywood number. For example, the Hollywood number of Kevin Bacon is the average Bacon number of all the actors. The Hollywood number of another actor is computed the same way, but with that actor as the source instead of Kevin Bacon. Compute Kevin Bacon's Hollywood number and find another actor with a better (lower) Hollywood number. (From Sedgewick and Wayne)
  • Discover something else interesting about Hollywood using this data set.

Submission

Submit a PDF of your lab report:

lab11.pdf 

Some Unix Tips

Viewing the whole file:
cat movies.txt

Viewing the file one screen at a time (press q to quit):
less movies.txt

Counting the number of lines in a file:
wc -l movies.txt

Looking at the first 7 lines of a file:
head -n 7 movies.txt

Looking at the last 7 lines of a file:
tail -n 7 movies.txt

Searching a file for a term:
grep "Footloose (1984)" movies.txt

uniq is another useful command-line tool: it collapses adjacent duplicate lines, so it is usually run on sorted input (type man uniq to learn about it).

Of course, all these tools can be composed together with PIPES (|) or FILE REDIRECTION (< and >). UNIX ROCKS!

With > you can send the output of a program to a file, and with < a program can read its input from a file.
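
As a small illustration of pipes and redirection, the sketch below works on a three-line sample file rather than the real data. The file names sample.txt and hanks.txt, the sample contents, and the / separator are all assumptions made for the example; substitute movies.txt and whichever separator your version uses.

```shell
# Build a tiny sample file (hypothetical stand-in for movies.txt; "/" separator assumed).
cat > sample.txt <<'EOF'
Footloose (1984)/Bacon, Kevin/Singer, Lori/Lithgow, John
Apollo 13 (1995)/Hanks, Tom/Bacon, Kevin/Paxton, Bill
A Few Good Men (1992)/Cruise, Tom/Nicholson, Jack/Bacon, Kevin
EOF

# Pipe: feed the output of grep into wc to count the movies that list an actor.
grep "Bacon, Kevin" sample.txt | wc -l

# Output redirection: save the matching lines to a file instead of printing them.
grep "Hanks, Tom" sample.txt > hanks.txt

# Input redirection: wc reads the saved file from standard input.
wc -l < hanks.txt

# Longer pipeline: put one field per line, sort so duplicates are adjacent,
# count repeats with uniq -c, then show the most frequent entries first.
tr '/' '\n' < sample.txt | sort | uniq -c | sort -rn | head -n 3
```

Note that the last pipeline counts movie titles as well as names, since it splits every field onto its own line; filtering the titles out is left as a refinement.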