Bard College – Computer Science – Data Structures
In this lab, we will explore undirected graphs using the IMDb actor-movie dataset. Specifically, we will investigate degrees of separation (pp. 548-555) between actors and movies.
Updated (2017), larger (~50K and 300K) versions of movies.txt can be found on Google Classroom. Those files use == as the separator.
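One way to read such a file into an undirected graph is sketched below. It assumes each line lists a movie followed by its cast, with the fields split by the separator (== in the updated files, / in the textbook's movies.txt). The function name build_graph and the two sample lines are made up for illustration and are not copied from the data files.

```python
from collections import defaultdict

def build_graph(lines, sep="=="):
    """Return an adjacency map linking each movie to its cast and back.

    Assumed line layout: movie<sep>actor<sep>actor...
    """
    graph = defaultdict(set)
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split(sep)]
        if not fields[0]:
            continue  # skip blank lines
        movie, cast = fields[0], fields[1:]
        for actor in cast:
            if actor:
                # Undirected edge: store it in both directions.
                graph[movie].add(actor)
                graph[actor].add(movie)
    return graph

# Illustrative sample lines (hypothetical, in the assumed layout).
sample = [
    "Footloose (1984)==Bacon, Kevin==Singer, Lori",
    "Apollo 13 (1995)==Bacon, Kevin==Hanks, Tom",
]
graph = build_graph(sample)
print(sorted(graph["Bacon, Kevin"]))
```

To read a real file, something like build_graph(open("movies.txt"), sep="/") should work under the same layout assumption.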
Questions
Supply an answer along with an explanation of how you arrived at that answer.
Find the Bacon number of every actor in the graph.
Who has the largest and second-largest Bacon numbers?
Center of the Hollywood universe.
We can measure the centrality of an actor by computing their Hollywood number. For example, the Hollywood number of Kevin Bacon is the average Bacon number of all the actors. The Hollywood number of any other actor is computed the same way, but with that actor as the source instead of Kevin Bacon. Compute Kevin Bacon's Hollywood number and find another actor with a better (lower) Hollywood number. (From Sedgewick and Wayne)
Discover something else interesting about Hollywood using this data set.
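As a starting point for the Bacon number and Hollywood number questions above, here is a minimal sketch using breadth-first search on a tiny made-up graph. The names bacon_numbers and hollywood_number, the movies, and the actors "Actor A" through "Actor C" are all hypothetical. The sketch also assumes that the average is taken over reachable actors other than the source; check that convention against the textbook before relying on it.

```python
from collections import deque

def bacon_numbers(graph, source):
    """Breadth-first search from source; returns {actor: number}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Starting from an actor, actors sit at even distances
    # (actor - movie - actor), so halve those and drop the movies.
    return {v: d // 2 for v, d in dist.items() if d % 2 == 0}

def hollywood_number(graph, source):
    """Average number over reachable actors, excluding the source (assumption)."""
    numbers = bacon_numbers(graph, source)
    others = [n for actor, n in numbers.items() if actor != source]
    return sum(others) / len(others) if others else 0.0

# Made-up toy data: each movie maps to its cast.
casts = {
    "Movie 1": ["Bacon, Kevin", "Actor A"],
    "Movie 2": ["Actor A", "Actor B"],
    "Movie 3": ["Actor B", "Actor C"],
}
graph = {}
for movie, cast in casts.items():
    for actor in cast:
        graph.setdefault(movie, set()).add(actor)
        graph.setdefault(actor, set()).add(movie)

print(bacon_numbers(graph, "Bacon, Kevin"))
print(hollywood_number(graph, "Bacon, Kevin"))
print(hollywood_number(graph, "Actor A"))
```

In this toy graph Actor A ends up with a lower average than Kevin Bacon, which is the kind of comparison the centrality question asks for.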
Submission
Submit a PDF of your lab report:
lab11.pdf
Some Unix Tips
Viewing the whole file:
cat movies.txt
Viewing the file one screen at a time:
less movies.txt
Counting the number of lines in a file:
wc -l movies.txt
Looking at the first 7 lines of a file:
head -7 movies.txt
Looking at the last 7 lines of a file:
tail -7 movies.txt
Searching a file for a term:
grep "Footloose (1984)" movies.txt
uniq is another useful command-line tool (type man uniq to learn about it).
Of course, all these tools can be composed together with PIPES (|) or FILE REDIRECTION (< >). UNIX ROCKS!
You can send the output of a program to a file, or get the input from a file.