Bard College – Computer Science – Data Structures
In this lab, we will explore undirected graphs using the IMDB actor-movie dataset. Specifically, we will investigate degrees of separation (pp. 548-555) between actors and movies.
Updated, larger versions of movies.txt (~50K and 300K) can be found on Google Classroom and repl.it. Those files use == as the separator marker.
symbol graph API
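As a starting point, here is a minimal sketch of reading ==-separated lines into an undirected movie-actor graph. It assumes each line holds a movie title followed by its cast; the function name build_graph and the sample lines are made up for illustration and are not part of the symbol graph API or the real data.

```python
from collections import defaultdict

def build_graph(lines, sep="=="):
    """Return {vertex: set of neighbors} for the movie-actor graph."""
    adj = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        movie, cast = fields[0], fields[1:]
        for actor in cast:
            # Undirected edge between the movie and each cast member.
            adj[movie].add(actor)
            adj[actor].add(movie)
    return adj

# Made-up sample lines in the assumed format (not taken from movies.txt).
sample = [
    "Movie A (2000)==Bacon, Kevin==Actor One",
    "Movie B (2001)==Actor One==Actor Two",
]
graph = build_graph(sample)
print(sorted(graph["Actor One"]))  # ['Movie A (2000)', 'Movie B (2001)']
```

For the real file, the same function could be called on an open file object, for example build_graph(open("movies.txt")).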
Questions
Supply an answer along with an explanation of how you arrived at that answer.
Find the Bacon number of every actor in the graph. Who has the largest Bacon number?
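One possible approach is breadth-first search from Kevin Bacon. The sketch below uses a made-up toy cast list, and bacon_numbers is a hypothetical helper name. It assumes movies and actors are both vertices, so a path alternates actor, movie, actor, and the Bacon number is half the number of edges. Actors not connected to the source are simply left out of the result.

```python
from collections import defaultdict, deque

# Made-up toy data: movie title -> cast list (not taken from movies.txt).
movies = {
    "Movie A": ["Bacon, Kevin", "Actor One"],
    "Movie B": ["Actor One", "Actor Two"],
    "Movie C": ["Actor Three"],  # not connected to Kevin Bacon
}

# Undirected graph: every movie is linked to each member of its cast.
adj = defaultdict(set)
for movie, cast in movies.items():
    for actor in cast:
        adj[movie].add(actor)
        adj[actor].add(movie)

def bacon_numbers(adj, source, actors):
    """BFS from source; return {actor: Bacon number} for reachable actors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Paths alternate actor, movie, actor, ..., so halve the edge count.
    return {a: dist[a] // 2 for a in actors if a in dist}

actors = {a for cast in movies.values() for a in cast}
numbers = bacon_numbers(adj, "Bacon, Kevin", actors)
farthest = max(numbers, key=numbers.get)  # actor with the largest finite number
print(numbers)
print(farthest)
```

Whether unreachable actors should count as having the "largest" Bacon number is a choice to explain in your report; this sketch ignores them.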
Co-Stars
How many actors have co-starred with Kevin Bacon?
Find all the actors who have co-starred with Kevin Bacon at least twice.
What pair of actors has co-starred together most often?
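The co-star questions above can be approached by counting, for every pair of actors, how many casts contain both. The following is a sketch on made-up toy data, not a definitive solution; the variable names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Made-up toy data: movie title -> cast list (not taken from movies.txt).
movies = {
    "Movie A": ["Bacon, Kevin", "Actor One", "Actor Two"],
    "Movie B": ["Bacon, Kevin", "Actor One"],
    "Movie C": ["Actor One", "Actor Two"],
    "Movie D": ["Actor One", "Actor Two"],
}

# Count shared movies per unordered pair; sorting gives each pair one key.
pair_counts = Counter()
for cast in movies.values():
    for pair in combinations(sorted(set(cast)), 2):
        pair_counts[pair] += 1

# Restrict to pairs that include Kevin Bacon, keyed by the other actor.
target = "Bacon, Kevin"
with_bacon = {}
for (a, b), n in pair_counts.items():
    if target in (a, b):
        other = b if a == target else a
        with_bacon[other] = n

print(len(with_bacon))                                 # 2 distinct co-stars
print(sorted(a for a, n in with_bacon.items() if n >= 2))  # ['Actor One']
print(pair_counts.most_common(1))  # [(('Actor One', 'Actor Two'), 3)]
```

Enumerating every pair grows quadratically with cast size, so on the larger files it may be worth counting only the pairs a given question needs.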
Center of the Hollywood universe.
We can measure the centrality of an actor by computing their Hollywood number. For example, the Hollywood number of Kevin Bacon is the average Bacon number of all the actors. The Hollywood number of another actor is computed the same way, but with that actor as the source instead of Kevin Bacon. Compute Kevin Bacon's Hollywood number and find another actor with a better (lower) Hollywood number. (From Sedgewick and Wayne)
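A Hollywood number can be sketched as one breadth-first search followed by an average. The code below uses made-up toy data and a hypothetical helper, hollywood_number. It makes two assumptions you may want to change: the average is taken only over actors reachable from the source, and the source itself (distance 0) is included.

```python
from collections import defaultdict, deque

# Made-up toy data: movie title -> cast list (not taken from movies.txt).
movies = {
    "Movie A": ["Bacon, Kevin", "Actor One"],
    "Movie B": ["Actor One", "Actor Two"],
}
adj = defaultdict(set)
for movie, cast in movies.items():
    for actor in cast:
        adj[movie].add(actor)
        adj[actor].add(movie)
actors = {a for cast in movies.values() for a in cast}

def hollywood_number(adj, source, actors):
    """Average actor-to-actor distance from source over reachable actors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Halve edge counts because paths alternate between actors and movies.
    reached = [dist[a] // 2 for a in actors if a in dist]
    return sum(reached) / len(reached)

scores = {a: hollywood_number(adj, a, actors) for a in actors}
best = min(scores, key=scores.get)  # lower average means more central
print(scores["Bacon, Kevin"])  # 1.0
print(best)                    # Actor One
```

Running a full search from every actor is expensive on the larger files, so it may be more practical to try a handful of candidate actors.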
Discover something else interesting about Hollywood using this data set.
Submission
Submit a PDF of your lab report:
lab10.pdf
Some Unix Tips
Viewing the whole file:
cat movies.txt
Viewing the file one screen at a time:
less movies.txt
Counting the number of lines in a file:
wc -l movies.txt
Looking at the first 7 lines of a file:
head -7 movies.txt
Looking at the last 7 lines of a file:
tail -7 movies.txt
Searching a file for a term:
grep "Footloose (1984)" movies.txt
uniq is another useful command-line tool (type man uniq to learn about it).
Of course, all these tools can be composed with PIPES (|) or FILE REDIRECTION (< and >). UNIX ROCKS!