Bard College – Computer Science – Data Structures
In this lab, we will explore undirected graphs using the book's IMDB actor-movie dataset. Specifically, we will investigate degrees of separation (pp. 548-555) between actors and movies.
Updated (2017) and larger (~50K and 300K) versions of movies.txt can be found on Google Classroom. Those files use == as the separator marker.
Questions
For each question, supply an answer along with an explanation of how you arrived at that answer.
Center of the Hollywood universe. We can measure how good a center Kevin Bacon is by computing his Hollywood number. The Hollywood number of Kevin Bacon is the average Bacon number of all the actors. The Hollywood number of another actor is computed the same way, but with that actor as the source instead of Kevin Bacon. Compute Kevin Bacon's Hollywood number and find an actor and an actress with better (that is, lower) Hollywood numbers. (From Sedgewick and Wayne)
Discover something else interesting about Hollywood using this data set.
Submission
Submit a PDF of your lab report via Moodle.
lab10.pdf
Some Unix Tips
Viewing the whole file:
cat movies.txt
Viewing the file one screen at a time (press q to quit):
less movies.txt
Counting the number of lines in a file:
wc -l movies.txt
Looking at the first 7 lines of a file:
head -7 movies.txt
Looking at the last 7 lines of a file:
tail -7 movies.txt
Searching a file for a term:
grep "Footloose (1984)" movies.txt
uniq is another useful command-line tool (type man uniq to learn about it).
Of course, all these tools can be composed together with PIPES (|) or FILE REDIRECTION (< and >). UNIX ROCKS!
You can send the output of a program to a file, or get the input from a file.
echo "Hello" > out.txt
rev < out.txt
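Pipes compose the same tools into longer chains. The following is a minimal sketch, not part of the lab files: it writes a three-line stand-in file named sample.txt (a hypothetical name) and assumes each line holds a movie title followed by its actors, separated by /. For the larger files on Google Classroom, which use == as the separator, the cut and tr steps would need adjusting.

```shell
# Build a tiny stand-in file so the pipeline can run anywhere.
# Assumptions: the name sample.txt and the "/" field separator are illustrative;
# the larger files on Google Classroom use == instead.
printf '%s\n' \
  'Apollo 13 (1995)/Bacon, Kevin/Hanks, Tom' \
  'Cast Away (2000)/Hanks, Tom/Hunt, Helen' \
  'Footloose (1984)/Bacon, Kevin/Lithgow, John/Singer, Lori' > sample.txt

# How many movies list Kevin Bacon? grep filters lines, wc -l counts them.
grep "Bacon, Kevin" sample.txt | wc -l

# In how many movies does each actor appear?
# cut drops the title field, tr puts one name per line,
# sort groups equal names so uniq -c can count them, sort -rn ranks by count.
cut -d/ -f2- sample.txt | tr '/' '\n' | sort | uniq -c | sort -rn
```

Note that uniq only collapses adjacent duplicate lines, which is why sort comes before it in the chain.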
You can send an input to a program with echo and a pipe rather than typing manually:
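For instance, a minimal sketch that reuses rev from the redirection example, with echo supplying the input that would otherwise be typed by hand:

```shell
# echo writes "Hello" to standard output; the pipe feeds it to rev as input.
echo "Hello" | rev
# prints: olleH
```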