Lab 1: Benford’s Law
Bard College – Computer Science – Data Structures

In our first lab we will introduce Java, the UNIX command line, pair programming, and begin employing an empirical approach to algorithms that we will adopt throughout the course. This week, we will use all of these tools to explore something known as Benford’s law.  

You will write a lab report and create two Java classes:

  1. Stats — A program that reports descriptive statistics of a set of numbers.
  1. Benford — A program that displays only the first digit of a set of numbers.

These programs should be useful in analyzing and presenting the performance of various data structures throughout the semester.

Read through this lab prompt aloud together in pairs; take turns reading and listening. 

WARM UP: Collaboratively Coding on Repl.it

Repl.it allows us to collaboratively write programs together, a-la Google docs. Another bonus feature is that it provides everyone a Unix environment in the cloud instantly. 

  1. Make sure Keith has your repl.it username so he can add you to the BardCMSC team.
  1. You (or your partner) should fork my Algs4 repl.it project.
  • You can choose to make your project public to the world, or only accessible within the BardCMSC team. For small experiments, such public REPLs are just fine, but for lab submissions please use the assigned private project.
  1. Make a new class Benford that when run (hit the play button) simply prints: 
  • “Benford's law, also called the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data.” 
  • Again, the Java class should be named Benford not Main
  1. Show Keith, or the tutor, and they will send you the starter code for the next part.

Benford’s Law

Benford’s law says that the first digits of a collection of measurements are not uniform across 1,2,3,4,5,6,7,8,91,2,3,4,5,6,7,8,9 as one might expect. In fact, we can expect to see ‘1’ not 1/9 of the time but almost 1/3 of the time as the most significant digit.  Ideally, the distribution looks something like:
digit
frequency
1
30.1%
2
17.6%
3
12.5%
4
9.7%
5
7.9%
6
6.7%
7
5.8%
8
5.1%
9
4.6%
With the the mean digit being 3.443.44 and standard deviation being 2.462.46.

Create a program Benford that reads numbers from standard input and prints out only the first digit of those numbers. The String class has a method to access individual characters.

java Benford < heights.txt
8
6
6
6
5
4
<SNIP>
7
7