Lab 1: Benford’s Law
Bard College – Computer Science – Data Structures

In our first lab we will introduce Java, the UNIX command line, pair programming, and begin employing an empirical approach to algorithms that we will adopt throughout the course. This week, we will use all of these tools to explore something known as Benford’s law.  

You will write a lab report and create two Java classes:

  1. Stats — A program that reports descriptive statistics of a set of numbers.
  1. Benford — A program that displays only the first digit of a set of numbers.

Read through this lab prompt aloud together in pairs; take turns reading and listening. 

WARM UP: Collaboratively Coding on Repl.it

Repl.it allows us to collaboratively write programs together, a-la Google docs. Another bonus feature is that it provides everyone a Unix environment in the cloud instantly. 

  1. Make sure we have your repl.it username so we can add you to the repl.it team.
  1. You (or your partner) should accept the lab1-warmup project and the other as a collaborator.
  1. Make a new class Benford that when run (i.e., hit the play button) prints: 
  • “Benford's law, also called the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data.” 
  • Again, the Java file should be named Benford.java not Main.java
  1. Compile & run Benford.java from the command-line.
  1. Create a custom .replit file to compile & run custom files beyond Main.
run = "javac Main.java && java Main"
  1. Create the standard README.md file with the standard header:
 Name <bard email>
 Name <bard email>
 date
 CMSC 201
 Assignment Name
 Collaboration  I (worked alone | collaborated) on this assigment 
 Statement      [with (student | tutor) X]
                [with assistance from 
                (website link | book | student | tutor) Z].
  1. Show Keith, Kerri, or the tutor.

Benford’s Law

Benford’s law says that the first digits of a collection of measurements are not uniform across 1,2,3,4,5,6,7,8,91,2,3,4,5,6,7,8,9 as one might expect. In fact, we can expect to see 11 not 1/9 of the time but almost 1/3 of the time as the most significant digit.  Ideally, the distribution looks something like:
digit
frequency
1
30.1%
2
17.6%
3
12.5%
4
9.7%
5
7.9%
6
6.7%
7
5.8%
8
5.1%
9
4.6%
With the the mean digit being 3.443.44 and standard deviation being 2.462.46.

Create a program Benford that reads numbers from standard input and prints out only the first digit of those numbers. The String class has a method to access individual characters.

java Benford < heights.txt