Gitlet Persistence

Note: These notes are intentionally vague about what classes and data structures you should use for your Gitlet implementation, and any mention of a class or data structure doesn’t necessarily mean you should use it. The intent of these notes is to help with persistence if you need it.

What is Persistence?

Since our Java program Gitlet only runs one command at a time and then exits, nothing kept in memory survives to the next command. We need to store all of our data on our filesystem (disk) so it will remain between runs of the program.

We store this data in our .gitlet folder.

How should I approach persistence?

Key Idea: Design with persistence in mind from the start.

You might be tempted to figure out your classes and data structures for an idealized Gitlet where we don’t need to worry about persistence first, and then figure out persistence after.

Don’t do this. Retrofitting persistence onto a finished design is annoying and painful.

Keep the fact that you need to be able to load/save things from disk in mind when picking your classes and data structures.

Ways that persistence affects your design:

Runtime / memory constraints. If you need information from one commit, you shouldn’t load the information for ALL commits at the same time (it takes up too much time & memory).

How does this influence your design?

You need to make sure you can retrieve small pieces of information quickly.

If you are using serialization for a class, make sure its objects don’t hold pointers to other Java objects that in turn hold pointers to further Java objects, and so on. Deserialization follows pointers, so loading one object also loads everything reachable from it. (How do we get around this?)

Example: Serializing a CommitTree class can be bad. If you load the entire tree, you are loading a lot of data.

Identity. You need to be able to locate the data for an object on your filesystem given some piece of information. What will you use to identify each object? How will you know where its information is on your file system?
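One possible way to handle both points above (a sketch, not a required design): have each persisted object refer to a related object by an ID String instead of a Java reference, and derive the file location from that ID. The names Snapshot, SnapshotStore, and fileFor are made up for illustration, and what you use as an ID is up to you.

```java
import java.io.File;
import java.io.Serializable;

/** Hypothetical persisted object; not a class you are required to have. */
class Snapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    final String message;
    /** ID of a related object, stored as a String instead of a Java
     *  reference, so serializing this object does not pull in the other one. */
    final String parentId;

    Snapshot(String message, String parentId) {
        this.message = message;
        this.parentId = parentId;
    }
}

/** Hypothetical helper that answers "where does the object with this ID live?" */
class SnapshotStore {
    /** Returns the file inside dir that would hold the object with this ID. */
    static File fileFor(File dir, String id) {
        return new File(dir, id);
    }
}
```

With this layout, following a parent link means looking up one more file by ID, and only when you actually need that parent.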

Key Idea: Give yourself persistence “for free”.

Try to abstract away persistence as much as possible.

For each thing that needs to persist, write a helper method that will load it from your file system and a helper method that will save it to your file system.

You should never have to worry about how or where something is loaded or saved when you are implementing your gitlet commands. *

Get familiar with the concept of lazy loading and caching.

* To a reasonable extent.
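As a rough illustration of such a pair of helpers, here is a minimal sketch that assumes you persist objects with Java serialization. The class name Persist and the method names save and load are hypothetical; your own helpers will probably be specific to each thing you persist and will decide the file location themselves.

```java
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;

/** Hypothetical helpers; every command would go through these
 *  instead of touching streams directly. */
class Persist {
    /** Writes obj to file, replacing any previous contents. */
    static void save(File file, Serializable obj) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(Files.newOutputStream(file.toPath()))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("could not save " + file, e);
        }
    }

    /** Reads back an object of the given type that save() wrote to file. */
    static <T extends Serializable> T load(File file, Class<T> type) {
        try (ObjectInputStream in =
                new ObjectInputStream(Files.newInputStream(file.toPath()))) {
            return type.cast(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not load " + file, e);
        }
    }
}
```

Code that implements a command then only calls save and load, and never needs to know which stream classes or file formats are involved.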

Lazy Loading and Caching

Lazy Loading: Only retrieve information from your file system when you need it, not all at once in the beginning.

Caching: Once you load something from your file system, save it in your Java program so you don’t need to load it again. (E.g. as an attribute or an entry in a Map.)

Writing back: If you cached something and then modified it, make sure you write the changes back to your file system at the end of your Java program.

Easiest way to implement lazy loading:

function getThing(): // returns Thing
    if Thing is not loaded:
        Thing = loadThing() // keep the loaded Thing in memory (cache it)
    return Thing
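The pseudocode above could be written in Java roughly as follows. This is only a sketch: the Lazy class is a made-up name, and the Supplier stands in for whatever loadThing() helper you write.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper that loads a value at most once, on first use. */
class Lazy<T> {
    private final Supplier<T> loader;  // stands in for loadThing()
    private T value;                   // the cached Thing
    private boolean loaded = false;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the value, calling the loader only the first time. */
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }
}
```

A separate loaded flag is used instead of a null check so that a loader that legitimately returns null is still called only once.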

Example:

Let’s say you store the state of which files have been added (with gitlet add) to your repo in your file system.

Lazy loading: The first time you want that list of files when you run your Java program, you need to load it from disk.

Caching: The second time you need that list of files in the same run of the Java program, don’t load it from disk again; use the same list you loaded before. If you need to, you can then add multiple files to that list object in your Java program.

Writing back: When your Java program is finished, at the very end, since you had loaded that list of files and may have modified it, write it back to your file system.
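Here is one way this example might look in Java. It is a sketch under assumptions, not a recommended design: the class StagedFiles is hypothetical, and it assumes the list is kept as a plain text file with one file name per line (so names containing newlines would not work).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the list of added files. */
class StagedFiles {
    private final File file;
    private List<String> names;     // null until the first load (the cache)
    private boolean dirty = false;  // true once the cached list was modified

    StagedFiles(File file) {
        this.file = file;
    }

    /** Lazy loading + caching: reads the file only on first use. */
    private List<String> names() {
        if (names == null) {
            names = new ArrayList<>();
            if (file.exists()) {
                try {
                    names.addAll(Files.readAllLines(file.toPath()));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return names;
    }

    boolean contains(String name) {
        return names().contains(name);
    }

    /** Modifies only the in-memory list; nothing is written yet. */
    void add(String name) {
        if (!names().contains(name)) {
            names().add(name);
            dirty = true;
        }
    }

    /** Writing back: call once at the very end of the program. */
    void writeBack() {
        if (dirty) {
            try {
                Files.write(file.toPath(), names);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            dirty = false;
        }
    }
}
```

The dirty flag means a command that only reads the list never writes to disk, and a command that never touches the list never reads the file at all.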

Key Idea: Implement persistence first.

When you go about coding your design, implement the helper methods that help you guarantee persistence first since all other gitlet commands will rely on them. When you start implementing your gitlet commands, you should find yourself using those helper methods. If you find yourself directly reading from or writing to your file system at that time, consider writing another helper method.

Exercise: Look at the .git folder in your repo.

Your repo (~/repo) is also its own git repo with a .git folder located at ~/repo/.git. Look inside there and see how Git saves things!

Especially look at:

HEAD

objects
