In-Memory Merge Plan

Milestones

M1

The first milestone for landing in-memory merge work upstream is to introduce an optional mode that defers all writes during the merge process to memory, and then writes them out to disk at the end of the merge (or as soon as any exception is raised). This is a “proof of work” phase for the riskiest part of in-memory merge, the part most likely to hide edge cases; the goal is to have as many merges and rebases use this logic as possible. With M1 in use, we can focus on M2.

Status: Patches here. Tests pass and it seems to work fine. Need to:
  • Figure out if the functionality of the overlay contexts I used could be rolled into existing classes
  • Rebase on top of default (Jun's rebase changes)
  • Do a pre-code-review with Jun and Durham
  • Send more in-depth proposal to Sean, Augie, Martin.
  • Figure out what order to stage these patches in once the freeze is lifted.

Details:
  • The contract for merge.update doesn't change. All callers are supported so we can find unexpected edge cases.
  • No changes to the dirstate are needed in this phase (because we flush just before calling recordupdates()).
  • If it works, it can successfully enforce some boundaries on the problem (e.g.: multiple writes to one file during a merge can be combined into one write).
  • This mode is optional, and can be disabled to make all writes go immediately to the filesystem. This lets us keep use cases like largefiles around that'd be a pain to convert.
  • Contents can be flushed in-flight. This is done on an error or before launching an external merge tool.
  • If using workers in batchget/batchremove, each forked worker must flush before and after running.
  • merge.update has to be wrapped in a giant try/except (to flush on exceptions).
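
A minimal sketch of the deferred-write idea described above, assuming a standalone buffer class. DeferredWrites and run_with_deferred_writes are hypothetical names for illustration only; they are not Mercurial APIs and not the classes used in the patches. It shows repeated writes to one file collapsing into a single disk write, and the flush happening even when the wrapped operation raises.

```python
import os
import tempfile


class DeferredWrites:
    """Hypothetical buffer that holds file writes in memory until flush()."""

    def __init__(self, root):
        self.root = root
        self._pending = {}  # relative path -> bytes, or None for a removal

    def write(self, path, data):
        # A later write to the same path replaces the earlier one, so
        # several writes during a merge collapse into one disk write.
        self._pending[path] = data

    def remove(self, path):
        self._pending[path] = None

    def flush(self):
        """Apply every pending operation to disk; return how many were applied."""
        count = len(self._pending)
        for path, data in sorted(self._pending.items()):
            full = os.path.join(self.root, path)
            if data is None:
                if os.path.exists(full):
                    os.remove(full)
            else:
                os.makedirs(os.path.dirname(full), exist_ok=True)
                with open(full, "wb") as f:
                    f.write(data)
        self._pending.clear()
        return count


def run_with_deferred_writes(root, operation):
    """Run operation(buffer); flush at the end or when an exception is raised."""
    buf = DeferredWrites(root)
    try:
        return operation(buf)
    finally:
        # Mirrors the "giant try/except" idea: the flush runs on success
        # and on failure, and any exception still propagates to the caller.
        buf.flush()
```

The same flush() could be called in-flight, for example before handing files to an external merge tool, since it leaves the buffer empty and ready for further writes.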

Concerns raised:
  • (By Durham) Flushing all the writes all at once might be slower than the old method of distributing them throughout the merge process.
  • Also, you might run out of memory during a large update (probably a bigger issue!). Consider:
      • Adding a limit to the number of files that can be in-memory at one time, and flushing early if we hit it.
      • Threading the flushing to multiple files, like backgroundclosing.
      • Optionally disabling this mode for big operations like hg update.
Jun suggests: traceprofile, but disable workers.
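
One way the file-count limit could look, as a rough sketch only. BoundedBuffer, its sink callback, and the default of 1000 files are all assumptions made for illustration; nothing here is taken from the patches.

```python
class BoundedBuffer:
    """Hypothetical write buffer that flushes early once too many files are pending."""

    def __init__(self, sink, max_files=1000):
        self.sink = sink            # callable(path, data) that performs the real write
        self.max_files = max_files  # assumed limit; the right value is an open question
        self._pending = {}
        self.early_flushes = 0

    def write(self, path, data):
        self._pending[path] = data
        if len(self._pending) >= self.max_files:
            # Hit the limit: trade some write batching for bounded memory use.
            self.early_flushes += 1
            self.flush()

    def flush(self):
        for path, data in self._pending.items():
            self.sink(path, data)
        self._pending.clear()
```

A limit on total buffered bytes rather than file count might track memory pressure more closely; the count is used here only because it is the option listed above.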

M2

M2 takes the work of M1 and makes the whole process work without a working copy:
  • Instead of flushing the writes at the end, we simply convert the in-memory contents to a changectx.
  • The overlayworkingcontext needs to be able to set an arbitrary other changectx as the fallthrough (not just the current workingctx). It probably needs a new name.
  • Renames need to be tracked in the overlay ctxs, not dirstate.
  • recordupdates() (and anything else that touches the working copy) is not called when the merge is not running in the working copy.
  • In any case where we'd flush to disk (e.g. running a merge tool), an exception is raised instead.
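
The M2 points above can be sketched roughly as follows, assuming a plain dict stands in for the fallthrough changectx. OverlayContext, InMemoryFlushError, and every method name are hypothetical and chosen for illustration; they do not reflect the real overlayworkingcontext interface.

```python
class InMemoryFlushError(Exception):
    """Raised where M1 would have flushed to disk (hypothetical name)."""


class OverlayContext:
    """Hypothetical overlay that layers in-memory changes over any base context."""

    def __init__(self, base):
        self._base = base    # mapping path -> bytes; stands in for an arbitrary changectx
        self._cache = {}     # path -> bytes, or None when the file was removed
        self._copies = {}    # destination path -> source path (renames live here)

    def data(self, path):
        if path in self._cache:
            data = self._cache[path]
            if data is None:
                raise KeyError(path)
            return data
        return self._base[path]  # fall through to the base context

    def write(self, path, data):
        self._cache[path] = data

    def remove(self, path):
        self._cache[path] = None

    def markcopied(self, dest, source):
        # Tracked on the overlay itself instead of in the dirstate.
        self._copies[dest] = source

    def copies(self):
        return dict(self._copies)

    def flush(self):
        # There is no working copy to write to, so refuse instead of flushing.
        raise InMemoryFlushError("operation needs a working copy")

    def final_files(self):
        """Merged file contents, i.e. what would be turned into a commit."""
        files = dict(self._base)
        for path, data in self._cache.items():
            if data is None:
                files.pop(path, None)
            else:
                files[path] = data
        return files
```

Because the base is never modified, discarding the overlay object is enough to abandon the merge.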
Discussion:
  • merge.update takes a dest ctx which defaults to repo[None]. So the calling function can decide whether to run in-memory or not.
  • rebase: check whether . is in rebaseset.
  • Should always be able to discard an in-memory merge without worry. No user data can be introduced in the merge at this point (no tools or conflict resolution).
  • Possibly include “retry without using in-memory merge if it fails” logic in this cut.
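
A rough sketch of how a caller might combine the rebaseset check with the retry idea. The function names, the in_memory keyword, and InMemoryMergeError are all hypothetical; the merge argument is any callable supplied by the caller, not merge.update itself.

```python
class InMemoryMergeError(Exception):
    """Hypothetical error raised when a merge cannot finish in memory."""


def should_run_in_memory(working_parent, rebaseset):
    # If "." is being rebased, the working copy has to move, so the
    # merge cannot stay purely in memory.
    return working_parent not in rebaseset


def merge_with_fallback(merge, working_parent, rebaseset):
    """Try merge(in_memory=True) first; fall back to an on-disk merge on failure."""
    if should_run_in_memory(working_parent, rebaseset):
        try:
            return merge(in_memory=True)
        except InMemoryMergeError:
            # Nothing was written and no user input was collected,
            # so the failed attempt can simply be dropped.
            pass
    return merge(in_memory=False)
```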

Code

Currently at https://bitbucket.org/phillco/hg-imm/commits/all with patches for M1, though they’ll be tweaked with some feedback from Jun.

I have a rough M2 that works locally but needs to be cleaned up / turned into patches.

Status

Initially I was planning to roll out the milestones as described here. However, I think I’m close enough to the second milestone to try and ship them together. This means we can save time:
  • No need to change the M1 context (deferred writes that get flushed later) for M2 (memory writes that get committed directly). This saves a bit of reviewer time.
  • We can skip beta testing M1 on many types of merge.update calls and scope it to just rebase.