The first milestone for landing the in-memory merge work upstream is an optional mode that defers all writes during the merge process to memory, and then writes them out to disk at the end of the merge (or as soon as any exception is raised). This is a "proof of work" phase for the riskiest part of in-memory merge, the part most likely to hide edge cases; the goal is to run as many merges and rebases through this logic as possible. Once M1 is in use, we can focus on M2.
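A minimal sketch of the deferred-write idea, assuming a plain dict buffer keyed by path. `DeferredWriter` and its methods are hypothetical names for illustration, not Mercurial APIs. Reads check the buffer before falling through to disk, and a second write to the same path replaces the first, so it costs only one disk write at flush time.

```python
import os


class DeferredWriter:
    """Hypothetical buffer that holds file writes in memory until flush()."""

    def __init__(self, root):
        self.root = root
        self._pending = {}  # path -> bytes, or None for a pending removal

    def write(self, path, data):
        # A later write to the same path replaces the earlier one, so
        # repeated writes during a merge become a single disk write.
        self._pending[path] = data

    def remove(self, path):
        self._pending[path] = None

    def read(self, path):
        # Serve pending contents first; otherwise fall through to disk.
        if path in self._pending:
            data = self._pending[path]
            if data is None:
                raise FileNotFoundError(path)
            return data
        with open(os.path.join(self.root, path), "rb") as fp:
            return fp.read()

    def flush(self):
        """Apply every pending change to disk; return how many were applied."""
        count = 0
        for path, data in self._pending.items():
            full = os.path.join(self.root, path)
            if data is None:
                if os.path.exists(full):
                    os.unlink(full)
            else:
                parent = os.path.dirname(full)
                if parent:
                    os.makedirs(parent, exist_ok=True)
                with open(full, "wb") as fp:
                    fp.write(data)
            count += 1
        self._pending.clear()
        return count
```

Nothing touches the filesystem until `flush()`, which is what makes it possible to later swap the flush for "turn this into a commit" (M2).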
Status: Patches here. Tests pass and it seems to work fine. Still need to:
Figure out if the functionality of the overlay contexts I used could be rolled into existing classes
Rebase on top of default (Jun's rebase changes)
Do a pre-code-review with Jun and Durham
Send a more in-depth proposal to Sean, Augie, and Martin.
Figure out what order to stage this in as patches once the freeze is lifted.
Details:
The contract for merge.update doesn't change. All callers are supported, so we can find unexpected edge cases.
No changes to the dirstate are needed in this phase (because we flush just before calling recordupdates()).
If it works, it can enforce some useful boundaries on the problem (e.g., multiple writes to one file during a merge can be combined into one write).
This mode is optional, and can be disabled to make all writes go immediately to the filesystem. This lets us keep use cases like largefiles around that'd be a pain to convert.
Contents can be flushed in-flight. This is done on an error or before launching an external merge tool.
If using workers in batchget/batchremove, each forked worker must flush before and after running.
merge.update has to be wrapped in a giant try/except (to flush on exceptions).
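The control flow described in these details can be sketched as follows, assuming a simulated working copy (a dict) instead of real files. `PendingWrites`, `run_external_tool`, and `apply_actions` are hypothetical stand-ins, not merge.update's real code. The sketch shows the optional mode (writes go straight through when disabled), a flush before an external tool runs, one flush at the end, and a flush on any exception before re-raising.

```python
class PendingWrites:
    """Hypothetical deferred-write layer over a simulated working copy."""

    def __init__(self, enabled=True):
        self.enabled = enabled  # False: every write goes straight to "disk"
        self.pending = {}       # path -> bytes not yet written out
        self.disk = {}          # stand-in for the files on disk

    def write(self, path, data):
        if self.enabled:
            self.pending[path] = data
        else:
            self.disk[path] = data

    def flush(self):
        self.disk.update(self.pending)
        self.pending.clear()


def run_external_tool(wctx, path):
    # An external tool reads files from disk, so flush pending contents
    # first; this fake "tool" just tags whatever is on disk.
    wctx.flush()
    return wctx.disk[path] + b" (merged)"


def apply_actions(wctx, actions):
    """Hypothetical merge.update-style driver over (kind, path, data) actions."""
    try:
        for kind, path, data in actions:
            if kind == "get":
                wctx.write(path, data)
            elif kind == "merge":
                wctx.write(path, run_external_tool(wctx, path))
            else:
                raise ValueError("unknown action %r" % kind)
        wctx.flush()  # normal path: one flush at the very end
    except BaseException:
        wctx.flush()  # error path: write out everything applied so far
        raise
```

Catching `BaseException` (not just `Exception`) means a Ctrl-C during the merge also leaves the working copy in the same state that immediate writes would have produced.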
Concerns raised:
(By Durham) Flushing all the writes at once might be slower than the old method of distributing them throughout the merge process.
Also, you might run out of memory during a large update (probably a bigger issue!). Consider:
Adding a limit to the number of files held in memory at one time, and flushing early if we hit it.
Flushing to multiple files on separate threads, like backgroundclosing.
Optionally disabling this mode for big operations like hg update
Jun suggests: traceprofile, but disable workers.
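The first two mitigations could be combined roughly like this: a buffer that flushes early once it holds `max_files` entries, with a flush callback that writes the batch from a thread pool. This is a sketch under assumptions. `BoundedWrites` and `write_files_parallel` are hypothetical, the cap of 1000 files is arbitrary, and the threading is only in the spirit of backgroundclosing, not its implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_files_parallel(root, files, workers=4):
    """Write a {path: bytes} batch under root using a small thread pool."""
    def _write(item):
        path, data = item
        with open(os.path.join(root, path), "wb") as fp:
            fp.write(data)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every task to finish and re-raises the first error.
        list(pool.map(_write, files.items()))


class BoundedWrites:
    """Hypothetical deferred-write buffer that flushes early past a file cap."""

    def __init__(self, flush_fn, max_files=1000):
        self.flush_fn = flush_fn    # callback taking a {path: bytes} batch
        self.max_files = max_files
        self.pending = {}

    def write(self, path, data):
        self.pending[path] = data
        if len(self.pending) >= self.max_files:
            self.flush()  # cap reached: spill to disk early

    def flush(self):
        if self.pending:
            self.flush_fn(dict(self.pending))
            self.pending.clear()
```

Capping on file count is simple but only loosely bounds memory; tracking total buffered bytes would bound it more directly.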
M2
M2 takes the work of M1 and makes the whole process work without a working copy:
Instead of flushing the writes at the end, we simply convert the in-memory contents to a changectx.
The overlayworkingcontext needs to be able to set an arbitrary other changectx as the fallthrough (not just the current workingctx). It probably needs a new name.
Renames need to be tracked in the overlay ctxs, not dirstate.
recordupdates() (or anything else that touches the working copy) is not called when the merge isn't running in the working copy.
In any case where we'd flush to disk (e.g., running a merge tool), an exception is raised instead.
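A rough sketch of the M2 overlay, assuming the parent snapshot is a plain `{path: bytes}` dict and the result is returned as a dict standing in for the new changectx (a real implementation would presumably build something like Mercurial's memctx, which this sketch does not use). `OverlayContext`, `recordrename`, `tocommit`, and `InMemoryMergeConflict` are hypothetical names. The overlay falls through to any parent, tracks renames itself rather than in the dirstate, and raises wherever M1 would have flushed.

```python
class InMemoryMergeConflict(Exception):
    """Raised where the on-disk mode would have flushed (e.g., a merge tool)."""


class OverlayContext:
    """Hypothetical overlay over an arbitrary parent snapshot."""

    def __init__(self, parent_files):
        self.parent = dict(parent_files)  # fallthrough: path -> bytes
        self.overlay = {}                 # path -> bytes, or None if removed
        self.renames = {}                 # dest -> source, kept here, not in dirstate

    def data(self, path):
        if path in self.overlay:
            if self.overlay[path] is None:
                raise KeyError(path)
            return self.overlay[path]
        return self.parent[path]

    def write(self, path, data):
        self.overlay[path] = data

    def remove(self, path):
        self.overlay[path] = None

    def recordrename(self, dest, source):
        self.renames[dest] = source

    def flush(self):
        # There is no working copy to flush to in M2.
        raise InMemoryMergeConflict("cannot flush an in-memory merge to disk")

    def tocommit(self, message):
        """Fold the overlay onto the parent instead of writing files."""
        files = dict(self.parent)
        for path, data in self.overlay.items():
            if data is None:
                files.pop(path, None)
            else:
                files[path] = data
        return {"message": message, "files": files, "renames": dict(self.renames)}
```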
Discussion:
merge.update takes a dest ctx, which defaults to repo[None], so the calling function can decide whether to run in memory or not.
rebase: check whether . is in rebaseset.
It should always be safe to discard an in-memory merge. No user data can be introduced into the merge at this point (no merge tools or conflict resolution).
Possibly include "retry without using in-memory merge if it fails" logic in this cut.
I have a rough M2 that works locally but needs to be cleaned up / turned into patches.
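The retry idea could look something like the wrapper below, assuming the caller passes a `run_merge(inmemory)` callable that raises when the in-memory path hits something it cannot handle. `merge_with_fallback` and `InMemoryMergeConflict` are hypothetical names. Retrying is reasonable here because, as noted above, discarding an in-memory merge loses no user data.

```python
class InMemoryMergeConflict(Exception):
    """Raised by the in-memory path wherever it would have needed the disk."""


def merge_with_fallback(run_merge, inmemory=True):
    """Hypothetical wrapper: try the merge in memory, redo it on disk if that bails.

    run_merge(inmemory) is assumed to perform the merge and return its result,
    raising InMemoryMergeConflict when the in-memory path cannot continue.
    """
    if inmemory:
        try:
            return run_merge(inmemory=True)
        except InMemoryMergeConflict:
            # Safe to throw away: no tool output or conflict resolution
            # exists yet, so just start over in the working copy.
            pass
    return run_merge(inmemory=False)
```

Keeping the decision (`inmemory`) in the caller matches the point that the calling function, e.g. rebase, chooses whether to run in memory.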
Status
Initially I was planning to roll out the milestones as described here. However, I think I'm close enough to the second milestone to try to ship them together. This saves time:
No need to change the M1 context(deferred writes that get flushed later) for M2(memory writes that get committed directly). This saves a bit of reviewer time.
We can skip beta testing M1 on many types of merge.update calls and scope it to rebase only.