The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.
* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.
* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()
When computing a diff in a partial clone, there is a chance that we
could trigger a prefetch of missing objects at the same time as we are
freeing entries from the global diff queue. This is difficult to
reproduce, as we need to have some objects be freed from the queue
before triggering the prefetch of missing objects. There is a new test
in t4067 that does trigger the segmentation fault that results in this
case.
The fix is to set the queue pointer to NULL after it is freed, and then
to be careful about NULL values in the prefetch.
The more elaborate explanation is that within diffcore_std(), we may
skip the initial prefetch due to the output format (--name-only in the
test) and go straight to diffcore_skip_stat_unmatch(). In that method,
the index entries that have been invalidated by path changes show up as
entries but may be deleted because they are not actually content diffs
and only newer timestamps than expected. As those entries are deleted,
later entries are checked with diff_filespec_check_stat_unmatch(), which
uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
options. That can trigger downloading missing objects if the appropriate
scenario occurs to trigger a call to diff_popoulate_filespec(). It's
finally within that callback to diff_queued_diff_prefetch() that the
segfault occurs.
The test was hard to find because it required some real differences,
some not-different files that had a newer modified time, and the order
of those files alphabetically was important to trigger the deletion
before the prefetch was triggered.
I briefly considered a "lock" member for the diff queue, but it was a
much larger diff and introduced many more possible error scenarios.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace 'test -f' with the test_path_is_file in
t5403-post-checkout-hook.sh. This helper provides better error
messages when tests fail, making it easier to debug issues.
Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At GitHub, a few repositories were triggering errors of the form:
git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
Aborted (core dumped)
While these may look similar to both
a562d90a350d (merge-ort: fix failing merges in special corner case,
2025-11-03)
and
f6ecb603ff8a (merge-ort: fix directory rename on top of source of other
rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.
It sadly took me a really long time to figure out how to get a simple
reproducer for this one. It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.
First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files. If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name). But two-way
merging requires components of the same type. We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule. Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea. When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).
Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.
Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them). This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
collect_merge_info() - walks the top-level trees getting individual
paths of interest
detect_renames() - detect renames between paths in order to match up
paths for three-way merging
process_entries() - does a few things of interest:
* three-way merging of files,
* other special handling (e.g. adjusting paths with conflicting
types to avoid path collisions)
* as it finishes handling all the files within a subdirectory,
writes out a new tree object for that directory
If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified. Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.
We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames. That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
* run collect_merge_info()
* run detect_renames()
* cache the renames()
* restart -- rerun collect_merge_info(), using the cached renames to
only recurse into the needed trees
* we already have the renames cached so no need to re-detect
* run process_entries() on the reduced list of paths
which was implemented back in 7bee6c100431 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16) Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).
Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path. So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink. Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c100431
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16). A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.
But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
* branches A & B
* commit A1: adds "folder" as a directory with files tracked under it
* commit B1: adds "folder" as a submodule
* commit A2: merges B1 into A1, keeping "folder" as a directory
(and in fact, with no changes to "folder" since A1), discarding the
submodule
* commit B2: merges A1 into B1, keeping "folder" as a submodule
(and in fact, with no changes to "folder" since B1), discarding the
directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
* we have multiple merge-bases: A1 & B1. So we have to merge those
to get a virtual merge base.
* due to "folder" as a directory and "folder" as a submodule, the
path collision logic triggers and renames "folder" as a submodule
to "folder~Temporary merge branch 2" so we can keep it alongside
"folder" as a directory.
* we now have a virtual merge base (containing both "folder"
directory and a "folder~Temporary merge branch 2" submodule) and
can now do the outer merge
* in the first step of the outer merge, we attempt to defer recursing
into folder/ as a directory, but find we need to for rename
detection.
* in rename detection, we note that "folder~Temporary merge branch 2"
has the same hash as "folder" as a submodule in B2, which means we
have an exact rename.
* after rename detection, we discover no path in folder/ is needed
for renames, and so we can cache renames and restart.
* after restarting, we avoid recursing into "folder/" and realize we
can resolve it trivially since it hasn't been modified. The
resolution removes "folder/", leaving us only "folder" as a
submodule from commit B2.
* After this point, we should have a rename/delete conflict on
"folder~Temporary merge branch 2" -> "folder", but our marking of
the merge of "folder" as clean broke our ability to handle that and
in fact triggers an assertion in process_renames().
When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly. In particular:
* do not pre-emptively mark the path as cleanly merged if the
remaining path is a file; allow it to be processed in
process_entries() later to determine if it was clean
* clear the parts of dirmask or filemask corresponding to the matching
sides of history, since we are resolving those away
* clear the df_conflict bit afterwards; since we cleared away the two
matching sides and only have one side left, that one side can't
have a directory/file conflict with itself.
Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**. (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)
I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg_verify_tag() shows the passed in object name on error. Both callers
provide one. It falls back to abbreviated hashes for future callers
that pass in a NULL name. DEFAULT_ABBREV is default_abbrev, which in
turn is a global variable that's populated by git_default_config() and
only available with USE_THE_REPOSITORY_VARIABLE.
Don't let that hypothetical hold us back from getting rid of
the_repository in tag.c. Fall back to full hashes, which are more
appropriate for error messages anyway. This allows us to stop setting
USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers of parse_tag() pass in the repository to use. Let most of
them pass in the_repository to get the same result as before. One of
them has stopped using the_repository in ef9b0370da (sha1-name.c: store
and use repo in struct disambiguate_state, 2019-04-16); let it pass in
its stored repository.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers of gpg_verify_tag() specify the repository to use by
providing a parameter for that. One of the two has not been using
the_repository since 43a8391977 (builtin/verify-tag: stop using
`the_repository`, 2025-03-08); let it pass in the correct repository.
The other simply passes the_repository to get the same result as before.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using "the_hash_algo" explicitly and implictly via parse_oid_hex()
and instead use the "hash_algo" member of the passed in repository,
which is more correct.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-12-29 22:02:53 +09:00
16 changed files with 198 additions and 27 deletions
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.