The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.
* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.
* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()
Use hook API to replace ad-hoc invocation of hook scripts with the
run_command() API.
* ar/run-command-hook:
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
hooks: allow callers to capture output
run-command: allow capturing of collated output
hook: allow overriding the ungroup option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add first helper for pp child states
Workaround the "iconv" shipped as part of macOS, which is broken
handling stateful ISO/IEC 2022 encoded strings.
* rs/macos-iconv-workaround:
macOS: use iconv from Homebrew if needed and present
macOS: make Homebrew use configurable
Update HTTP tests to adjust for changes in curl 8.18.0
* jk/test-curl-updates:
t5563: add missing end-of-line in HTTP header
t5551: handle trailing slashes in expected cookies output
Brown-paper-bag fix to a recently graduated
'kn/maintenance-is-needed' topic.
* gf/maintenance-is-needed-fix:
refs: dereference the value of the required pointer
Even when there is no changes in the packfile and no need to
recompute bitmaps, "git repack" recomputed and updated the MIDX
file, which has been corrected.
* ps/repack-avoid-noop-midx-rewrite:
midx-write: skip rewriting MIDX with `--stdin-packs` unless needed
midx-write: extract function to test whether MIDX needs updating
midx: fix `BUG()` when getting preferred pack without a reverse index
Prepare test suite for Git for Windows that supports symbolic
links.
* js/test-symlink-windows:
t7800: work around the MSYS path conversion on Windows
t6423: introduce Windows-specific handling for symlinking to /dev/null
t1305: skip symlink tests that do not apply to Windows
t1006: accommodate for symlink support in MSYS2
t0600: fix incomplete prerequisite for a test case
t0301: another fix for Windows compatibility
t0001: handle `diff --no-index` gracefully
mingw: special-case `open(symlink, O_CREAT | O_EXCL)`
apply: symbolic links lack a "trustable executable bit"
t9700: accommodate for Windows paths
More object database related information are shown in "git repo
structure" output.
* jt/repo-struct-more-objinfo:
builtin/repo: add object disk size info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add inflated object info to structure table
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: humanise count values in structure output
strbuf: split out logic to humanise byte values
builtin/repo: group per-type object values into struct
When computing a diff in a partial clone, there is a chance that we
could trigger a prefetch of missing objects at the same time as we are
freeing entries from the global diff queue. This is difficult to
reproduce, as we need to have some objects be freed from the queue
before triggering the prefetch of missing objects. There is a new test
in t4067 that does trigger the segmentation fault that results in this
case.
The fix is to set the queue pointer to NULL after it is freed, and then
to be careful about NULL values in the prefetch.
The more elaborate explanation is that within diffcore_std(), we may
skip the initial prefetch due to the output format (--name-only in the
test) and go straight to diffcore_skip_stat_unmatch(). In that method,
the index entries that have been invalidated by path changes show up as
entries but may be deleted because they are not actually content diffs
and only newer timestamps than expected. As those entries are deleted,
later entries are checked with diff_filespec_check_stat_unmatch(), which
uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
options. That can trigger downloading missing objects if the appropriate
scenario occurs to trigger a call to diff_popoulate_filespec(). It's
finally within that callback to diff_queued_diff_prefetch() that the
segfault occurs.
The test was hard to find because it required some real differences,
some not-different files that had a newer modified time, and the order
of those files alphabetically was important to trigger the deletion
before the prefetch was triggered.
I briefly considered a "lock" member for the diff queue, but it was a
much larger diff and introduced many more possible error scenarios.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace 'test -f' with the test_path_is_file in
t5403-post-checkout-hook.sh. This helper provides better error
messages when tests fail, making it easier to debug issues.
Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At GitHub, a few repositories were triggering errors of the form:
git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
Aborted (core dumped)
While these may look similar to both
a562d90a350d (merge-ort: fix failing merges in special corner case,
2025-11-03)
and
f6ecb603ff8a (merge-ort: fix directory rename on top of source of other
rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.
It sadly took me a really long time to figure out how to get a simple
reproducer for this one. It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.
First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files. If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name). But two-way
merging requires components of the same type. We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule. Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea. When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).
Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.
Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them). This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
collect_merge_info() - walks the top-level trees getting individual
paths of interest
detect_renames() - detect renames between paths in order to match up
paths for three-way merging
process_entries() - does a few things of interest:
* three-way merging of files,
* other special handling (e.g. adjusting paths with conflicting
types to avoid path collisions)
* as it finishes handling all the files within a subdirectory,
writes out a new tree object for that directory
If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified. Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.
We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames. That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
* run collect_merge_info()
* run detect_renames()
* cache the renames()
* restart -- rerun collect_merge_info(), using the cached renames to
only recurse into the needed trees
* we already have the renames cached so no need to re-detect
* run process_entries() on the reduced list of paths
which was implemented back in 7bee6c100431 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16) Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).
Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path. So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink. Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c100431
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16). A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.
But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
* branches A & B
* commit A1: adds "folder" as a directory with files tracked under it
* commit B1: adds "folder" as a submodule
* commit A2: merges B1 into A1, keeping "folder" as a directory
(and in fact, with no changes to "folder" since A1), discarding the
submodule
* commit B2: merges A1 into B1, keeping "folder" as a submodule
(and in fact, with no changes to "folder" since B1), discarding the
directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
* we have multiple merge-bases: A1 & B1. So we have to merge those
to get a virtual merge base.
* due to "folder" as a directory and "folder" as a submodule, the
path collision logic triggers and renames "folder" as a submodule
to "folder~Temporary merge branch 2" so we can keep it alongside
"folder" as a directory.
* we now have a virtual merge base (containing both "folder"
directory and a "folder~Temporary merge branch 2" submodule) and
can now do the outer merge
* in the first step of the outer merge, we attempt to defer recursing
into folder/ as a directory, but find we need to for rename
detection.
* in rename detection, we note that "folder~Temporary merge branch 2"
has the same hash as "folder" as a submodule in B2, which means we
have an exact rename.
* after rename detection, we discover no path in folder/ is needed
for renames, and so we can cache renames and restart.
* after restarting, we avoid recursing into "folder/" and realize we
can resolve it trivially since it hasn't been modified. The
resolution removes "folder/", leaving us only "folder" as a
submodule from commit B2.
* After this point, we should have a rename/delete conflict on
"folder~Temporary merge branch 2" -> "folder", but our marking of
the merge of "folder" as clean broke our ability to handle that and
in fact triggers an assertion in process_renames().
When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly. In particular:
* do not pre-emptively mark the path as cleanly merged if the
remaining path is a file; allow it to be processed in
process_entries() later to determine if it was clean
* clear the parts of dirmask or filemask corresponding to the matching
sides of history, since we are resolving those away
* clear the df_conflict bit afterwards; since we cleared away the two
matching sides and only have one side left, that one side can't
have a directory/file conflict with itself.
Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**. (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)
I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg_verify_tag() shows the passed in object name on error. Both callers
provide one. It falls back to abbreviated hashes for future callers
that pass in a NULL name. DEFAULT_ABBREV is default_abbrev, which in
turn is a global variable that's populated by git_default_config() and
only available with USE_THE_REPOSITORY_VARIABLE.
Don't let that hypothetical hold us back from getting rid of
the_repository in tag.c. Fall back to full hashes, which are more
appropriate for error messages anyway. This allows us to stop setting
USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers of parse_tag() pass in the repository to use. Let most of
them pass in the_repository to get the same result as before. One of
them has stopped using the_repository in ef9b0370da (sha1-name.c: store
and use repo in struct disambiguate_state, 2019-04-16); let it pass in
its stored repository.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers of gpg_verify_tag() specify the repository to use by
providing a parameter for that. One of the two has not been using
the_repository since 43a8391977 (builtin/verify-tag: stop using
`the_repository`, 2025-03-08); let it pass in the correct repository.
The other simply passes the_repository to get the same result as before.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using "the_hash_algo" explicitly and implictly via parse_oid_hex()
and instead use the "hash_algo" member of the passed in repository,
which is more correct.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code path that enumerates promisor objects have been optimized
to skip pointlessly parsing blob objects.
* ap/packfile-promisor-object-optim:
packfile: skip hash checks in add_promisor_object()
object: apply skip_hash and discard_tree optimizations to unknown blobs too
Documentation update.
* jc/doc-commit-signoff-config:
signoff-option: linkify the reference to gitfaq
commit: document that $command.signoff will not be added
git_config_get_expiry_in_days() calls git_parse_signed() with the
maximum value of int, which is equivalent to calling git_parse_int().
Do that instead, as its shorter and clearer.
This requires demoting "days" to int to match. Promote "scale" to
intmax_t in turn to arrive at the same result when multiplying them.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts the last remaining hooks to the new hook API, for
the same benefits as the previous conversions (no need to toggle
signals, manage custom struct child_process, call find_hook(),
prepares for specifyinig hooks via configs, etc.).
I noticed a performance degradation when processing large amounts
of hook input with just 1 line per callback, due to run-command's
poll loop, therefore I batched 500 lines per callback, to ensure
similar pipe throughput as before and to avoid hook child waiting
on stdin.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new hook sideband API introduced in the previous commit.
The hook API avoids creating a custom struct child_process and other
internal hook plumbing (e.g. calling find_hook()) and prepares for
the specification of hooks via configs or running parallel hooks.
Execution is still sequential through the current hook.[ch] via the
run_process_parallel_opts.processes=1 arg.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some server-side hooks will require capturing output to send over
sideband instead of printing directly to stderr. Expose that capability.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers, for example server-side hooks which wish to relay hook
output to clients across a transport, want to capture what would
normally print to stderr and do something else with it. Allow that via a
callback.
By calling the callback regardless of whether there's output available,
we allow clients to send e.g. a keepalive if necessary.
Because we expose a strbuf, not a fd or FILE*, there's no need to create
a temporary pipe or similar - we can just skip the print to stderr and
instead hand it to the caller.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling run_process_parallel() in run_hooks_opt(), the
ungroup option is currently hardcoded to .ungroup = 1.
This causes problems when ungrouping should be disabled, for
example when sideband-reading collated output from child hooks,
because sideband-reading and ungrouping are mutually exclusive.
Thus a new hook.h option is added to allow overriding.
The existing ungroup=1 behavior is preserved in the run_hooks()
API and the "hook run" command. We could modify these to take
an option if necessary, so I added two code comments there.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the reference-transaction hook to the new hook API,
so it doesn't need to set up a struct child_process, call
find_hook or toggle the pipe signals.
The stdin feed callback is processing one ref update per
call. I haven't noticed any performance degradation due
to this, however we can batch as many we want in each call,
to ensure a good pipe throughtput (i.e. the child does not
wait after stdin).
Helped-by: Emily Shaffer <nasamuffin@google.com>
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the pre-push hook from custom run-command invocations to
the new hook API which doesn't require a custom child_process
structure and signal toggling.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the custom run-command calls used by post-rewrite with
the newer and simpler hook_run_opt(), which does not need to
create a custom 'struct child_process' or call find_hook().
Another benefit of using the hook API is that hook_run_opt()
handles the SIGPIPE toggle logic.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a callback mechanism for feeding stdin to hooks alongside
the existing path_to_stdin (which slurps a file's content to stdin).
The advantage of this new callback is that it can feed stdin without
going through the FS layer. This helps when feeding large amount of
data and uses the run-command parallel stdin callback introduced in
the preceding commit.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user of the run_processes_parallel() API wants to pipe a large
amount of information to the stdin of each parallel command, that
data could exceed the pipe buffer of the process's stdin and can be
too big to store in-memory via strbuf & friends or to slurp to a file.
Generally this is solved by repeatedly writing to child_process.in
between calls to start_command() and finish_command(). For a specific
pre-existing example of this, see transport.c:run_pre_push_hook().
This adds a generic callback API to run_processes_parallel() to do
exactly that in a unified manner, similar to the existing callback APIs,
which can then be used by hooks.h to convert the remaining hooks to the
new, simpler parallel interface.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a recurring pattern of testing parallel process child states
and file descriptors to determine if a child is running, receiving any
input or if it's ready for cleanup.
Name the pp_child structure and introduce a first helper to make these
checks more readable. Next commits will add more helpers and checks.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by using prio_queue.
Use prio_queue_peek()+prio_queue_replace() instead of prio_queue_get()+
prio_queue_put() if possible, as the former only rebalance the
prio_queue heap once instead of twice.
In sane repositories this won't make much of a difference because the
number of items in the list or queue won't be very high:
Benchmark 1: ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 538.2 ms ± 0.8 ms [User: 527.6 ms, System: 9.6 ms]
Range (min … max): 537.0 ms … 539.2 ms 10 runs
Benchmark 2: ./git show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 530.6 ms ± 0.4 ms [User: 519.8 ms, System: 9.8 ms]
Range (min … max): 530.1 ms … 531.3 ms 10 runs
Summary
./git show-branch origin/main origin/next origin/seen origin/todo ran
1.01 ± 0.00 times faster than ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
That number is not limited, though, and in pathological cases like the
one in p6010 we see a sizable improvement:
Test v2.52.0 HEAD
------------------------------------------------------------------
6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The library function iconv(3) supplied with macOS versions 15.7.2
(Sequoia) and 26.1 (Tahoe) is unreliable when doing conversions from
ISO-2022-JP to UTF-8 in multiple steps; t3900 reports this breakage:
not ok 17 - ISO-2022-JP should be shown in UTF-8 now
not ok 25 - ISO-2022-JP should be shown in UTF-8 now
not ok 38 - commit --fixup into ISO-2022-JP from UTF-8
As a workaround, use libiconv from Homebrew, if available. Search it in
its default locations: /opt/homebrew for Apple Silicon and /usr/local
for macOS Intel, with the former taking precedence. Respect ICONVDIR if
already set by the user, though.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On macOS we opportunistically use Homebrew-installed versions of
gettext(3) and msgfmt(1). Make that behavior configurable by providing
make variables to disable Homebrew usage (NO_HOMEBREW) and to allow
using a non-default installation location (HOMEBREW_PREFIX).
Include and link only the gettext keg via the symlink opt/gettext
pointing to its installed version instead of using the Homebrew prefix.
This is simpler and prevents accidentally including other libraries.
Suggested-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>