The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.
* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.
* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()
Use hook API to replace ad-hoc invocation of hook scripts with the
run_command() API.
* ar/run-command-hook:
receive-pack: convert receive hooks to hook API
receive-pack: convert update hooks to new API
hooks: allow callers to capture output
run-command: allow capturing of collated output
hook: allow overriding the ungroup option
reference-transaction: use hook API instead of run-command
transport: convert pre-push to hook API
hook: convert 'post-rewrite' hook in sequencer.c to hook API
hook: provide stdin via callback
run-command: add stdin callback for parallelization
run-command: add first helper for pp child states
Workaround the "iconv" shipped as part of macOS, which is broken
handling stateful ISO/IEC 2022 encoded strings.
* rs/macos-iconv-workaround:
macOS: use iconv from Homebrew if needed and present
macOS: make Homebrew use configurable
Update HTTP tests to adjust for changes in curl 8.18.0
* jk/test-curl-updates:
t5563: add missing end-of-line in HTTP header
t5551: handle trailing slashes in expected cookies output
Brown-paper-bag fix to a recently graduated
'kn/maintenance-is-needed' topic.
* gf/maintenance-is-needed-fix:
refs: dereference the value of the required pointer
Even when there is no changes in the packfile and no need to
recompute bitmaps, "git repack" recomputed and updated the MIDX
file, which has been corrected.
* ps/repack-avoid-noop-midx-rewrite:
midx-write: skip rewriting MIDX with `--stdin-packs` unless needed
midx-write: extract function to test whether MIDX needs updating
midx: fix `BUG()` when getting preferred pack without a reverse index
Prepare test suite for Git for Windows that supports symbolic
links.
* js/test-symlink-windows:
t7800: work around the MSYS path conversion on Windows
t6423: introduce Windows-specific handling for symlinking to /dev/null
t1305: skip symlink tests that do not apply to Windows
t1006: accommodate for symlink support in MSYS2
t0600: fix incomplete prerequisite for a test case
t0301: another fix for Windows compatibility
t0001: handle `diff --no-index` gracefully
mingw: special-case `open(symlink, O_CREAT | O_EXCL)`
apply: symbolic links lack a "trustable executable bit"
t9700: accommodate for Windows paths
More object database related information are shown in "git repo
structure" output.
* jt/repo-struct-more-objinfo:
builtin/repo: add object disk size info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add inflated object info to structure table
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: humanise count values in structure output
strbuf: split out logic to humanise byte values
builtin/repo: group per-type object values into struct
When computing a diff in a partial clone, there is a chance that we
could trigger a prefetch of missing objects at the same time as we are
freeing entries from the global diff queue. This is difficult to
reproduce, as we need to have some objects be freed from the queue
before triggering the prefetch of missing objects. There is a new test
in t4067 that does trigger the segmentation fault that results in this
case.
The fix is to set the queue pointer to NULL after it is freed, and then
to be careful about NULL values in the prefetch.
The more elaborate explanation is that within diffcore_std(), we may
skip the initial prefetch due to the output format (--name-only in the
test) and go straight to diffcore_skip_stat_unmatch(). In that method,
the index entries that have been invalidated by path changes show up as
entries but may be deleted because they are not actually content diffs
and only newer timestamps than expected. As those entries are deleted,
later entries are checked with diff_filespec_check_stat_unmatch(), which
uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
options. That can trigger downloading missing objects if the appropriate
scenario occurs to trigger a call to diff_popoulate_filespec(). It's
finally within that callback to diff_queued_diff_prefetch() that the
segfault occurs.
The test was hard to find because it required some real differences,
some not-different files that had a newer modified time, and the order
of those files alphabetically was important to trigger the deletion
before the prefetch was triggered.
I briefly considered a "lock" member for the diff queue, but it was a
much larger diff and introduced many more possible error scenarios.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace 'test -f' with the test_path_is_file in
t5403-post-checkout-hook.sh. This helper provides better error
messages when tests fail, making it easier to debug issues.
Signed-off-by: Deveshi Dwivedi <deveshigurgaon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At GitHub, a few repositories were triggering errors of the form:
git: merge-ort.c:3037: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
Aborted (core dumped)
While these may look similar to both
a562d90a350d (merge-ort: fix failing merges in special corner case,
2025-11-03)
and
f6ecb603ff8a (merge-ort: fix directory rename on top of source of other
rename/delete, 2025-08-06)
the cause is different and in this case the problem is not an
over-conservative assertion, but a bug before the assertion where we did
not update all relevant state appropriately.
It sadly took me a really long time to figure out how to get a simple
reproducer for this one. It doesn't really have that many moving parts,
but there are multiple pieces of background information needed to
understand it.
First of all, when we have two files added at the same path, merge-ort
does a two-way merge of those files. If we have two directories added
at the same path, we basically do the same thing (taking the union of
files, and two-way merging files with the same name). But two-way
merging requires components of the same type. We can't merge the
contents of a regular file with a directory, or with a symlink, or with
a submodule. Nor can any of those other types be merged with each
other, e.g. merging a submodule with a directory is a bad idea. When
two paths have the same name but their types do not match, merge-ort is
forced to move one of them to an alternate filename (using the
unique_path() function).
Second, if two commits being merged have more than one merge-base,
merge-ort will merge the merge-bases to create a virtual merge-base, and
use that as the base commit.
Third, one of the really important optimizations in merge-ort is trivial
tree-level resolution (roughly meaning merging trees without recursing
into them). This optimization has some nuance to it that is important
to the current bug, and to understand it, it helps to first look at the
high-level overview of how merge-ort runs; there are basically three
high-level functions that the work is divided between:
collect_merge_info() - walks the top-level trees getting individual
paths of interest
detect_renames() - detect renames between paths in order to match up
paths for three-way merging
process_entries() - does a few things of interest:
* three-way merging of files,
* other special handling (e.g. adjusting paths with conflicting
types to avoid path collisions)
* as it finishes handling all the files within a subdirectory,
writes out a new tree object for that directory
If it were not for renames, we could just always do tree-level merging
whenever the tree on at least one side was unmodified. Unfortunately,
we need to recurse into trees to determine whether there are renames.
However, we can also do tree-level merging so long as there aren't any
*relevant* renames (another merge-ort optimization), which we can
determine without recursing into trees.
We would also be able to do tree-level merging if we somehow apriori
knew what renames existed, by only recursing into the trees which we
could otherwise trivially merge if they contained files involved in
renames. That might not seem useful, because we need to find out the
renames and we have to recurse into trees to do so, but when you find
out that the process_entries() step is more computationally expensive
than the collect_merge_info() step, it yields an interesting strategy:
* run collect_merge_info()
* run detect_renames()
* cache the renames()
* restart -- rerun collect_merge_info(), using the cached renames to
only recurse into the needed trees
* we already have the renames cached so no need to re-detect
* run process_entries() on the reduced list of paths
which was implemented back in 7bee6c100431 (merge-ort: avoid recursing
into directories when we don't need to, 2021-07-16) Crucially, this
restarting only occurs if the number of paths we could skip recursing
into exceeds the number we still need to recurse into by some safety
factor (wanted_factor in handle_deferred_entries()); forgetting this
fact is a great way to repeatedly fail to create a minimal testcase for
several days and go down alternate wrong paths).
Now, I earlier summarized this optimization as "merging trees without
recursing into them", but this optimization does not require that all
three sides of history has a directory at a given path. So long as the
tree on one side matches the tree in the base version, we can decide to
resolve in favor of whatever the other side of history has at that path
-- be it a directory, a file, a submodule, or a symlink. Unfortunately,
the code in question didn't fully realize this, and was written assuming
the base version and both sides would have a directory at the given
path, as can be seen by the "ci->filemask == 0" comment in
resolve_trivial_directory_merge() that was added as part of 7bee6c100431
(merge-ort: avoid recursing into directories when we don't need to,
2021-07-16). A few additional lines of code are needed to handle cases
where we have something other than a directory on the other side of
history.
But, knowing that resolve_trivial_directory_merge() doesn't have
sufficient state updating logic doesn't show us how to trigger a bug
without combining with the other bits of information we provided above.
Here's a relevant testcase:
* branches A & B
* commit A1: adds "folder" as a directory with files tracked under it
* commit B1: adds "folder" as a submodule
* commit A2: merges B1 into A1, keeping "folder" as a directory
(and in fact, with no changes to "folder" since A1), discarding the
submodule
* commit B2: merges A1 into B1, keeping "folder" as a submodule
(and in fact, with no changes to "folder" since B1), discarding the
directory
Here, if we try to merge A2 & B2, the logic proceeds as follows:
* we have multiple merge-bases: A1 & B1. So we have to merge those
to get a virtual merge base.
* due to "folder" as a directory and "folder" as a submodule, the
path collision logic triggers and renames "folder" as a submodule
to "folder~Temporary merge branch 2" so we can keep it alongside
"folder" as a directory.
* we now have a virtual merge base (containing both "folder"
directory and a "folder~Temporary merge branch 2" submodule) and
can now do the outer merge
* in the first step of the outer merge, we attempt to defer recursing
into folder/ as a directory, but find we need to for rename
detection.
* in rename detection, we note that "folder~Temporary merge branch 2"
has the same hash as "folder" as a submodule in B2, which means we
have an exact rename.
* after rename detection, we discover no path in folder/ is needed
for renames, and so we can cache renames and restart.
* after restarting, we avoid recursing into "folder/" and realize we
can resolve it trivially since it hasn't been modified. The
resolution removes "folder/", leaving us only "folder" as a
submodule from commit B2.
* After this point, we should have a rename/delete conflict on
"folder~Temporary merge branch 2" -> "folder", but our marking of
the merge of "folder" as clean broke our ability to handle that and
in fact triggers an assertion in process_renames().
When there was a df_conflict (directory/"file" conflict, where "file"
could be submodule or regular file or symlink), ensure
resolve_trivial_directory_merge() handles it properly. In particular:
* do not pre-emptively mark the path as cleanly merged if the
remaining path is a file; allow it to be processed in
process_entries() later to determine if it was clean
* clear the parts of dirmask or filemask corresponding to the matching
sides of history, since we are resolving those away
* clear the df_conflict bit afterwards; since we cleared away the two
matching sides and only have one side left, that one side can't
have a directory/file conflict with itself.
Also add the above minimal testcase showcasing this bug to t6422, **with
a sufficient number of paths under the folder/ directory to actually
trigger it**. (I wish I could have all those days back from all the
wrong paths I went down due to not having enough files under that
directory...)
I know this commit has a very high ratio of lines in the commit message
to lines of comments, and a relatively high ratio of comments to actual
code, but given how long it took me to track down, on the off chance
that we ever need to further modify this logic, I wanted it thoroughly
documented for future me and for whatever other poor soul might end up
needing to read this commit message.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg_verify_tag() shows the passed in object name on error. Both callers
provide one. It falls back to abbreviated hashes for future callers
that pass in a NULL name. DEFAULT_ABBREV is default_abbrev, which in
turn is a global variable that's populated by git_default_config() and
only available with USE_THE_REPOSITORY_VARIABLE.
Don't let that hypothetical hold us back from getting rid of
the_repository in tag.c. Fall back to full hashes, which are more
appropriate for error messages anyway. This allows us to stop setting
USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers of parse_tag() pass in the repository to use. Let most of
them pass in the_repository to get the same result as before. One of
them has stopped using the_repository in ef9b0370da (sha1-name.c: store
and use repo in struct disambiguate_state, 2019-04-16); let it pass in
its stored repository.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers of gpg_verify_tag() specify the repository to use by
providing a parameter for that. One of the two has not been using
the_repository since 43a8391977 (builtin/verify-tag: stop using
`the_repository`, 2025-03-08); let it pass in the correct repository.
The other simply passes the_repository to get the same result as before.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using "the_hash_algo" explicitly and implictly via parse_oid_hex()
and instead use the "hash_algo" member of the passed in repository,
which is more correct.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code path that enumerates promisor objects have been optimized
to skip pointlessly parsing blob objects.
* ap/packfile-promisor-object-optim:
packfile: skip hash checks in add_promisor_object()
object: apply skip_hash and discard_tree optimizations to unknown blobs too
Documentation update.
* jc/doc-commit-signoff-config:
signoff-option: linkify the reference to gitfaq
commit: document that $command.signoff will not be added
git_config_get_expiry_in_days() calls git_parse_signed() with the
maximum value of int, which is equivalent to calling git_parse_int().
Do that instead, as its shorter and clearer.
This requires demoting "days" to int to match. Promote "scale" to
intmax_t in turn to arrive at the same result when multiplying them.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts the last remaining hooks to the new hook API, for
the same benefits as the previous conversions (no need to toggle
signals, manage custom struct child_process, call find_hook(),
prepares for specifyinig hooks via configs, etc.).
I noticed a performance degradation when processing large amounts
of hook input with just 1 line per callback, due to run-command's
poll loop, therefore I batched 500 lines per callback, to ensure
similar pipe throughput as before and to avoid hook child waiting
on stdin.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new hook sideband API introduced in the previous commit.
The hook API avoids creating a custom struct child_process and other
internal hook plumbing (e.g. calling find_hook()) and prepares for
the specification of hooks via configs or running parallel hooks.
Execution is still sequential through the current hook.[ch] via the
run_process_parallel_opts.processes=1 arg.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some server-side hooks will require capturing output to send over
sideband instead of printing directly to stderr. Expose that capability.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers, for example server-side hooks which wish to relay hook
output to clients across a transport, want to capture what would
normally print to stderr and do something else with it. Allow that via a
callback.
By calling the callback regardless of whether there's output available,
we allow clients to send e.g. a keepalive if necessary.
Because we expose a strbuf, not a fd or FILE*, there's no need to create
a temporary pipe or similar - we can just skip the print to stderr and
instead hand it to the caller.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling run_process_parallel() in run_hooks_opt(), the
ungroup option is currently hardcoded to .ungroup = 1.
This causes problems when ungrouping should be disabled, for
example when sideband-reading collated output from child hooks,
because sideband-reading and ungrouping are mutually exclusive.
Thus a new hook.h option is added to allow overriding.
The existing ungroup=1 behavior is preserved in the run_hooks()
API and the "hook run" command. We could modify these to take
an option if necessary, so I added two code comments there.
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the reference-transaction hook to the new hook API,
so it doesn't need to set up a struct child_process, call
find_hook or toggle the pipe signals.
The stdin feed callback is processing one ref update per
call. I haven't noticed any performance degradation due
to this, however we can batch as many we want in each call,
to ensure a good pipe throughtput (i.e. the child does not
wait after stdin).
Helped-by: Emily Shaffer <nasamuffin@google.com>
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the pre-push hook from custom run-command invocations to
the new hook API which doesn't require a custom child_process
structure and signal toggling.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the custom run-command calls used by post-rewrite with
the newer and simpler hook_run_opt(), which does not need to
create a custom 'struct child_process' or call find_hook().
Another benefit of using the hook API is that hook_run_opt()
handles the SIGPIPE toggle logic.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a callback mechanism for feeding stdin to hooks alongside
the existing path_to_stdin (which slurps a file's content to stdin).
The advantage of this new callback is that it can feed stdin without
going through the FS layer. This helps when feeding large amount of
data and uses the run-command parallel stdin callback introduced in
the preceding commit.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user of the run_processes_parallel() API wants to pipe a large
amount of information to the stdin of each parallel command, that
data could exceed the pipe buffer of the process's stdin and can be
too big to store in-memory via strbuf & friends or to slurp to a file.
Generally this is solved by repeatedly writing to child_process.in
between calls to start_command() and finish_command(). For a specific
pre-existing example of this, see transport.c:run_pre_push_hook().
This adds a generic callback API to run_processes_parallel() to do
exactly that in a unified manner, similar to the existing callback APIs,
which can then be used by hooks.h to convert the remaining hooks to the
new, simpler parallel interface.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a recurring pattern of testing parallel process child states
and file descriptors to determine if a child is running, receiving any
input or if it's ready for cleanup.
Name the pp_child structure and introduce a first helper to make these
checks more readable. Next commits will add more helpers and checks.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by using prio_queue.
Use prio_queue_peek()+prio_queue_replace() instead of prio_queue_get()+
prio_queue_put() if possible, as the former only rebalance the
prio_queue heap once instead of twice.
In sane repositories this won't make much of a difference because the
number of items in the list or queue won't be very high:
Benchmark 1: ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 538.2 ms ± 0.8 ms [User: 527.6 ms, System: 9.6 ms]
Range (min … max): 537.0 ms … 539.2 ms 10 runs
Benchmark 2: ./git show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 530.6 ms ± 0.4 ms [User: 519.8 ms, System: 9.8 ms]
Range (min … max): 530.1 ms … 531.3 ms 10 runs
Summary
./git show-branch origin/main origin/next origin/seen origin/todo ran
1.01 ± 0.00 times faster than ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
That number is not limited, though, and in pathological cases like the
one in p6010 we see a sizable improvement:
Test v2.52.0 HEAD
------------------------------------------------------------------
6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The library function iconv(3) supplied with macOS versions 15.7.2
(Sequoia) and 26.1 (Tahoe) is unreliable when doing conversions from
ISO-2022-JP to UTF-8 in multiple steps; t3900 reports this breakage:
not ok 17 - ISO-2022-JP should be shown in UTF-8 now
not ok 25 - ISO-2022-JP should be shown in UTF-8 now
not ok 38 - commit --fixup into ISO-2022-JP from UTF-8
As a workaround, use libiconv from Homebrew, if available. Search it in
its default locations: /opt/homebrew for Apple Silicon and /usr/local
for macOS Intel, with the former taking precedence. Respect ICONVDIR if
already set by the user, though.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On macOS we opportunistically use Homebrew-installed versions of
gettext(3) and msgfmt(1). Make that behavior configurable by providing
make variables to disable Homebrew usage (NO_HOMEBREW) and to allow
using a non-default installation location (HOMEBREW_PREFIX).
Include and link only the gettext keg via the symlink opt/gettext
pointing to its installed version instead of using the Homebrew prefix.
This is simpler and prevents accidentally including other libraries.
Suggested-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We received a report that invoking "git restore -source my_base_branch"
resulted in the confusing error message "fatal: could not resolve
ource". This looked like a typo in our error message, but it is
actually because "-source" is missing its second dash and is being
resolved as "-s ource". However, due to the lack of the quoting
recommended in CodingGuidelines, this is confusing to the reader and
we can do better.
Add the necessary quoting to this message. With this change, we now get
this less confusing message:
fatal: could not resolve 'ource'
Reported-by: Zhelyo Zhelev <zhelyo@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" that involves fetching tags, when a tag being fetched
needs to overwrite existing one, failed to fetch other tags, which
has been corrected.
* kn/fix-fetch-backfill-tag-with-batched-ref-updates:
fetch: fix failed batched updates skipping operations
fetch: fix non-conflicting tags not being committed
fetch: extract out reference committing logic
"git diff-files -R --find-copies-harder" has been taught to use
the potential copy sources from the index correctly.
* rs/diff-files-r-find-copies-fix:
diff-files: fix copy detection
Further application of MEMZERO_ARRAY() macro to the rest of the
code base.
* jc/memzero-array:
cocci: use MEMZERO_ARRAY() a bit more
coccicheck: emit the contents of cocci patch
MEMZERO_ARRAY() helper is introduced to avoid clearing only the
first N bytes of an N-element array whose elements are larger than
a byte.
* tc/memzero-array:
contrib/coccinelle: pass include paths to spatch(1)
git-compat-util: introduce MEMZERO_ARRAY() macro
In-code comment update to clarify that single-letter options are
outside of the scope of command line completion script.
* jc/completion-no-single-letter-options:
completion: clarify support for short options and arguments
"git submodule add" to add a submodule under <name> segfaulted,
when a submodule.<name>.something is already in .gitmodules file
without defining where its submodule.<name>.path is, which has been
corrected.
* jc/submodule-add:
submodule add: sanity check existing .gitmodules
"git replay --onto=<commit> ...", when <commit> is mistyped,
started to segfault with recent change, which has been corrected.
* rs/replay-wrong-onto-fix:
replay: move onto NULL check before first use
"git replay" documentation updates.
* kh/doc-replay-updates:
doc: replay: link section using markup
replay: improve --contained and add to doc
doc: replay: mention no output on conflicts
Code refactoring around alternate object store.
* ps/odb-alternates-object-sources:
odb: write alternates via sources
odb: read alternates via sources
odb: drop forward declaration of `read_info_alternates()`
odb: remove mutual recursion when parsing alternates
odb: stop splitting alternate in `odb_add_to_alternates_file()`
odb: move computation of normalized objdir into `alt_odb_usable()`
odb: resolve relative alternative paths when parsing
odb: refactor parsing of alternates to be self-contained
* use imperative mood for consistency in options descriptions
* add missing parenthesis
* reword verbose phrase in git-repack.adoc
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* fix incorrect use of backticks for markup in
git-checkout.adoc, git-worktree.adoc
* switch tabs to spaces in git-send-email.adoc list items
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GitFAQ is a proper manual page in the section 7, so refer to it
using the usual linkgit:stuff[7] syntax.
Helped-by: Kristoffer Haugsbakk
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From e509b5b8be (rust: support for Windows, 2025-10-15), we check
cargo's information to decide which library to build. However, that
check mistakenly used "sed -s" ("consider files as separate rather than
as a single, continuous long stream"), which is a GNU extension. The
build thus fails on macOS with "meson -Drust=enabled", which comes with
BSD-derived sed.
Instead, use the intended "sed -n" and print the matching section of the
output. This failure mode likely went unnoticed on systems with GNU sed
(common for developer machines and CI) because, in those instances, the
output being matched by case is the full cargo output (which either
contains the string "-windows-" or doesn't).
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/ci-rust:
rust: support for Windows
ci: verify minimum supported Rust version
ci: check for common Rust mistakes via Clippy
rust/varint: add safety comments
ci: check formatting of our Rust code
ci: deduplicate calls to `apt-get update`
t8020: fix test failure due to indeterministic tag sorting
gitlab-ci: upload Meson test logs as JUnit reports
gitlab-ci: drop workaround for Python certificate store on Windows
gitlab-ci: ignore failures to disable realtime monitoring
gitlab-ci: dedup instructions to disable realtime monitoring
ci: enable Rust for breaking-changes jobs
ci: convert "pedantic" job into full build with breaking changes
BreakingChanges: announce Rust becoming mandatory
varint: reimplement as test balloon for Rust
varint: use explicit width for integers
help: report on whether or not Rust is enabled
Makefile: introduce infrastructure to build internal Rust library
Makefile: reorder sources after includes
meson: add infrastructure to build internal Rust library
Currently, this always prints yes because required is non-null.
This is the wrong behavior. The boolean must be
dereferenced.
Signed-off-by: Greg Funni <gfunni234@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thankfully, it is set to NULL, so no security consequences.
However, this is still a mistake that must be rectified.
Signed-off-by: Greg Funni <gfunni234@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 58aaf59133b (t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar,
2023-12-29) copy-edited the `test_detect_hash` function, the code
comment was accidentally left unchanged. Let's adjust it.
Noticed-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t5563, we test how various oddly-formatted WWW-Authenticate headers
are passed through curl to git's credential subsystem (and ultimately
out to credential helpers). One test, "access using basic auth with
wwwauth header mixed line-endings" does something odd. It does not mix
line endings at all (which must be CRLF according to the RFC anyway),
but omits the line ending entirely for the final header!
This means that the server produces an incomplete response. We send our
final header, and then the newline which is meant to mark the end of
headers (and the start of the body) becomes the line ending for that
header. And there is no header/body separator in the output at all.
Looking at strace, this is what the client reads:
recvfrom(9, "WWW-Authenticate: FooBar param1=\"value1\"\r\n \r\n\tparam2=\"value2\"\r\nWWW-Authenticate: Basic realm=\"example.com\"", 16384, 0, NULL, NULL) = 106
recvfrom(9, "\n", 16384, 0, NULL, NULL) = 1
recvfrom(9, "", 16384, 0, NULL, NULL) = 0
The headers themselves are produced from the custom-auth.challenge file
we write in the test (which is missing the final CRLF), and then the
header/body separator comes from our lib-httpd/nph-custom-auth.sh CGI.
(Ignore for a moment that it is producing a bare newline, which I think
is a bug; it should be a CRLF but curl is happy with either).
Older versions of curl seemed to be OK with the truncated output, but
the upcoming 8.18.0 release seems to get confused. Specifically, since
67ae101666 (http: unfold response headers earlier, 2025-12-12) our
request to the server fails with insufficient credentials. I traced far
enough to see that curl does relay the header back to us, which we then
pass to a credential helper, which gives us the correct
username/password combination. But on our followup request, curl refuses
to send the Authorization header (and so gets an HTTP 401 again).
The change in curl's behavior is a bit unexpected, but since we are
sending it garbage, it is hard to complain too much. Let's add the
missing CRLF to the header. I _think_ this was just an oversight and not
the intent of the test. And that the "mixed line-endings" really meant
"mixed continuations", since we differ from the previous test in
continuing with both space and tab. So I've likewise updated the test
title to match that assumption.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check in t5551 that curl updates the expected list of cookies after
making a request. We do this by telling it to read and write cookies
from a particular text file, and then checking that after curl runs, the
file has the expected content.
However, in the upcoming curl 8.18.0, the output file has changed
slightly: curl will canonicalize the paths it writes, due to commit
a093c93994 (cookie: only keep and use the canonical cleaned up path,
2025-12-07). In particular, it strips trailing slashes from the paths we
see in the cookies.txt file.
This doesn't matter to Git, as the cookie handling is all internal to
curl. But our test is overly brittle and breaks as a result.
We can fix it by matching either format. We'll expect the new format
(without trailing slashes) and strip the slashes from curl's output
before comparing. That lets us pass with both old and new versions (I
tested against curl's 8_17_0 and rc-8_18_0-2 tags, which are
respectively before and after the curl change).
In theory it might be nice to try to future-proof this test more by
looking only for the bits we care about, rather than a byte-wise
comparison of the whole file. But after removing comments and blank
lines (which we already do), we care about most of what's there. So it's
not clear to me what a more liberal test would look like. Given that the
format doesn't change all that often, it's probably OK to stop here and
see if it ever breaks again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When various *object_info() functions are given an extended object
info structure as NULL by a caller that does not want any details,
the code uses a file-scope static blank_oi and passes it down to
the helper functions they use, to avoid handling NULL specifically.
The ps/object-read-stream topic graduated to 'master' recently
however had a bug that assumed that two identically named file-scope
static variables in two functions are the same, which of course is
not the case. This made "git commit" take 0.38 seconds to 1508
seconds in some case, as reported by Aaron Plattner here:
https://lore.kernel.org/git/f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com/
We _could_ move the blank_oi variable to the global scope in common
section to fix this regression, but explicitly handling the NULL is
a much safer fix. It would also reduce the chance of errors that
somebody accidentally writes into blank_oi, making its contents
dirty, which potentially will make subsequent calls into the
function misbehave. By explicitly handling NULL input, we no longer
have to worry about it.
Reported-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-read-stream: (32 commits)
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
object-file: refactor writing objects via a stream
...
The previous wording:
> Path expansions are made the same way as for `core.excludesFile`.
required one to check the docs for 'core.excludesFile' and from there
the definition of the pathname variable type to understand the path
expansion behaviour of this variable. Instead, just link directly to the
pathname type.
This change is basically the same rewording as was done to
'core.excludesFile' in dca83abd (config: describe 'pathname' value
type, 2016-04-29).
Signed-off-by: Matthew Hughes <matthewhughes934@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.
For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.
For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit, byte size values displayed in table output for
the git-repo(1) "structure" subcommand will be shown in a more
human-readable format with the appropriate unit prefixes. For this
usecase, the downscaled values and unit strings must be handled
separately to ensure proper column alignment.
Split out logic from strbuf_humanise() to downscale byte values and
determine the corresponding unit prefix into a separate humanise_bytes()
function that provides seperate value and unit strings.
Note that the "byte" string in "t/helper/test-simple-ipc.c" is unmarked
for translation here so that it doesn't conflict with the newly defined
plural "byte/bytes" translation and instead uses it.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's test suite's relies on Unix shell scripting, which is
understandable, of course, given Git's firm roots (and indeed, ongoing
focus) on Linux.
This fact, combined with Unix shell scripting's natural
habitat -- which is, naturally... *drumroll*... Unix --
often has unintended side effects, where developers expect the test
suite to run in a Unix environment, which is an incorrect assumption.
One instance of this problem can be observed in the 'difftool --dir-diff
handles modified symlinks' test case in `t7800-difftool.sh`, which
assumes that all absolute paths start with a forward slash. That
assumption is incorrect in general, e.g. on Windows, where absolute
paths have many shapes and forms, none of which starts with a forward
slash.
The only saving grace is that this test case is currently not run on
Windows because of the `SYMLINK` prerequisite. However, I am currently
working towards upstreaming symbolic link support from Git for Windows
to upstream Git, which will put a crack into that saving grace.
Let's change that test case so that it does not rely on absolute paths
(which are passed to the "external command" `ls` as parameters and are
therefore part of its output, and which the test case wants to filter
out before verifying that the output is as expected) starting with a
forward slash. Let's instead rely on the much more reliable fact that
`ls` will output the path in a line that ends in a colon, and simply
filter out those lines by matching said colon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The device `/dev/null` does not exist on Windows, it's called `NUL`
there. Calling `ln -s /dev/null my-symlink` in a symlink-enabled MSYS2
Bash will therefore literally link to a file or directory called `null`
that is supposed to be in the current drive's top-level `dev` directory.
Which typically does not exist.
The test, however, really wants the created symbolic link to point to
the NUL device. Let's instead use the `mklink` utility on Windows to
perform that job, and keep using `ln -s /dev/null <target>` on
non-Windows platforms.
While at it, add the missing `SYMLINKS` prereq because this test _still_
would not pass on Windows before support for symbolic links is
upstreamed from Git for Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git for Windows, the gitdir is canonicalized so that even when the
gitdir is specified via a symbolic link, the `gitdir:` conditional
include will only match the real directory path.
Unfortunately, t1305 codifies a different behavior in two test cases,
which are hereby skipped on Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MSYS2 runtime (which inherits this trait from the Cygwin runtime,
and which is used by Git for Windows' Bash to emulate POSIX
functionality on Windows, the same Bash that is also used to run Git's
test suite on Windows) has a mode where it can create native symbolic
links on Windows.
Naturally, this is a bit of a strange feature, given that Cygwin goes
out of its way to support Unix-like paths even if no Win32 program
understands those, and the symbolic links have to use Win32 paths
instead (which Win32 programs understand very well).
As a consequence, the symbolic link targets get normalized before the
links are created.
This results in certain quirks that Git's test suite is ill equipped to
accommodate (because Git's test suite expects to be able to use
Unix-like paths even on Windows).
The test script t1006-cat-file.sh contains two prime examples, two test
cases that need to skip a couple assertions because they are simply
wrong in the context of Git for Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'symref transaction supports symlinks' test case is guarded by the
`SYMLINK` prerequisite because `core.prefersymlinkrefs = true` requires
symbolic links to be supported.
However, the `preferSymlinkRefs` feature is not supported on Windows,
therefore this test case needs the `MINGW` prerequisite, too.
There's a couple more cases where we set this config key:
- In a subsequent test in t0600, but there we explicitly set it to
"false". So this would naturally be supported by Windows.
- In t7201 we set the value to `yes`, but we never verify that the
written reference is a symbolic link in the first place. I guess
that we could rather remove setting the configuration value here, as
we are about to deprecate support for symrefs via symbolic links in
the first place. But that's certainly outside of the scope of this
patch.
- In t9903 we do the same, but likewise, we don't check whether the
written file is a symbolic link.
Therefore this seems to be the only instance where the tests actually
need to be adapted.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like 0fdcfa2f9f5 (t0301: fixes for windows compatibility,
2021-09-14) explained, we should not call `mkdir -m<mode>` in the test
suite because that would fail on Windows.
There was one forgotten instance of this which was hidden by a `SYMLINK`
prerequisite. Currently, this prevents this test case from being
executed on Windows, but with the upcoming support for symbolic links,
it would become a problem.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test case 're-init to move gitdir symlink' wants to compare the
contents of `newdir/.git`, which is a symbolic link pointing to a file.
However, `git diff --no-index`, which is used by `test_cmp` on Windows,
does not resolve symlinks; It shows the symlink _target_ instead (with a
file mode of 120000). That is totally unexpected by the test case, which
as a consequence fails, meaning that it's a bug in the test case itself.
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `_wopen()` function would gladly follow a symbolic link to a
non-existent file and create it when given above-mentioned flags.
Git expects the `open()` call to fail, though. So let's add yet another
work-around to pretend that Windows behaves according to POSIX, see:
https://pubs.opengroup.org/onlinepubs/007904875/functions/open.html#:~:text=If%20O_CREAT%20and%20O_EXCL%20are,set%2C%20the%20result%20is%20undefined.
This is required to let t4115.8(--reject removes .rej symlink if it
exists) pass on Windows when enabling the MSYS2 runtime's symbolic link
support.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 0482c32c334b (apply: ignore working tree filemode when
!core.filemode, 2023-12-26) fixed `git apply` to stop warning about
executable files, it inadvertently changed the code flow also for
symbolic links and directories.
Let's narrow the scope of the special `!trust_executable_git` code path
to apply only to regular files.
This is needed to let t4115.5(symlink escape when creating new files)
pass on Windows when symbolic link support is enabled in the MSYS2
runtime.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since fe53bbc9beb (Git.pm: Always set Repository to absolute path
if autodetecting, 2009-05-07), the t9700 test _must_ fail on Windows
because of that age-old Unix paths vs Windows paths problem.
The underlying root cause is that Git cannot run with a regular Win32
variant of Perl, the assumption that every path is a Unix path is just
too strong in Git's Perl code.
As a consequence, Git for Windows is basically stuck with using the
MSYS2 variant of Perl which uses a POSIX emulation layer (which is a
friendly fork of Cygwin) _and_ a best-effort Unix <-> Windows paths
conversion whenever crossing the boundary between MSYS2 and regular
Win32 processes. It is best effort only, though, using heuristics to
automagically convert correctly in most cases, but not in all cases.
In the context of this here patch, this means that asking `git.exe` for
the absolute path of the `.git/` directory will return a Win32 path
because `git.exe` is a regular Win32 executable that has no idea about
Unix-ish paths. But above-mentioned commit introduced a test that wants
to verify that this path is identical to the one that the Git Perl
module reports (which refuses to use Win32 paths and uses Unix-ish paths
instead). Obviously, this must fail because no heuristics can kick in at
that layer.
This test failure has not even been caught when Git introduced Windows
support in its CI definition in 2e90484eb4a (ci: add a Windows job to
the Azure Pipelines definition, 2019-01-29), as all tests relying on
Perl had to be disabled even from the start (because the CI runs would
otherwise have resulted in prohibitively long runtimes, not because
Windows is super slow per se, but because Git's test suite keeps
insisting on using technology that requires a POSIX emulation layer,
which _is_ super slow on Windows).
To work around this failure, let's use the `cygpath` utility to convert
the absolute `gitdir` path into the form that the Perl code expects.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every now and then we see this coming up on the list. Let's help
new contributors who are not aware of past discussions by clearly
documenting our past consensus.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite the only use of "mktemp()" that is subject to TOCTOU race
and Stop using the insecure "mktemp()" function.
* rs/ban-mktemp:
compat: remove gitmkdtemp()
banned.h: ban mktemp(3)
compat: remove mingw_mktemp()
compat: use git_mkdtemp()
wrapper: add git_mkdtemp()
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.
* ps/object-read-stream:
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
Copy detection cannot work when comparing the index to the working tree
because Git ignores files that it is not explicitly told to track. It
should work in the other direction, though, i.e. for a reverse diff of
the deletion of a copy from the index.
d1f2d7e8ca (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19) broke it with a seemingly stray change to run_diff_files().
We didn't notice because there's no test for that. But even if we had
one, it might have gone unnoticed because the breakage only happens
with index preloading, which requires at least 1000 entries (more than
most test repos have) and is racy because it runs in parallel with the
actual command.
Fix copy detection by queuing up-to-date and skip-worktree entries using
diff_same().
While at it, use diff_same() also for queuing unchanged files not
flagged as up-to-date, i.e. clean submodules and entries where
preloading was not done at all or not quickly enough. It uses less
memory than diff_change() and doesn't unnecessarily set the diff flag
has_changes.
Add two tests to cover running both without and with preloading. The
first one passes reliably with the original code. The second one
enables preloading and thus is racy. It has a good chance to pass even
without the fix, but fails within seconds when running the test script
with --stress. With the fix it runs fine for several minutes, until
my patience runs out.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the --filter option for git-rev-list(1), objects that are
explicitly provided ignore filters and are always printed unless the
--filter-provided-objects option is also specified. Clarify this
behavior in the documentation.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add user-facing documentation that justifies the values being set by
'scalar clone', 'scalar register', and 'scalar reconfigure'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repo struct" learned to take "-z" as a synonym to "--format=nul".
* lo/repo-struct-z:
repo: add -z as an alias for --format=nul to git-repo-structure
repo: use [--format=... | -z] instead of [-z] in git-repo-info synopsis
repo: remove blank line from Documentation/git-repo.adoc
A help message from "git branch" now mentions "git help" instead of
"man" when suggesting to read some documentation.
* kh/advise-w-git-help-in-branch:
branch: advice using git-help(1) instead of man(1)
Build fix.
* tc/meson-cross-compile-fix:
meson: use is_cross_build() where possible
meson: only detect ICONV_OMITS_BOM if possible
meson: ignore subprojects/.wraplock
"git last-modified" used to mishandle "--" to mark the beginning of
pathspec, which has been corrected.
* js/last-modified-with-sparse-checkouts:
last-modified: support sparse checkouts
Halve the memory consumed by artificial filepairs created during
"git diff --find-copioes-harder", also making the operation run
faster.
* rs/diff-index-find-copies-harder-optim:
diff-index: don't queue unchanged filepairs with diff_change()
Recent optimization to "last-modified" command introduced use of
uninitialized block of memory, which has been corrected.
* tc/last-modified-active-paths-optimization:
last-modified: fix use of uninitialized memory
There is no documentation for `--contained`.
Start by copying the text from `replay_options` in `builtin/
replay.c`. But some people think that the existing text is a
bit unclear; what does it mean for a branch to be contained
in a revision range? Let’s include the implied commits here:
the branches that point at commits in the range.
Also use “update” instead of “advance”. “Update” is the verb
commonly used in this context.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands will produce output on stderr if there are conflicts, but
git-replay(1) is completely silent. Explicitly spell that out.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git --version reports its version with the prefix "git version ".
Remove precisely this string instead of everything up to and including
the rightmost space to avoid butchering version strings that contain
spaces. This helps Apple's release of Git, which reports its version
like this: "git version 2.50.1 (Apple Git-155)".
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Existing code in files that have been fairly stable trigger the
"make coccicheck" suggestions due to the new check.
Rewrite them to use MEMZERO_ARRAY()
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Telling the user "you got some error messages" without showing what
the errors are is almost useless in CI environment, as the errors
cannot be examined without downloading build artifacts.
Arrange it to spew out the output when it fails.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tc/memzero-array:
contrib/coccinelle: pass include paths to spatch(1)
git-compat-util: introduce MEMZERO_ARRAY() macro
last-modified: fix use of uninitialized memory
The config values set by Scalar went through an audit in the previous
changes, so now reorganize the settings and simplify their purpose.
First, alphabetize the config options, except put the platform-specific
options at the end. This groups two Windows-specific settings and only
one non-Windows setting.
Also, this removes the 'overwrite_on_reconfigure' setting for many of
these options. That setting made nearly all of these options "required"
for scalar enlistments, restricting use for users. Instead, now nearly
all options have removed this setting.
However, there is one setting that still has this, which is
index.skipHash, which was previously being set to _false_ when we
actually prefer the value of true. Keep the overwrite here to help
Scalar users upgrade to the new version. We may remove that overwrite in
the future once we belive that most of the users who have the false
value have upgraded to a version that overwrites that to 'true'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These config values were added in the original Scalar contribution,
d0feac4e8c (scalar: 'register' sets recommended config and starts
maintenance, 2021-12-03), but were never fully checked for validity in
the upstream Git project. At the time, Scalar was only intended for the
contrib/ directory so did not have as rigorous of an investigation.
Each config option has its own justification for removal:
* core.preloadIndex: This value is true by default, now. Removing this
causes some changes required to the tests that checked this config
value. Use gui.gcwarning=false instead.
* core.fscache: This config does not exist in the core Git project, but
is instead a config option for a Git for Windows feature.
* core.multiPackIndex: This config value is now enabled by default, so
does not need to be called out specifically. It was originally
included to make sure the background maintenance that created
multi-pack-indexes would result in the expected performance
improvements.
* credential.validate: This option is not something specific to Git but
instead an older version of Git Credential Manager for Windows. That
software was replaced several years ago by the cross-platform Git
Credential Manger so this option is no longer needed to help users who
were on that older software.
* pack.useSparse=true: This value is now Git's default as of de3a864114
(config: set pack.useSparse=true by default, 2020-03-20) so we don't
need it set by Scalar.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The index.skipHash config option has been set to 'false' by Scalar since
4933152cbb (scalar: enable path-walk during push via config, 2025-05-16)
but that commit message is trying to communicate the exact opposite:
that the 'true' value is what we want instead. This means that we've
been disabling this performance benefit for Scalar repos
unintentionally.
Fix this issue before we add justification for the config options set in
this list.
Oddly, enabling index.skipHash causes a test issue during 'test_commit'
in one of the Scalar tests when GIT_TEST_SPLIT_INDEX is enabled (as
caught by the linux-test-vars build). I'm fixing the test by disabling
the environment variable, but the issue should be resolved in a series
focused on the split index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A repo may have config options set by 'scalar clone' or 'scalar
register' and then updated by 'scalar reconfigure'. It can be helpful to
point out which of those options were set by the latest scalar
recommendations.
Add "# set by scalar" to the end of each config option to assist users
in identifying why these config options were set in their repo. Use a new
helper method to simplify the two callsites.
Co-authored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unless there are good reasons, it is customary to have the options[]
array used with the parse-options API declared in function scope rather
than at file scope.
Move builtin/pull.c:cmd_pull()’s options[] array into the function to
match that convention.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before C99 syntax to express that the final member in a struct is an
array of unknown number of elements, i.e.,
struct {
...
T flexible_array[];
};
came along, GNU introduced their own extension to declare such a
member with 0 size, i.e.,
T flexible_array[0];
and the compilers that did not understand even that were given a way
to emulate it by wasting one element, i.e.,
T flexible_array[1];
As we are using more and more C99 language features, let's see if
the platforms that still need to resort to the historical forms of
flexible array member support are still there, by forcing all the
flex array definitions to use the C99 syntax and see if anybody
screams (in which case reverting the changes is rather easy).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_replay() aborts if the pointer "onto" is NULL after argument
parsing, e.g. when specifying a non-existing commit with --onto.
15cd4ef1f4 (replay: make atomic ref updates the default behavior,
2025-11-06) added code that dereferences this pointer before the check.
Switch their places to avoid a segmentation fault.
Reported-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since Aug 2006, the DarwinPorts project renamed themselves as
MacPorts. Those who are not intimately familiar with the Opensource
ecosystem around macOS from olden days, the name DarwinPorts may not
ring a bell, even when they are using MacPorts.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor writing of alternates so that the actual business logic is
structured around the object database source we want to write the
alternate to. Same as with the preceding commit, this will eventually
allow us to have different logic for writing alternates depending on the
backend used.
Note that after the refactoring we start to call
`odb_add_alternate_recursively()` unconditionally. This is fine though
as we know to skip adding sources that are tracked already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt how we read alternates so that the interface is structured around
the object database source we're reading from. This will eventually
allow us to abstract away this behaviour with pluggable object databases
so that every format can have its own mechanism for listing alternates.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have removed the mutual recursion in the preceding commit
it is not necessary anymore to have a forward declaration of the
`read_info_alternates()` function. Move the function and its
dependencies further up so that we can remove it.
Note that this commit also removes the function documentation of
`read_info_alternates()`. It's unclear what it's documenting, but it for
sure isn't documenting the modern behaviour of the function anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding an alternative object database source we not only have to
consider the added source itself, but we also have to add _its_ sources
to our database. We implement this via mutual recursion:
1. We first call `link_alt_odb_entries()`.
2. `link_alt_odb_entries()` calls `parse_alternates()`.
3. We then add each alternate via `odb_add_alternate_recursively()`.
4. `odb_add_alternate_recursively()` calls `link_alt_odb_entries()`
again.
This flow is somewhat hard to follow, but more importantly it means that
parsing of alternates is somewhat tied to the recursive behaviour.
Refactor the function to remove the mutual recursion between adding
sources and parsing alternates. The parsing step thus becomes completely
oblivious to the fact that there is recursive behaviour going on at all.
The recursion is handled by `odb_add_alternate_recursively()` instead,
which now recurses with itself.
This refactoring allows us to move parsing of alternates into object
database sources in a subsequent step.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `odb_add_to_alternates_file()` we know to add the newly
added source to the object database in case we have already loaded
alternates. This is done so that we can make its objects accessible
immediately without having to fully reload all alternates.
The way we do this though is to call `link_alt_odb_entries()`, which
adds _multiple_ sources to the object database source in case we have
newline-separated entries. This behaviour is not documented in the
function documentation of `odb_add_to_alternates_file()`, and all
callers only ever pass a single directory to it. It's thus entirely
surprising and a conceptual mismatch.
Fix this issue by directly calling `odb_add_alternate_recursively()`
instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `alt_odb_usable()` receives as input the object database,
the path it's supposed to determine usability for as well as the
normalized path of the main object directory of the repository. The last
part is derived by the function's caller from the object database. As we
already pass the object database to `alt_odb_usable()` it is redundant
information.
Drop the extra parameter and compute the normalized object directory in
the function itself.
While at it, rename the function to `odb_is_source_usable()` to align it
with modern terminology.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Parsing alternates and resolving potential relative paths is currently
handled in two separate steps. This has the effect that the logic to
retrieve alternates is not entirely self-contained. We want it to be
just that though so that we can eventually move the logic to list
alternates into the `struct odb_source`.
Move the logic to resolve relative alternative paths into
`parse_alternates()`. Besides bringing us a step closer towards the
above goal, it also neatly separates concerns of generating the list of
alternatives and linking them into the object database.
Note that we ignore any errors when the relative path cannot be
resolved. This isn't really a change in behaviour though: if the path
cannot be resolved to a directory then `alt_odb_usable()` still knows to
bail out.
While at it, rename the function to `odb_add_alternate_recursively()` to
more clearly indicate what its intent is and to align it with modern
terminology.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Parsing of the alternates file and environment variable is currently
split up across multiple different functions and is entangled with
`link_alt_odb_entries()`, which is responsible for linking the parsed
object database sources. This results in two downsides:
- We have mutual recursion between parsing alternates and linking them
into the object database. This is because we also parse alternates
that the newly added sources may have.
- We mix up the actual logic to parse the data and to link them into
place.
Refactor the logic so that parsing of the alternates file is entirely
self-contained. Note that this doesn't yet fix the above two issues, but
it is a necessary step to get there.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit a new coccinelle rule is added. But neiter
`make coccicheck` nor `meson compile coccicheck` did detect a case in
builtin/last-modified.c.
This case involves the field `scratch` in `struct last_modified`. This
field is of type `struct bitmap` and that struct has a member
`eword_t *words`. Both are defined in `ewah/ewok.h`. Now, while
builtin/last-modified.c does include that header (with the subdir in the
#include directive), it seems coccinelle does not process it. So it's
unaware of the type of `words` in the bitmap, and it doesn't recognize
the rule from previous commit that uses:
type T;
T *ptr;
Fix coccicheck by passing all possible include paths inside the Git
project so spatch(1) can find the headers and can determine the types.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new macro MEMZERO_ARRAY() that zeroes the memory allocated
by ALLOC_ARRAY() and friends. And add coccinelle rule to enforce the use
of this macro.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `write_midx_internal()` we know to skip rewriting the multi-pack
index in case the existing one already covers all packs. This logic does
not know to handle `git multi-pack-index write --stdin-packs` though, so
we end up always rewriting the MIDX in this case even if the MIDX would
not change.
With our default maintenance strategy this isn't really much of a
problem, as git-gc(1) does not use the "--stdin-packs" option. But that
is changing with geometric repacking, where "--stdin-packs" is used to
explicitly select the packfiles part of the geometric sequence.
This issue can be demonstrated trivially with a benchmark in the Git
repository: executing `git repack --geometric=2 --write-midx -d` in the
Git repository takes more than 3 seconds only to end up with the same
multi-pack index as we already had before.
The logic that decides if we need to rewrite the MIDX only checks
whether the number of packfiles covered will change. That check is of
course too lenient for "--stdin-packs", as it could happen that we want
to cover a different-but-same-size set of packfiles. But there is no
inherent reason why we cannot handle "--stdin-packs".
Improve the logic to not only check for the number of packs, but to also
verify that we are asked to generate a MIDX for the _same_ packs. This
allows us to also skip no-op rewrites for "--stdin-packs".
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `write_midx_internal()` we know to skip writing the new multi-pack
index in case it would be the same as the existing one. This logic does
not handle the `--stdin-packs` option yet though, so we end up always
rewriting the MIDX if that option is passed to us.
Extract the logic to decide whether or not to rewrite the MIDX into a
separate function. This will allow us to extend that feature in the next
commit to address the above issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `midx_preferred_pack()` returns the preferred pack for a
given multi-pack index. To compute the preferred pack we:
1. Take the first position indexed by the MIDX in pseudo-pack order.
2. Convert this pseudo-pack position into the MIDX position.
3. We then look up the pack that corresponds to this MIDX position.
This reliably returns the preferred pack given that all of its contained
objects will be up front in pseudo-pack order.
The second step that turns the pseudo-pack order into MIDX order
requires the reverse index though, which may not exist for example when
the MIDX does not have a bitmap. And in that case one may easily hit a
bug:
BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded
In theory, `midx_preferred_pack()` already knows to handle the case
where no reverse index exists, as it calls `load_midx_revindex()` before
calling into `midx_preferred_pack()`. But we only check for negative
return values there, even though the function returns a positive error
code in case the reverse index does not exist.
Fix the issue by testing for a non-zero return value instead, same as
all the other callers of this function already do. While at it, document
the return value of `load_midx_revindex()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a regression introduced with batched updates in 0e358de64a (fetch:
use batched reference updates, 2025-05-19) when fetching references. In
the `do_fetch()` function, we jump to cleanup if committing the
transaction fails, regardless of whether using batched or atomic
updates. This skips three subsequent operations:
- Update 'FETCH_HEAD' as part of `commit_fetch_head()`.
- Add upstream tracking information via `set_upstream()`.
- Setting remote 'HEAD' values when `do_set_head` is true.
For atomic updates, this is expected behavior. For batched updates,
we want to continue with these operations even if some refs fail to
update.
Skipping `commit_fetch_head()` isn't actually a regression because
'FETCH_HEAD' is already updated via `append_fetch_head()` when not
using '--atomic'. However, we add a test to validate this behavior.
Skipping the other two operations (upstream tracking and remote HEAD)
is a regression. Fix this by only jumping to cleanup when using
'--atomic', allowing batched updates to continue with post-fetch
operations. Add tests to prevent future regressions.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 0e358de64a (fetch: use batched reference updates, 2025-05-19)
updated the 'git-fetch(1)' command to use batched updates. This batches
updates to gain performance improvements. When fetching references, each
update is added to the transaction. Finally, when committing, individual
updates are allowed to fail with reason, while the transaction itself
succeeds.
One scenario which was missed here, was fetching tags. When fetching
conflicting tags, the `fetch_and_consume_refs()` function returns '1',
which skipped committing the transaction and directly jumped to the
cleanup section. This mean that no updates were applied. This also
extends to backfilling tags which is done when fetching specific
refspecs which contains tags in their history.
Fix this by committing the transaction when we have an error code and
not using an atomic transaction. This ensures other references are
applied even when some updates fail.
The cleanup section is reached with `retcode` set in several scenarios:
- `truncate_fetch_head()`, `open_fetch_head()` and `prune_refs()` set
`retcode` before the transaction is created, so no commit is
attempted.
- `fetch_and_consume_refs()` and `backfill_tags()` are the primary
cases this fix targets, both setting a positive `retcode` to
trigger the committing of the transaction.
This simplifies error handling and ensures future modifications to
`do_fetch()` don't need special handling for batched updates.
Add tests to check for this regression. While here, add a missing
cleanup from previous test.
Reported-by: David Bohman <debohman@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When is_promisor_object() is called for the first time, it lazily
initializes a set of all promisor objects by iterating through all
objects in promisor packs. For each object, add_promisor_object() calls
parse_object(), which decompresses and hashes the entire object.
For repositories with large pack files, this can take an extremely long
time. For example, on a production repository with a 176 GB promisor
pack:
$ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
________________________________________________________
Executed in 76.10 mins fish external
usr time 72.10 mins 1.83 millis 72.10 mins
sys time 3.56 mins 0.17 millis 3.56 mins
add_promisor_object() just wants to construct the set of all promisor
objects, so it doesn't really need to verify the hash of every object.
Set PARSE_OBJECT_SKIP_HASH_CHECK to skip the hash check. This has the
side effect of skipping decompression of blob objects completely, saving
a significant amount of time:
$ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
________________________________________________________
Executed in 124.70 secs fish external
usr time 46.94 secs 0.00 millis 46.94 secs
sys time 43.11 secs 1.03 millis 43.11 secs
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_object_with_flags() has an optimization to skip parsing blobs if
PARSE_OBJECT_SKIP_HASH_CHECK is set and the object hasn't been seen
before or might be a blob but hasn't been parsed yet. The latter can
happen, for example, if add_tree_entries() walks a path that references
a blob object that hasn't been seen before: lookup_blob() marks the
referenced oid as being a blob, but does not provide any additional
information about it until it is parsed.
It's possible for an object to be created without even a type, such as
when prepare_revision_walk() uses mark_uninteresting() to mark all
promisor objects as uninteresting. These objects have obj->parsed ==
false and obj->type == OBJ_NONE.
The skip_hash optimization does not consider this kind of object, so
parse_object_with_flags() proceeds to fully parse the object to
determine its type.
Improve the optimization by applying it to OBJ_NONE objects as well as
OBJ_BLOB ones. Apply a similar fix for trees.
Fixes: 8db2dad7a045 ("parse_object(): check on-disk type of suspected blob")
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of "revision" (a connected set of commits) has been
clarified in the "git replay" documentation.
* en/replay-doc-revision-range:
Documentation/git-replay.adoc: fix errors around revision range
A few tests have been updated to work under the shell compatible
mode of zsh.
* bc/zsh-testsuite:
t5564: fix test hang under zsh's sh mode
t0614: use numerical comparison with test_line_count
"git replay" forgot to omit the "gpgsig-sha256" extended header
from the resulting commit the same way it omits "gpgsig", which has
been corrected.
* pw/replay-exclude-gpgsig-fix:
replay: do not copy "gpgsign-sha256" header
While investigating the config options set by 'scalar' I noticed this
one wasn't documented.
Signed-off-by: Matthew Hughes <matthewhughes934@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote and reads and parses capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.
There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.
Which is *not* the theme of this patch ;-)
For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
side possibly multiple times, and the memory is held by the variable
until program exit, without leaking. When talking to multiple remotes,
however, the server capabilities from the second connection overwrites
without freeing the one from the first connection, which leaks.
==1080970==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 421 byte(s) in 2 object(s) allocated from:
#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).
Free the capablities data for the previous server before overwriting
it with the next server to plug this leak.
The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it fail reliably with SANITIZE=leak,address
though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Join two paragraphs that start with the standard “The default <hook>,
when enabled” into one and put it at the end of the “pre-commit”
section.
The trailing whitespace paragraph was added in the first commit for the
doc, in 6d35cc76 (Document hooks., 2005-09-02). Then 3e14dd2c (mention
use of "hooks.allownonascii" in "man githooks", 2019-02-20) updated the
“pre-commit” section to mention the non-ASCII check that was added in
d00e364d.[1] But this paragraph was added one-past the original
“default” paragraph, after the env. variable paragraph, and starts
exactly the same. That causes the flow of this section to feel
off (paragraphs in order):
1. Invoked by <cmd> and what parameters it takes
2. The default 'pre-commit' hook catches introduction of trailing
whitespace
3. `GIT_EDITOR=:`
4. The default pre-commit' hook catches introduction of non-ASCII
filenames
Let’s instead join these two paragrahs and explain the whole behavior of
the default script.
† 1: Extend sample pre-commit hook to check for non ascii filenames,
2009-05-19
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The list of supported completions in the header of the file was
mostly written a long time ago when Shawn added the initial version
of this script in 2006. The list explicitly states that we complete
"common --long-options", which implies that we do not complete
not-so-common ones and single letter options (this text dates back
to May 2007).
Update the description to explicitly state that single-letter
options are not completed. Also, document that arguments to options
are completed, even for single-letter options (e.g., "git -c <TAB>"
offers configuration variables).
The reason why we do not complete single-letter options is because
it does not seem to help all that much to learn that the command
takes -c, -d, -e options when "git foo -<TAB>" offers these three,
unlike long options that is easier to guess what they are about.
Because this rationale is primarily for our developers, let's leave
it out of the completion script itself, whose messages are entirely
for end-users. Our developers can run "git blame" to find this
commit as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitmkdtemp() has become a trivial wrapper around git_mkdtemp(). Remove
this now unnecessary layer of indirection.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Older versions of mktemp(3) generate easily guessable file names. The
function checks if the generated name is used, which is unreliable, as
a file with that name might then be created by some other process before
we can do it ourselves. The function was dropped from POSIX due to its
security problems. Forbid its use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the mktemp(3) compatibility function now that its last caller was
removed by the previous commit.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A file might appear at the path returned by mktemp(3) before we call
mkdir(2). Use the more robust git_mkdtemp() instead, which retries a
number of times and doesn't need to call lstat(2).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend git_mkstemps_mode() to optionally call mkdir(2) instead of
open(2), then use that ability to create a mkdtemp(3) replacement,
git_mkdtemp(). We'll start using it in the next commit.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message given by "git config set", when the variable
being updated has more than one values defined, used old style "git
config" syntax with an incorrect option in its hint, both of which
have been corrected.
* rs/config-set-multi-error-message-fix:
config: fix suggestion for failed set of multi-valued option
The option help text given by "git config unset -h" described
the "--all" option to "replace", not "unset", multiple variables,
which has been corrected.
* rs/config-unset-opthelp-fix:
config: fix short help of unset flags
Code refactoring around object database sources.
* ps/object-source-management:
odb: handle recreation of quarantine directories
odb: handle changing a repository's commondir
chdir-notify: add function to unregister listeners
odb: handle initialization of sources in `odb_new()`
http-push: stop setting up `the_repository` for each reference
t/helper: stop setting up `the_repository` repeatedly
builtin/index-pack: fix deferred fsck outside repos
oidset: introduce `oidset_equal()`
odb: move logic to disable ref updates into repo
odb: refactor `odb_clear()` to `odb_free()`
odb: adopt logic to close object databases
setup: convert `set_git_dir()` to have file scope
path: move `enter_repo()` into "setup.c"
Dockerised jobs at the GitHub Actions CI have been taught to show
more details of failed tests.
* js/ci-show-breakage-in-dockerized-jobs:
ci(dockerized): do show the result of failing tests again
The "--committer-date-is-author-date" option of "git am/rebase" is
a misguided one. The documentation is updated to discourage its
use.
* kh/doc-committer-date-is-author-date:
doc: warn against --committer-date-is-author-date
"git config get --path" segfaulted on an ":(optional)path" that
does not exist, which has been corrected.
* jc/optional-path:
config: really treat missing optional path as not configured
config: really pretend missing :(optional) value is not there
config: mark otherwise unused function as file-scope static
Code clean-up.
* en/xdiff-cleanup-2:
xdiff: rename rindex -> reference_index
xdiff: change rindex from long to size_t in xdfile_t
xdiff: make xdfile_t.nreff a size_t instead of long
xdiff: make xdfile_t.nrec a size_t instead of long
xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
xdiff: use unambiguous types in xdl_hash_record()
xdiff: use size_t for xrecord_t.size
xdiff: make xrecord_t.ptr a uint8_t instead of char
xdiff: use ptrdiff_t for dstart/dend
doc: define unambiguous type mappings across C and Rust
Other Git commands that have nul-terminated output, such as git-config,
git-status, git-ls-files, and git-repo-info have a flag `-z` for using
the null character as the record separator.
Add the `-z` flag to git-repo-structure as an alias for `--format=nul`,
making it consistent with the behavior of the other commands.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The flag -z is only an alias for --format=null and even though --format
and -z can be used together and repeated, only the last one is
considered.
Replace `[-z]` in the synopsis of git-repo-info by
`[--format=... | -z]`, expliciting that the use of one of those flags
replace the other.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was an extra blank line in git-repo-structure documentation, which
led to an unwawnted '+' character after generating an HTML or PDF from
that page. This can be seen, for example, in Git 2.52.0 online docs [1].
Remove that extra line.
[1] https://git-scm.com/docs/git-repo/2.52.0
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous commit the first use of meson.can_run_host_binaries() was
introduced. This is a guard around compiler.run() to ensure it's
actually possible to execute the provided.
In other places we've been having the same issue, but here `not
meson.is_cross_build()` is used as guard. This does the trick, but it
also prevents the code from running even when an exe_wrapper is
configured.
Switch to using meson.can_run_host_binaries() here as well.
There is another place left that still uses `not
meson.is_cross_build()`, but here it's a guard around fs.exists(). That
function will always run on the build machine, so checking for
cross-compilation is still in place here.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our Meson setup it automatically detects whether ICONV_OMITS_BOM
should be defined. To check this, a piece of code is compiled and ran.
When cross-compiling, it's not possible to run this piece of code. Guard
this test with a can_run_host_binaries() check to ensure it can run.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asking Meson to wrap subprojects, it generates a .wraplock file in
the subprojects/ directory. Ignore this file.
See also https://github.com/mesonbuild/meson/issues/14948.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a sparse checkout, a user might want to run `last-modified` on a
directory outside the worktree.
And even in non-sparse checkouts, a user might need to run that command
on a directory that does not exist in the worktree.
These use cases should be supported via the `--` separator between
revision and file arguments, which is even advertised in the
documentation. This patch fixes a tiny bug that prevents that from
working.
This fixes https://github.com/git-for-windows/git/issues/5978
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier commit e9d221b0 (doc: git-pull: clarify how to exit a
conflicted merge, 2025-10-15) misspelt `git rebase --abort` to
`git --rebase abort`. Fix it.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I meant to delete this sentence fragment when rewriting this paragraph,
but accidentally left it in. It's repetitive (since it was meant to be
deleted) and it's causing some formatting issues with the note.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
8fbd903e (branch: advise about ref syntax rules, 2024-03-05) added
an advice about checking git-check-ref-format(1) for the ref syntax
rules. The advice uses man(1). But git(1) is a multi-platform tool and
man(1) may not be available on some platforms. It might also be slightly
jarring to see a suggestion for running a command which is not from
the Git suite.
Let’s instead use git-help(1) in order to stay inside the land of
git(1). This also means that `help.format` (for `man`, `html` or other
formats) will be used if set.
Also change to using single quotes (') to quote the command since that
is more conventional.
While here let’s also update the test to use `{SQ}`, which is more
readable and easier to edit.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various issues detected by Asan have been corrected.
* jk/asan-bonanza:
t: enable ASan's strict_string_checks option
fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated
fsck: remove redundant date timestamp check
fsck: avoid strcspn() in fsck_ident()
fsck: assert newline presence in fsck_ident()
cache-tree: avoid strtol() on non-string buffer
Makefile: turn on NO_MMAP when building with ASan
pack-bitmap: handle name-hash lookups in incremental bitmaps
compat/mmap: mark unused argument in git_munmap()
Both "git apply" and "git diff" learn a new whitespace error class,
"incomplete-line".
* jc/whitespace-incomplete-line:
attr: enable incomplete-line whitespace error for this project
diff: highlight and error out on incomplete lines
apply: check and fix incomplete lines
whitespace: allocate a few more bits and define WS_INCOMPLETE_LINE
apply: revamp the parsing of incomplete lines
diff: update the way rewrite diff handles incomplete lines
diff: call emit_callback ecbdata everywhere
diff: refactor output of incomplete line
diff: keep track of the type of the last line seen
diff: correct suppress_blank_empty hack
diff: emit_line_ws_markup() if/else style fix
whitespace: correct bit assignment comments
diff_cache() queues unchanged filepairs if the flag find_copies_harder
is set, and uses diff_change() for that. This function allocates a
filespec for each side, does a few other things that are unnecessary for
unchanged filepairs and always sets the diff_flag has_changes, which is
simply misleading in this case.
Add a new streamlined function for queuing unchanged filepairs and
use it in show_modified(), which is called by diff_cache() via
oneway_diff() and do_oneway_diff(). It allocates only a single filespec
for each filepair and uses it twice with reference counting. This has a
measurable effect if there are a lot of them, like in the Linux repo:
Benchmark 1: ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
Time (mean ± σ): 31.8 ms ± 0.2 ms [User: 24.2 ms, System: 6.3 ms]
Range (min … max): 31.5 ms … 32.3 ms 85 runs
Benchmark 2: ./git -C ../linux diff --cached --find-copies-harder
Time (mean ± σ): 23.9 ms ± 0.2 ms [User: 18.1 ms, System: 4.6 ms]
Range (min … max): 23.5 ms … 24.4 ms 111 runs
Summary
./git -C ../linux diff --cached --find-copies-harder ran
1.33 ± 0.01 times faster than ./git_v2.52.0 -C ../linux diff --cached --find-copies-harder
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-last-modified(1) uses a scratch bitmap to keep track of paths that
have been changed between commits. To avoid reallocating a bitmap on
each call of process_parent(), the scratch bitmap is kept and reused.
Although, it seems an incorrect length is passed to memset(3).
`struct bitmap` uses `eword_t` to for internal storage. This type is
typedef'd to uint64_t. To fully zero the memory used by the bitmap,
multiply the length (saved in `struct bitmap::word_alloc`) by the size
of `eword_t`.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was significant confusion in the git-replay manual about what
constitutes a revision range. As noted in f302c1e4aa09 (revisions(7):
clarify that most commands take a single revision range, 2021-05-18):
Commands that are specifically designed to take two distinct ranges
(e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
are exceptions. Unless otherwise noted, all "git" commands that operate
on a set of commits work on a single revision range.
`git replay` is not an exception, but a few places in the manual were
written as though it were. These appear to have come in revisions to
the original series, between v3->v4 (see
https://lore.kernel.org/git/CAP8UFD3bpLrVW97DH7j=V9H2GsTSAkksC9L3QujQERFk_kLnZA@mail.gmail.com/
, "More than one <revision-range> can be passed") and between v6->v7
(https://lore.kernel.org/git/20231115143327.2441397-1-christian.couder@gmail.com/,
"Takes ranges of commits"), and I missed both of these revisions when
reviewing. Fix them now.
There was also a reference to the "Commit Limiting options below", but
this page has no such section of options; strike the misleading
reference.
It is worth noting that we are documenting existing behavior, rather
than optimal behavior. Junio has multiple times suggested introducing
alternative ways to walk revisions and use them in `git replay
--advance`, e.g. at
* https://lore.kernel.org/git/xmqqy1mqo6kv.fsf@gitster.g/
* https://lore.kernel.org/git/xmqq8rb3is8c.fsf@gitster.g/
* https://lore.kernel.org/git/xmqqtsydj2zk.fsf@gitster.g/ (item (2))
If/when we introduce some new revision walking flag that implements one
of these alternate types of revision walks, we can update the --advance
option and this manual appropriately.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The find_longest_common_sequence() function in patience diff is
inefficient as it calls binary_search() for every unique line it
encounters when deciding where to put it in the sequence. From
instrumentation (using xctrace) on popular repositories, binary_search()
takes up 50-60% of the run time within patience_diff() when performing a
diff.
To optimize this, add a boundary condition check before binary_search()
is called to see if the encountered unique line is located after the
entire currently tracked longest subsequence. If so, skip the
unnecessary binary search and simply append the entry to the end of
sequence. Given that most files compared in a diff are usually quite
similar to each other, this condition is very common, and should be hit
much more frequently than the binary search.
Below are some end-to-end performance results by timing `git log
--shortstat --oneline -500 --patience` on different repositories with
the old and new code. Generally speaking this seems to give at least
8-10% speed up. The "binary search hit %" column describes how often the
algorithm enters the binary search path instead of the new faster path.
Even in the WebKit case we can see that it's quite rare (1.46%).
| Repo | Speed difference | binary search hit % |
|----------|------------------|---------------------|
| vim | 1.27x | 0.01% |
| pytorch | 1.16x | 0.02% |
| cpython | 1.14x | 0.06% |
| ripgrep | 1.14x | 0.03% |
| git | 1.13x | 0.12% |
| vscode | 1.09x | 0.10% |
| WebKit | 1.08x | 1.46% |
The benchmarks were done using hyperfine, on an Apple M1 Max laptop,
with git compiled with `-O3 -flto`.
Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test starts a SOCKS server in Perl in the background and then kills
it after the tests are done. However, when using zsh (in sh mode) in
the tests, the start_socks function hangs until the background process
is killed.
Note that this does not reproduce in a simple shell script, so there is
likely some interaction between job handling, our heavy use of eval in
the test framework, and possibly other complexities of our test
framework. What is clear, however, is that switching from a compound
statement to a subshell fixes the problem entirely and the test passes
with no problem, so do that.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In this comparison, we want to know whether the number of lines is
greater than 1. Our test_line_count function passes the first argument
as the comparison operator to test, so what we want is a numerical
comparison, not a string comparison. While this does not produce a
functional problem now, it could very well if we expected two or more
items, in which case the value "10" would not match when it should.
Furthermore, the "<" and ">" comparisons are new in POSIX 1003.1-2024
and we don't want to require such a new version of POSIX since many
popular and supported operating systems were released before that
version of POSIX was released.
Finally, zsh's builtin test operator does not like the greater-than sign
in "test", since it is only supported in the double-bracket extension.
This has been reported and will be addressed in a future version, but
since our code is also technically incorrect, as well as not very
compatible, let's fix it by using a numeric comparison.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Windows+meson" job at the GitHub Actions CI was hard to debug, as
it did not show and save failed test artifacts, which has been
corrected.
* jk/ci-windows-meson-test-fix:
ci(windows-meson-test): handle options and output like other test jobs
unit-test: ignore --no-chain-lint
"git worktree list" attempts to show paths to worktrees while
aligning them, but miscounted display columns for the paths when
non-ASCII characters were involved, which has been corrected.
* pw/worktree-list-display-width-fix:
worktree list: quote paths
worktree list: fix column spacing
Makefile based build have recently been updated to build a
libgit.a that also has reftable and xdiff objects; CMake based
build procedure has been updated to match.
* js/cmake-libgit-fix:
cmake: stop trying to build the reftable and xdiff libraries
The "return errno = EFOO, -1" construct, which is heavily used in
compat/mingw.c and triggers warnings under "-Wcomma", has been
rewritten to avoid the warnings.
* js/mingw-assign-comma-fix:
mingw: avoid the comma operator
Yet another corner case fix around renames in the "ort" merge
strategy.
* en/ort-rename-another-fix:
merge-ort: fix failing merges in special corner case
merge-ort: remove debugging crud
t6429: update comment to mention correct tool
The quality of tests and test suites is most apparent not when
everything passes, but in how quickly bugs can be identified,
analyzed, and resolved after test failures occur.
As such, it is an unfortunate side effect of 2a21098b98a (github: adapt
containerized jobs to be rootless, 2025-01-10) that the output of failed
test cases, which was shown before that change directly in the build
logs, is now no longer shown at all.
The reason is a side effect of trying to run the build and the tests
with permissions other than the `root` user, but without providing the
prerequisite permissions to signal what tests failed and whose output
hence needs to be included in the logs.
The way this signaling works is for the workflow to write into
special-purpose files whose path is specific to the current workflow
step and which can be accessed via the `$GITHUB_ENV` environment
variable, which differs between workflow steps. It is this file that is
missing write permission for the `builder` user that was introduced in
above-mentioned commit.
The solution is simple: make the file world-writable.
Technically, this write permission should be removed after the step has
completed, if proper security practices were to be upheld, but since
nothing uses that file again, it does not matter, and the fix is more
succinct this way.
This commit is best viewed with `--color-words`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: squashed Elijah's rewrite of the first paragraph of the log message]
[jc: updated chmod to match "world-writable" in the log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: add external diff file rename detection
gitk: show unescaped file names on 'rename' and 'copy' lines
gitk: fix a 'continue' statement outside a loop to 'return'
gitk: persist position and size of the Tags and Heads window
Revert "gitk: Only restore window size from ~/.gitk, not position"
When "git replay" replays a commit it copies the extended headers
across from the original commit. However, if the original commit
was signed, we do not want to copy the header associated with the
signature is it wont be valid for the new commit. The code already
knows to avoid coping the "gpgsig" header but does not know to avoid
copying the "gpgsig-sha256" header. Add that header to the list of
exclusions to match what "git commit --amend" does.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tools like `git filter-repo`[1] use `git fast-export` and
`git fast-import` to rewrite repository history. When rewriting
history using one such tool though, commit signatures might become
invalid because the commits they sign changed due to the changes
in the repository history made by the tool between the fast-export
and the fast-import steps.
Note that as far as signature handling goes:
* Since fast-export doesn't know what changes filter-repo may make
to the stream, it can't know whether the signatures will still be
valid.
* Since filter-repo doesn't know what history canonicalizations
fast-export performed (and it performs a few), it can't know whether
the signatures will still be valid.
* Therefore, fast-import is the only process in the pipeline that
can know whether a specified signature remains valid.
Having invalid signatures in a rewritten repository could be
confusing, so users rewritting history might prefer to simply
discard signatures that are invalid at the fast-import step.
For example a common use case is to rewrite only "recent" history.
While specifying commit ranges corresponding to "recent" commits
could work, users worry about getting it wrong and want to just
automatically rewrite everything, expecting older commit signatures
to be untouched.
To let them do that, let's add a new 'strip-if-invalid' mode to the
`--signed-commits=<mode>` option of `git fast-import`.
It would be interesting for the `--signed-tags=<mode>` option to
have this mode too, but we leave that for a future improvement.
It might also be possible for `git fast-export` to have such a mode
in its `--signed-commits=<mode>` and `--signed-tags=<mode>`
options, but the use cases for it are much less clear, so we also
leave that for possible future improvements.
For now let's just die() if 'strip-if-invalid' is passed to these
options where it hasn't been implemented yet.
[1]: https://github.com/newren/git-filter-repo
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/persist-ref-window-geometry:
gitk: persist position and size of the Tags and Heads window
Revert "gitk: Only restore window size from ~/.gitk, not position"
In the preceding commit we have moved the logic that reparents object
database sources on chdir(3p) from "setup.c" into "odb.c". Let's also do
the same for any temporary quarantine directories so that the complete
reparenting logic is self-contained in "odb.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `repo_set_gitdir()` is called in two situations:
- To initialize the repository with its discovered location. As part
of this we also set up the new object database.
- To update the repository's discovered location in case the process
changes its working directory so that we update relative paths. This
means we also have to update any relative paths that are potentially
used in the object database.
In the context of the object database we ideally wouldn't ever have to
worry about the second case: if all paths used by our object database
sources were absolute, then we wouldn't have to update them. But
unfortunately, the paths aren't only used to locate files owned by the
given source, but we also use them for reporting purposes. One such
example is `repo_get_object_directory()`, where we cannot just change
semantics to always return absolute paths, as that is likely to break
tooling out there.
One solution to this would be to have both a "display path" and an
"internal path". This would allow us to use internal paths for all
internal matters, but continue to use the potentially-relative display
paths so that we don't break compatibility. But converting the codebase
to honor this split is quite a messy endeavour, and it wouldn't even
help us with the goal to get rid of the need to update the display path
on chdir(3p).
Another solution would be to rework "setup.c" so that we never have to
update paths in the first place. In that case, we'd only initialize the
repository once we have figured out final locations for all directories.
This would be a significant simplification of that subsystem indeed, but
the current logic is so messy that it would take significant investments
to get there.
Meanwhile though, while object sources may still use relative paths, the
best thing we can do is to handle the reparenting of the object source
paths in the object database itself. This can be done by registering one
callback for each object database so that we get notified whenever the
current working directory changes, and we then perform the reparenting
ourselves.
Ideally, this wouldn't even happen on the object database level, but
instead handled by each object database source. But we don't yet have
proper pluggable object database sources, so this will need to be
handled at a later point in time.
The logic itself is rather simple:
- We register the callback when creating the object database.
- We unregister the callback when releasing it again.
- We split up `set_git_dir_1()` so that it becomes possible to skip
recreating the object database. This is required because the
function is called both when the current working directory changes,
but also when we set up the repository. Calling this function
without skipping creation of the ODB will result in a bug in case
it's already created.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we (obviously) have a way to register new listeners that get
called whenever we chdir(3p), we don't have an equivalent that can be
used to unregister such a listener again.
Add one, as it will be required in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to set up a new object database is currently distributed
across two functions in "repository.c":
- In `initialize_repository()` we initialize an empty object database.
This object database is not fully initialized and doesn't have any
sources attached to it.
- The primary object database source is then created in
`repo_set_gitdir()`.
Ideally though, the logic should be entirely self-contained so that we
can iterate more readily on how exactly the sources themselves get set
up.
Refactor `odb_new()` to handle both allocation and setup of the object
database. This ensures that the object database is always initialized
and ready for use, and it allows us to change how the sources get set up
eventually.
Note that `repo_set_gitdir()` still reaches into the sources when the
function gets called with an already-initialized object database. This
will be fixed in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing references via HTTP we call `repo_init_revisions()` in a
loop for each reference that we're about to push. As third argument we
pass the result of `setup_git_directory()`, which causes us to
reinitialize the repository every single time.
This is an obvious waste of compute, as the repository that we're
working in will never change across any of the initializations. The only
reason that we do this is to retrieve the directory of the repository.
Furthermore, this is about to create issues in a subsequent commit,
where reinitializing the repository will cause a `BUG()`.
Address this by storing the Git directory in a variable instead so that
we don't have to call the function repeatedly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "repository" test helper sets up `the_repository` twice. In fact
though, we don't even have to set it up even once: all we need is to set
up its hash algorithm, because we still depend on some subsystems that
aren't free of `the_repository`.
Refactor the code accordingly. This prepares for a subsequent change,
where setting up the repository repeatedly will lead to a `BUG()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked to perform object consistency checks via the `--fsck-objects`
flag we verify that each object part of the pack is valid. In general,
this check can even be performed outside of a Git repository: we don't
need an initialized object database as we simply read the object from
the packfile directly.
But there's one exception: a subset of the object checks may be deferred
to a later point in time. For now, this only concerns ".gitmodules" and
".gitattributes" files: whenever we see a tree referencing these files
we queue them for a deferred check. This is done because we need to do
some extra checks for those files to ensure that they are well-formed,
and these checks need to be done regardless of whether the corresponding
blobs are part of the packfile or not.
This works inside a repository, but unfortunately the logic leads to a
segfault when running outside of one. This is because we eventually call
`odb_read_object()`, which will crash because the object database has
not been initialized.
There's multiple options here:
- We could in theory create a purely in-memory database with only a
packfile store that contains the single packfile. We don't really
have the infrastructure for this yet though, and it would end up
being quite hacky.
- We could refuse to perform consistency checks outside of a
repository. But most of the checks work alright, so this would be a
regression.
- We can skip the finalizing consistency checks when running outside
of a repository. This is not as invasive as skipping all checks,
but it's not great to randomly skip a subset of tests, either.
None of these options really feel perfect. The first one would be the
obvious choice if easily possible.
There's another option though: instead of skipping the final object
checks, we can die if there are any queued object checks. With this
change we now die exactly if and only if we would have previously
segfaulted. Like this we ensure that objects that _may_ fail the
consistency checks won't be silently skipped, and at the same time we
give users a much better error message.
Refactor the code accordingly and add a test that would have triggered
the segfault. Note that we also move down the logic to add the packfile
to the store. There is no point doing this any earlier than right before
we execute `fsck_finish()`, and it ensures that the logic to set up and
perform the consistency check is self-contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function that allows the caller to verify whether two
oidsets contain the exact same object IDs.
Note that this change requires us to change `oidset_iter_init()` to
accept a `const struct oidset`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our object database sources have a field `disable_ref_updates`. This
field can obviously be set to disable reference updates, but it is
somewhat curious that this logic is hosted by the object database.
The reason for this is that it was primarily added to keep us from
accidentally updating references while an ODB transaction is ongoing.
Any objects part of the transaction have not yet been committed to disk,
so new references that point to them might get corrupted in case we
never end up committing the transaction. As such, whenever we create a
new transaction we set up a new temporary ODB source and mark it as
disabling reference updates.
This has one (and only one?) upside: once we have committed the
transaction, the temporary source will be dropped and thus we clean up
the disabled reference updates automatically. But other than that, it's
somewhat misdesigned:
- We can have multiple ODB sources, but only the currently active
source inhibits reference updates.
- We're mixing concerns of the refdb with the ODB.
Arguably, the decision of whether we can update references or not should
be handled by the refdb. But that wouldn't be a great fit either, as
there can be one refdb per worktree. So we'd again have the same problem
that a "global" intent becomes localized to a specific instance.
Instead, move the setting into the repository. While at it, convert it
into a boolean.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git submodule add" tries to find if a submodule with the same name
already exists at a different path, by looking up an entry in the
.gitmodules file. If the entry in the file is incomplete, e.g.,
when the submodule.<name>.something variable is defined but there is
no definition of submodule.<name>.path variable, it accesses the
missing .path member of the submodule structure and triggers a
segfault.
A brief audit was done to make sure that the code does not assume
members other than those that are absolutely certain to exist: a
submodule obtained by submodule_from_name() should have .name
member, while a submodule obtained by submodule_from_path() should
also have .path as well as .name member, and we cannot assume
anything else. Luckily, the module_add() codepath was the only
problematic one. It is fairly recent code that comes from 1fa06ced
(submodule: prevent overwriting .gitmodules on path reuse,
2025-07-24).
A helper used by update_submodule() seems to assume that its call to
submodule_from_path() always yields a submodule object without a
failure, which seems to rely on the caller making sure it is the
case. Leave an assert() with a NEEDSWORK comment there for future
developers to make sure the assumption actually holds.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These callers expect that git_config_pathname() that returns 0 is a
signal that the variable they passed has a string they need to act
on. But with the introduction of ":(optional)path" earlier, that is
no longer the case. If the path specified by the configuration
variable is missing, their variable will get a NULL in it, and they
need to act on it (often, just refraining from copying it elsewhere).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we added support for a value spelled as ":(optional)path"
for configuration variables whose values are of type "path", with
the documented semantics "if the path is missing, behave as if such
a variable definition is not even there."
This has worked OK for code paths that reads configuration files and
stores the configured value as a string, where NULL in such a string
is treated as if the setting is not there, left as the default.
However, there are other code paths that do not _ignore_ such NULL
values and misbehave. "git config get --path" is one of them.
When git_config_pathname() helper function finds that the value of
the variable is an optional path *and* the path is missing, it
leaves the destination pointer intact (which usually is left to
NULL) and returns 0 to signal a success. format_config() helper
however assumed that the destination pointer always gets a string,
which no longer is the case, and segfaulted.
Make sure that git_config_pathname() clears the destination pointer
in such a case, and teach format_config() to react to the condition
by returning 1 (which is different from 0 that is a normal success
and negative that is an error) to its callers. Adjust the callers
to react to this new return value that tells them to pretend as if
they did not even see this partcular <key, value> pair.
Reported-by: Han Jiang <jhcarl0814@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git repo structure" subcommand tried to align its output but
mixed up byte count and display column width, which has been
corrected.
* jx/repo-struct-utf8width-fix:
builtin/repo: fix table alignment for UTF-8 characters
t/unit-tests: add UTF-8 width tests for CJK chars
An earlier check added to osx keychain credential helper to avoid
storing the credential itself supplied was overeager and rejected
credential material supplied by other helper backends that it would
have wanted to store, which has been corrected.
* kn/osxkeychain-idempotent-store-fix:
osxkeychain: avoid incorrectly skipping store operation
A part of code paths that deals with loose objects has been cleaned
up.
* ps/object-source-loose:
object-file: refactor writing objects via a stream
object-file: rename `write_object_file()`
object-file: refactor freshening of objects
object-file: rename `has_loose_object()`
object-file: read objects via the loose object source
object-file: move loose object map into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object cache into loose source
object-file: introduce `struct odb_source_loose`
object-file: move `fetch_if_missing`
odb: adjust naming to free object sources
odb: introduce `odb_source_new()`
odb: fix subtle logic to check whether an alternate is usable
"git replay" (experimental) learned to perform ref updates itself
in a transaction by default, instead of emitting where each refs
should point at and leaving the actual update to another command.
* sa/replay-atomic-ref-updates:
replay: add replay.refAction config option
replay: make atomic ref updates the default behavior
replay: use die_for_incompatible_opt2() for option validation
Adding a repository that uses a different hash function is a no-no,
but "git submodule add" did nt prevent it, which has been corrected.
* bc/submodule-force-same-hash:
read-cache: drop submodule check from add_to_cache()
object-file: disallow adding submodules of different hash algo
The code to expand attribute macros has been rewritten to avoid
recursion to avoid running out of stack space in an uncontrolled
way.
* jk/attr-macroexpand-wo-recursion:
attr: avoid recursion when expanding attribute macros
The flags --all and --value of "git config unset" don't make the command
"replace" or "show" anything, they are about selecting what to unset.
Change their help text accordingly.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command "git config set <name> <value>" fails for an option that has
multiple values. List the "git config set" flags that can be used,
instead of old-style "git config" actions.
Reported-by: Paul Wintz <pwintz@ucsc.edu>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earier patch had a typo discovered after it has been merged to
'next'. Fix it.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have turned `struct odb_read_stream` into a
publicly visible structure. Furthermore, this structure now contains the
type and size of the object that we are about to stream. Consequently,
the out-pointers that we used before to propagate the type and size of
the streamed object are now somewhat redundant with the data contained
in the structure itself.
Drop these out-pointers and adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "streaming" terminology is somewhat generic, so it may not be
immediately obvious that "streaming.{c,h}" is specific to the object
database. Rectify this by moving it into the "odb/" directory so that it
can be immediately attributed to the object subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the streaming interface to be centered around object databases
instead of centered around the repository. Rename the functions
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to read packed object streams into the respective
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to read loose object streams into the respective
subsystem. This allows us to make a couple of function declarations
private.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Subsequent commits will move the backend-specific logic of setting up an
object read stream into the specific subsystems. As the backends are now
the ones that are responsible for allocating the stream they'll need to
have the stream definition available to them.
Make the stream definition public to prepare for this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Subsequent commits will move the backend-specific logic of object
streaming into their respective subsystems. These subsystems have gotten
rid of `the_repository` already, but we still use it in two locations in
the streaming subsystem.
Prepare for the move by fixing those two cases. Converting the logic in
`open_istream_pack_non_delta()` is trivial as we already got the object
database as input.
But for `stream_blob_to_fd()` we have to add a new parameter to make it
accessible. So, as we already have to adjust all callers anyway, rename
the function to `odb_stream_blob_to_fd()` to indicate it's part of the
object subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating an object stream we first look up the object info and, if
it's present, we call into the respective backend that contains the
object to create a new stream for it.
This has the consequence that, for loose object source, we basically
iterate through the object sources twice: we first discover that the
file exists as a loose object in the first place by iterating through
all sources. And, once we have discovered it, we again walk through all
sources to try and map the object. The same issue will eventually also
surface once the packfile store becomes per-object-source.
Furthermore, it feels rather pointless to first look up the object only
to then try and read it.
Refactor the logic to be centered around sources instead. Instead of
first reading the object, we immediately ask the source to create the
object stream for us. If the object exists we get stream, otherwise
we'll try the next source.
Like this we only have to iterate through sources once. But even more
importantly, this change also helps us to make the whole logic
pluggable. The object read stream subsystem does not need to be aware of
the different source backends anymore, but eventually it'll only have to
call the source's callback function.
Note that at the current point in time we aren't fully there yet:
- The packfile store still sits on the object database level and is
thus agnostic of the sources.
- We still have to call into both the packfile store and the loose
object source.
But both of these issues will soon be addressed.
This refactoring results in a slight change to semantics: previously, it
was `odb_read_object_info_extended()` that picked the source for us, and
it would have favored packed (non-deltified) objects over loose objects.
And while we still favor packed over loose objects for a single source
with the new logic, we'll now favor a loose object from an earlier
source over a packed object from a later source.
Ultimately this shouldn't matter though: the stream doesn't indicate to
the caller which source it is from and whether it was created from a
packed or loose object, so such details are opaque to the caller. And
other than that we should be able to assume that two objects with the
same object ID should refer to the same content, so the streamed data
would be the same, too.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the logic to read object info for a packed object from
`do_oid_object_into_extended()` into a standalone function that operates
on the packfile store. This function will be used in a subsequent
commit.
Note that this change allows us to make `find_pack_entry()` an internal
implementation detail. As a consequence though we have to move around
`packfile_store_freshen_object()` so that it is defined after that
function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While all backend-specific data is now contained in a backend-specific
structure, we still share the zlib stream across the loose and packed
objects.
Refactor the code and move it into the specific structures so that we
fully detangle the different backends from one another.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for filtered object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for packed object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for loose object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in a preceding commit, we want to get rid of the union of
stream-type specific data in `struct odb_read_stream`. Create a new
structure for in-core object streams to move towards this design.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new stream we first allocate it and then call into
backend-specific logic to populate the stream. This design requires that
the stream itself contains a `union` with backend-specific members that
then ultimately get populated by the backend-specific logic.
This works, but it's awkward in the context of pluggable object
databases. Each backend will need its own member in that union, and as
the structure itself is completely opaque (it's only defined in
"streaming.c") it also has the consequence that we must have the logic
that is specific to backends in "streaming.c".
Ideally though, the infrastructure would be reversed: we have a generic
`struct odb_read_stream` and some helper functions in "streaming.c",
whereas the backend-specific logic sits in the backend's subsystem
itself.
This can be realized by using a design that is similar to how we handle
reference databases: instead of having a union of members, we instead
have backend-specific structures with a `struct odb_read_stream base`
as its first member. The backends would thus hand out the pointer to the
base, but internally they know to cast back to the backend-specific
type.
This means though that we need to allocate different structures
depending on the backend. To prepare for this, move allocation of the
structure into the backend-specific functions that open a new stream.
Subsequent commits will then create those new backend-specific structs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When streaming a packed object we first populate the stream with
information about the pack that contains the object before calling
`open_istream_pack_non_delta()`. This is done because we have already
looked up both the pack and the object's offset, so it would be a waste
of time to look up this information again.
But the way this is done makes for a somewhat awkward calling interface,
as the caller now needs to be aware of how exactly the function itself
behaves.
Refactor the code so that we instead explicitly pass the packfile info
into `open_istream_pack_non_delta()`. This makes the calling convention
explicit, but more importantly this allows us to refactor the function
so that it becomes its responsibility to allocate the stream itself in a
subsequent patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When opening the read stream for a specific object the caller is also
expected to pass in a pointer to the object type. This type is passed
down via multiple levels and will eventually be populated with the type
of the looked-up object.
The way we propagate down the pointer though is somewhat non-obvious.
While `istream_source()` still expects the pointer and looks it up via
`odb_read_object_info_extended()`, we also pass it down even further
into the format-specific callbacks that perform another lookup. This is
quite confusing overall.
Refactor the code so that the responsibility to populate the object type
rests solely with the format-specific callbacks. This will allow us to
drop the call to `odb_read_object_info_extended()` in `istream_source()`
entirely in a subsequent patch.
Furthermore, instead of propagating the type via an in-pointer, we now
propagate the type via a new field in the object stream. It already has
a `size` field, so it's only natural to have a second field that
contains the object type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a read stream we first populate the structure with the
open callback function and then subsequently call the function. This
layout is somewhat weird though:
- The structure needs to be allocated and partially populated with the
open function before we can properly initialize it.
- We only ever call the `open()` callback function right after having
populated the `struct odb_read_stream::open` member, and it's never
called thereafter again. So it is somewhat pointless to store the
callback in the first place.
Especially the first point creates a problem for us. In subsequent
commits we'll want to fully move construction of the read source into
the respective object sources. E.g., the loose object source will be the
one that is responsible for creating the structure. But this creates a
problem: if we first need to create the structure so that we can call
the source-specific callback we cannot fully handle creation of the
structure in the source itself.
We could of course work around that and have the loose object source
create the structure and populate its `open()` callback, only. But
this doesn't really buy us anything due to the second bullet point
above.
Instead, drop the callback entirely and refactor `istream_source()` so
that we open the streams immediately. This unblocks a subsequent step,
where we'll also start to allocate the structure in the source-specific
logic.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the following patches we are about to make the `git_istream` more
generic so that it becomes fully controlled by the specific object
source that wants to create it. As part of these refactorings we'll
fully move the structure into the object database subsystem.
Prepare for this change by renaming the structure from `git_istream`
to `odb_read_stream`. This mirrors the `odb_write_stream` structure that
we already have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since we added whitespace rules for this project, we misspelt
an entry, which has been corrected.
* jc/gitattributes-whitespace-no-indent-fix:
.gitattributes: remove misspelled no-op whitespace attribute
"git maintenance" command learned "is-needed" subcommand to tell if
it is necessary to perform various maintenance tasks.
* kn/maintenance-is-needed:
maintenance: add 'is-needed' subcommand
maintenance: add checking logic in `pack_refs_condition()`
refs: add a `optimize_required` field to `struct ref_storage_be`
reftable/stack: add function to check if optimization is required
reftable/stack: return stack segments directly
As "git diff --quiet" only cares about the existence of any
changes, disable rename/copy detection to skip more expensive
processing whose result will be discarded anyway.
* rs/diff-quiet-no-rename:
diff: disable rename detection with --quiet
The `do_fetch()` function contains the core of the `git-fetch(1)` logic.
Part of this is to fetch and store references. This is done by
1. Creating a reference transaction (non-atomic mode uses batched
updates).
2. Adding individual reference updates to the transaction.
3. Committing the transaction.
4. When using batched updates, handling the rejected updates.
The following commit, will fix a bug wherein fetching tags with
conflicts was causing other reference updates to fail. Fixing this
requires utilizing this logic in different regions of the function.
In preparation of the follow up commit, extract the committing and
rejection handling logic into a separate function called
`commit_ref_transaction()`.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_configset_get_pathname() is only used once inside config.c; we do
not have to expose it as a public function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This value is not checked, but it must return to match POSIX
Signed-off-by: Greg Funni <gfunni234@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option could create a commit history which violates the assumption
that commits have non-decreasing commit timestamps. Warn against that in
both git-am(1) and git-rebase(1).
The genesis of this option is from git-am(1) and was added in
3f01ad66 (am: Add --committer-date-is-author-date option,
2009-01-22). The commit message doesn’t give us an example
of a use case, but the thread starter does:[1]
I've a big set of patches in a mbox file: there's sufficient info
inside for git-am to work.
Yet, each time I do import these, my sha1sums are changing because of
different commit dates.
I'd like to force the commit date to match the info/date from the time
I received the email (and therefore always get back the right
sha1sums).
[1]: https://lore.kernel.org/git/46d6db660901221441q60eb90bdge601a7a250c3a247@mail.gmail.com/
So the motivation was to treat git-am(1) as an import command that
creates the same commit IDs.
Putting aside the question of whether you should be using git-am(1) for
importing commits, this approach is problematic:
• you still need to apply the commits to the same base if you want the
same hashes; and
• you need the same committer.
And if you expect the same committer, why is this person applying the
same patches multiple times with the goal of making *identical* commits?
That was all for git-am(1).
It was added to git-rebase(1) in 570ccad3 (rebase: add options passed to
git-am, 2009-03-18)[2] in order to plug options that could not be sent
on to git-am(1). At this point the utility of the option graduated to
making no sense; a use case for `git rebase --committer-date-is-author-
date` is still yet to be found.
Just warn against using this option on both commands and remind the user
to consider whether they really need it.
† 2: See also 7573cec5 (rebase -i: support
--committer-date-is-author-date, 2020-08-17) for the commit for the
merge backend
Suggested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_clear()` releases all resources allocated to an object
database and ensures that all fields become zero'd out. Despite its
naming though it doesn't really clear the object database so that it
becomes ready for reuse afterwards again -- the caller would first have
to reinitialize it, and that contradicts the terminology of "clearing"
as we have defined it in our coding guidelines.
There isn't really only a reason to have "clearing" semantics, either.
There's only a single caller of `odb_clear()`, and that caller also ends
up freeing the object database structure itself.
Refactor the function to have "freeing" semantics instead, so that the
structure itself is also freed, which allows us to drop some useless
boilerplate to zero out the structure's members.
This refactoring reveals that we're trying to close the commit graph
multiple times: once directly via `free_commit_graph()`, and once via
`odb_close()`. Drop the former call.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to close an object database is currently contained in the
packfile subsystem. That choice is somewhat relatable, as most of the
logic really is to close resources associated with the packfile store
itself. But we also end up handling object sources and commit graphs,
which certainly is not related to packfiles.
Move the function into the object database subsystem and rename it to
`odb_close()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't have any external callers of `set_git_dir()` anymore now that
`enter_repo()` has been moved into "setup.c". Remove the declaration and
mark the function as static.
Note that this change requires us to move the implementation around so
that we can avoid adding any new forward declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `enter_repo()` is used to enter a repository at a given
path. As such it sits way closer to setting up a repository than it does
with handling paths, but regardless of that it's located in "path.c"
instead of in "setup.c".
Move the function into "setup.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A part of code paths that deals with loose objects has been cleaned
up.
* ps/object-source-loose:
object-file: refactor writing objects via a stream
object-file: rename `write_object_file()`
object-file: refactor freshening of objects
object-file: rename `has_loose_object()`
object-file: read objects via the loose object source
object-file: move loose object map into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object cache into loose source
object-file: introduce `struct odb_source_loose`
object-file: move `fetch_if_missing`
odb: adjust naming to free object sources
odb: introduce `odb_source_new()`
odb: fix subtle logic to check whether an alternate is usable
A part of code paths that deals with loose objects has been cleaned
up.
* ps/object-source-loose:
object-file: refactor writing objects via a stream
object-file: rename `write_object_file()`
object-file: refactor freshening of objects
object-file: rename `has_loose_object()`
object-file: read objects via the loose object source
object-file: move loose object map into loose source
object-file: hide internals when we need to reprepare loose sources
object-file: move loose object cache into loose source
object-file: introduce `struct odb_source_loose`
object-file: move `fetch_if_missing`
odb: adjust naming to free object sources
odb: introduce `odb_source_new()`
odb: fix subtle logic to check whether an alternate is usable
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* kn/refs-optim-cleanup:
t/pack-refs-tests: move the 'test_done' to callees
refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
refs: move to using the '.optimize' functions
Some ref backend storage can hold not just the object name of an
annotated tag, but the object name of the object the tag points at.
The code to handle this information has been streamlined.
* ps/ref-peeled-tags:
t7004: do not chdir around in the main process
ref-filter: fix stale parsed objects
ref-filter: parse objects on demand
ref-filter: detect broken tags when dereferencing them
refs: don't store peeled object IDs for invalid tags
object: add flag to `peel_object()` to verify object type
refs: drop infrastructure to peel via iterators
refs: drop `current_ref_iter` hack
builtin/show-ref: convert to use `reference_get_peeled_oid()`
ref-filter: propagate peeled object ID
upload-pack: convert to use `reference_get_peeled_oid()`
refs: expose peeled object ID via the iterator
refs: refactor reference status flags
refs: fully reset `struct ref_iterator::ref` on iteration
refs: introduce `.ref` field for the base iterator
refs: introduce wrapper struct for `each_ref_fn`
The list of packfiles used in a running Git process is moved from
the packed_git structure into the packfile store.
* ps/packed-git-in-object-store:
packfile: track packs via the MRU list exclusively
packfile: always add packfiles to MRU when adding a pack
packfile: move list of packs into the packfile store
builtin/pack-objects: simplify logic to find kept or nonlocal objects
packfile: fix approximation of object counts
http: refactor subsystem to use `packfile_list`s
packfile: move the MRU list into the packfile store
packfile: use a `strmap` to store packs by name
The classic diff adds only the lines that it's going to consider,
during the diff, to an array. A mapping between the compacted
array, and the lines of the file that they reference, is
facilitated by this array.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field rindex describes an index offset for other arrays. Change it
to size_t.
Changing the type of rindex from long to size_t has no cascading
refactor impact because it is only ever used to directly index other
arrays.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t is used because nreff describes the number of elements in memory
for rindex.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t is used because nrec describes the number of elements for both
recs, and for 'changed' + 2.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ha field is serving two different purposes, which makes the code
harder to read. At first glance, it looks like many places assume
there could never be hash collisions between lines of the two input
files. In reality, line_hash is used together with xdl_recmatch() to
ensure correct comparisons of lines, even when collisions occur.
To make this clearer, the old ha field has been split:
* line_hash: a straightforward hash of a line, independent of any
external context. Its type is uint64_t, as it comes from a fixed
width hash function.
* minimal_perfect_hash: Not a new concept, but now a separate
field. It comes from the classifier's general-purpose hash table,
which assigns each line a unique and minimal hash across the two
files. A size_t is used here because it's meant to be used to
index an array. This also avoids ` as usize` casts on the Rust
side when using it to index a slice.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the function signature and body to use unambiguous types. char
is changed to uint8_t because this function processes bytes in memory.
unsigned long to uint64_t so that the hash output is consistent across
platforms. `flags` was changed from long to uint64_t to ensure the
high order bits are not dropped on platforms that treat long as 32
bits.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t is the appropriate type because size is describing the number of
elements, bytes in this case, in memory.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make xrecord_t.ptr uint8_t because it's referring to bytes in memory.
In order to avoid a refactor avalanche, many uses of this field were
cast to char* or similar.
Places where casting was unnecessary:
xemit.c:156
xmerge.c:124
xmerge.c:127
xmerge.c:164
xmerge.c:169
xmerge.c:172
xmerge.c:178
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ptrdiff_t is appropriate for dstart and dend because they both describe
positive or negative offsets relative to a pointer.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document other nuances when crossing the FFI boundary. Other language
mappings may be added in the future.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new flag `--all` to git-repo-info for requesting values for all
the available keys. By using this flag, the user can retrieve all the
values instead of searching what are the desired keys for what they
wants.
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the field printing in git-repo-info to a new function called
`print_field`, allowing it to be called by functions other than
`print_fields`.
Also change its use of quote_c_style() helper to output directly to
the standard output stream, instead of taking a result in a strbuf
and then printing it outselves.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a worktree path contains newlines or other control characters
it messes up the output of "git worktree list". Fix this by using
quote_path() to display the worktree path. The output of "git worktree
list" is designed for human consumption, scripts should be using the
"--porcelain" option so this change should not break them.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The output of "git worktree list" displays a table containing the
worktree path, HEAD OID and branch name for each worktree. The code
aligns the columns by measuring the visual width of the worktree path
when it is printed. Unfortunately it fails to use the visual width
when calculating the width of the column so, if any of the paths
contain a multibyte character, we can end up with excess padding
between columns. The simplest fix would be to replace strlen() with
utf8_strwidth() in measure_widths(). However that leaves us measuring
the visual width twice and the byte length once. By caching the visual
width and printing the padding separately to the worktree path, we only
need to calculate the visual width once and do not need the byte length
at all. The visual widths are stored in an arrays of structs rather
than an array of ints as the next commit will add more struct members.
Even if there are no multibyte characters in any of the paths we still
print an extra space between the path and the object id as the field
width is calculated as one plus the length of the path and we print an
explicit space as well. This is fixed by not printing the extra space.
The tests are updated to include multibyte characters in one of the
worktree paths and to check the spacing of the columns.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We test xmkstemp() in our helper by just calling:
xmkstemp(xstrdup(argv[1]));
This leaks both the copied string as well as the descriptor returned by
the function. In practice this isn't a big deal, since we immediately
exit the program, but:
1. LSan will complain about the memory leak. The only reason we did
not notice this in our leak-checking builds is that both of the
callers in the test suite (both in t0070) pass a broken template
(and expect failure). So the function calls die() before we can
actually leak.
But it's an accident waiting to happen if anybody adds a call which
succeeds.
2. Coverity complains about the descriptor leak. There's a long list
of uninteresting or false positives in Coverity's results, but
since we're here we might as well fix it, too.
I didn't bother adding a new test that triggers the leak. It's not even
in real production code, but just in the test-helper itself.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GitHub windows-meson-test jobs directly run "meson test" with the
--slice option. This means they skip all of the ci/lib.sh
infrastructure, and in particular:
1. They do not actually set any GIT_TEST_OPTS like --verbose-log or
-x.
2. They do not do the usual handle_failed_tests() magic to print test
failures or tar up failed directories.
As a result, you get almost no feedback at all when a test fails in this
job, making debugging rather tricky.
Let's try to make this behave more like the other CI jobs. Because we're
on Windows, we can't just use the normal run-build-and-tests.sh script.
Our build runs as a separate job (like the non-meson Windows job), and
then we parallelize the tests across several job slices. So we need
something like the run-test-slice.sh script that the "windows-test" job
uses.
In theory we could just swap out the "make" invocation there for
"meson". But it doesn't quite work, because "make" knows how to pull
GIT_TEST_OPTS out of GIT-BUILD-OPTIONS automatically. But for meson, we
have to extract them into the --test-args option ourselves. I tried
making the logic in run-test-slice.sh conditional, but there ended up
being hardly any common code at all (and there are some tricky ordering
constraints). So I added up with a new meson-specific test-slice runner.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same spirit as 9faf3963b6 (t: introduce compatibility options to
clar-based tests, 2024-12-13), we should ignore --no-chain-lint passed
to our clar tests, since it may appear in GIT_TEST_OPTS to be used with
other tests.
This is particularly important on Windows CI, where --no-chain-lint is
added to the test options by default, and the meson build will pass all
options to the unit tests. The only reason our meson Windows CI job does
not run into this currently is that it is not respecting GIT_TEST_OPTS
at all! So ignoring this option is a prerequisite to fixing that
situation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ASan has an option to enable strict string checking, where any pointer
passed to a function that expects a NUL-terminated string will be
checked for that NUL termination. This can sometimes produce false
positives. E.g., it is not wrong to pass a buffer with { '1', '2', '\n' }
into strtoul(). Even though it is not NUL-terminated, it will stop at
the newline.
But in trying it out, it identified two problematic spots in our test
suite (which have now been adjusted):
1. The strtol() parsing in cache-tree.c was a real potential problem,
which would have been very hard to find otherwise (since it
required constructing a very specific broken index file).
2. The use of string functions in fsck_ident() were false positives,
because we knew that there was always a trailing newline which
would stop the functions from reading off the end of the buffer.
But the reasoning behind that is somewhat fragile, and silencing
those complaints made the code easier to reason about.
So even though this did not find any earth-shattering bugs, and even had
a few false positives, I'm sufficiently convinced that its complaints
are more helpful than hurtful. Let's turn it on by default (since the
test suite now runs cleanly with it) and see if it ever turns up any
other instances.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fsck_ident(), we parse the timestamp with parse_timestamp(), which is
really an alias for strtoumax(). But since our buffer may not be
NUL-terminated, this can trigger a complaint from ASan's
strict_string_checks mode. This is a false positive, since we know that
the buffer contains a trailing newline (which we checked earlier in the
function), and that strtoumax() would stop there.
But it is worth working around ASan's complaint. One is because that
will let us turn on strict_string_checks by default, which has helped
catch other real problems. And two is that the safety of the current
code is very hard to reason about (it subtly depends on distant code
which could change).
One option here is to just parse the number left-to-right ourselves. But
we care about the size of a timestamp_t and detecting overflow, since
that's part of the point of these checks. And doing that correctly is
tricky. So we'll instead just pull the digits into a separate,
NUL-terminated buffer, and use that to call parse_timestamp().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After calling "parse_timestamp(p, &end, 10)", we complain if "p == end",
which would imply that we did not see any digits at all. But we know
this cannot be the case, since we would have bailed already if we did
not see any digits, courtesy of extra checks added by 8e4309038f (fsck:
do not assume NUL-termination of buffers, 2023-01-19). Since then,
checking "p == end" is redundant and we can drop it.
This will make our lives a little easier as we refactor further.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We may be operating on a buffer that is not NUL-terminated, but we use
strcspn() to parse it. This is OK in practice, as discussed in
8e4309038f (fsck: do not assume NUL-termination of buffers, 2023-01-19),
because we know there is at least a trailing newline in our buffer, and
we always pass "\n" to strcspn(). So we know it will stop before running
off the end of the buffer.
But this is a subtle point to hang our memory safety hat on. And it
confuses ASan's strict_string_checks mode, even though it is technically
a false positive (that mode complains that we have no NUL, which is
true, but it does not know that we have verified the presence of the
newline already).
Let's instead open-code the loop. As a bonus, this makes the logic more
obvious (to my mind, anyway). The current code skips forward with
strcspn until it hits "<", ">", or "\n". But then it must check which it
saw to decide if that was what we expected or not, duplicating some
logic between what's in the strcspn() and what's in the domain logic.
Instead, we can just check each character as we loop and act on it
immediately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fsck code purports to handle buffers that are not NUL-terminated,
but fsck_ident() uses some string functions. This works OK in practice,
as explained in 8e4309038f (fsck: do not assume NUL-termination of
buffers, 2023-01-19). Before calling fsck_ident() we'll have called
verify_headers(), which makes sure we have at least a trailing newline.
And none of our string-like functions will walk past that newline.
However, that makes this code at the top of fsck_ident() very confusing:
*ident = strchrnul(*ident, '\n');
if (**ident == '\n')
(*ident)++;
We should always see that newline, or our memory safety assumptions have
been violated! Further, using strchrnul() is weird, since the whole
point is that if the newline is not there, we don't necessarily have a
NUL at all, and might read off the end of the buffer.
So let's have callers pass in the boundary of our buffer, which lets us
safely find the newline with memchr(). And if it is not there, this is a
BUG(), because it means our caller did not validate the input with
verify_headers() as it was supposed to (and we are better off bailing
rather than having memory-safety problems).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A cache-tree extension entry in the index looks like this:
<name> NUL <entry_nr> SPACE <subtree_nr> NEWLINE <binary_oid>
where the "_nr" items are human-readable base-10 ASCII. We parse them
with strtol(), even though we do not have a NUL-terminated string (we'd
generally have an mmap() of the on-disk index file). For a well-formed
entry, this is not a problem; strtol() will stop when it sees the
newline. But there are two problems:
1. A corrupted entry could omit the newline, causing us to read
further. You'd mostly get stopped by seeing non-digits in the oid
field (and if it is likewise truncated, there will still be 20 or
more bytes of the index checksum). So it's possible, though
unlikely, to read off the end of the mmap'd buffer. Of course a
malicious index file can fake the oid and the index checksum to all
(ASCII) 0's.
This is further complicated by the fact that mmap'd buffers tend to
be zero-padded up to the page boundary. So to run off the end, the
index size also has to be a multiple of the page size. This is also
unlikely, though you can construct a malicious index file that
matches this.
The security implications aren't too interesting. The index file is
a local file anyway (so you can't attack somebody by cloning, but
only if you convince them to operate in a .git directory you made,
at which point attacking .git/config is much easier). And it's just
a read overflow via strtol(), which is unlikely to buy you much
beyond a crash.
2. ASan has a strict_string_checks option, which tells it to make sure
that options to string functions (like strtol) have some eventual
NUL, without regard to what the function would actually do (like
stopping at a newline here). This option sometimes has false
positives, but it can point to sketchy areas (like this one) where
the input we use doesn't exhibit a problem, but different input
_could_ cause us to misbehave.
Let's fix it by just parsing the values ourselves with a helper function
that is careful not to go past the end of the buffer. There are a few
behavior changes here that should not matter:
- We do not consider overflow, as strtol() would. But nor did the
original code. However, we don't trust the value we get from the
on-disk file, and if it says to read 2^30 entries, we would notice
that we do not have that many and bail before reading off the end of
the buffer.
- Our helper does not skip past extra leading whitespace as strtol()
would, but according to gitformat-index(5) there should not be any.
- The original quit parsing at a newline or a NUL byte, but now we
insist on a newline (which is what the documentation says, and what
Git has always produced).
Since we are providing our own helper function, we can tweak the
interface a bit to make our lives easier. The original code does not use
strtol's "end" pointer to find the end of the parsed data, but rather
uses a separate loop to advance our "buf" pointer to the trailing
newline. We can instead provide a helper that advances "buf" as it
parses, letting us read strictly left-to-right through the buffer.
I didn't add a new test here. It's surprisingly difficult to construct
an index of exactly the right size due to the way we pad entries. But it
is easy to trigger the problem in existing tests when using ASan's
strict string checking, coupled with a recent change to use NO_MMAP with
ASan builds. So:
make SANITIZE=address
cd t
ASAN_OPTIONS=strict_string_checks=1 ./t0090-cache-tree.sh
triggers it reliably. Technically it is not deterministic because there
is ~8% chance (it's 1-(255/256)^20, or ^32 for sha256) that the trailing
checksum hash has a NUL byte in it. But we compute enough cache-trees in
the course of that script that we are very likely to hit the problem in
one of them.
We can look at making strict_string_checks the default for ASan builds,
but there are some other cases we'd want to fix first.
Reported-by: correctmost <cmlists@sent.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git often uses mmap() to access on-disk files. This leaves a blind spot
in our SANITIZE=address builds, since ASan does not seem to handle mmap
at all. Nor does the OS notice most out-of-bounds access, since it tends
to round up to the nearest page size (so depending on how big the map
is, you might have to overrun it by up to 4095 bytes to trigger a
segfault).
The previous commit demonstrates a memory bug that we missed. We could
have made a new test where the out-of-bounds access was much larger, or
where the mapped file ended closer to a page boundary. But the point of
running the test suite with sanitizers is to catch these problems
without having to construct specific tests.
Let's enable NO_MMAP for our ASan builds by default, which should give
us better coverage. This does increase the memory usage of Git, since
we're copying from the filesystem into heap. But the repositories in the
test suite tend to be small, so the overhead isn't really noticeable
(and ASan already has quite a performance penalty).
There are a few other known bugs that this patch will help flush out.
However, they aren't directly triggered in the test suite (yet). So
it's safe to turn this on now without breaking the test suite, which
will help us add new tests to demonstrate those other bugs as we fix
them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a bitmap has a name-hash cache, it is an array of 32-bit integers,
one per entry in the bitmap, which we've mmap'd from the .bitmap file.
We access it directly like this:
if (bitmap_git->hashes)
hash = get_be32(bitmap_git->hashes + index_pos);
That works for both regular pack bitmaps and for non-incremental midx
bitmaps. There is one bitmap_index with one "hashes" array, and
index_pos is within its bounds (we do the bounds-checking when we load
the bitmap).
But for an incremental midx bitmap, we have a linked list of
bitmap_index structs, and each one has only its own small slice of the
name-hash array. If index_pos refers to an object that is not in the
first bitmap_git of the chain, then we'll access memory outside of the
bounds of its "hashes" array, and often outside of the mmap.
Instead, we should walk through the list until we find the bitmap_index
which serves our index_pos, and use its hash (after adjusting index_pos
to make it relative to the slice we found). This is exactly what we do
elsewhere for incremental midx lookups (like the pack_pos_to_midx() call
a few lines above). But we can't use existing helpers like
midx_for_object() here, because we're walking through the chain of
bitmap_index structs (each of which refers to a midx), not the chain of
incremental multi_pack_index structs themselves.
The problem is triggered in the test suite, but we don't get a segfault
because the out-of-bounds index is too small. The OS typically rounds
our mmap up to the nearest page size, so we just end up accessing some
extra zero'd memory. Nor do we catch it with ASan, since it doesn't seem
to instrument mmaps at all. But if we build with NO_MMAP, then our maps
are replaced with heap allocations, which ASan does check. And so:
make NO_MMAP=1 SANITIZE=address
cd t
./t5334-incremental-multi-pack-index.sh
does show the problem (and this patch makes it go away).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our mmap compat code emulates mapping by using malloc/free. Our
git_munmap() must take a "length" parameter to match the interface of
munmap(), but we don't use it (it is up to the allocator to know how big
the block is in free()).
Let's mark it as UNUSED to avoid complaints from -Wunused-parameter.
Otherwise you cannot build with "make DEVELOPER=1 NO_MMAP=1".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pattern `return errno = ..., -1;` is observed several times in
`compat/mingw.c`. It has served us well over the years, but now clang
starts complaining:
compat/mingw.c:723:24: error: possible misuse of comma operator here [-Werror,-Wcomma]
723 | return errno = ENOSYS, -1;
| ^
See for example this failing workflow run:
https://github.com/git-for-windows/git-sdk-arm64/actions/runs/15457893907/job/43513458823#step:8:201
Let's appease clang (and also reduce the use of the no longer common
comma operator).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the `en/make-libgit-a` topic branch, more precisely in the commits
f3b4c89d59f1 (make: delete REFTABLE_LIB, add reftable to LIB_OBJS,
2025-10-02) and cf680cdb9543 (make: delete XDIFF_LIB, add xdiff to
LIB_OBJS, 2025-10-02), the strategy to build three static libraries was
rethought, and instead only one static library is now built.
This is good.
However, the CMake definition was not changed accordingly, and now
CMake-based builds fail thusly:
[...]
Generating hook-list.h
CMake Error at CMakeLists.txt:122 (string):
string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
CMakeLists.txt:711 (parse_makefile_for_sources)
CMake Error at CMakeLists.txt:122 (string):
string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
CMakeLists.txt:717 (parse_makefile_for_sources)
-- Configuring incomplete, errors occurred!
Fix that by removing the parts that expect the reftable and xdiff
objects to be defined separately in the Makefile, still.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`wcsncpy_s()` wants to write the terminating null character so we need
to allocate one more space for it in the target memory block.
This should fix crashes when trying to read passwords. When this
happened, the password/token wouldn't print out and Git would therefore
ask for a new password every time.
Signed-off-by: David Macek <david.macek.0@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At GitHub, we had a repository that was triggering
git: merge-ort.c:3032: process_renames: Assertion `newinfo && !newinfo->merged.clean` failed.
during git replay.
This sounds similar to the somewhat recent f6ecb603ff8a (merge-ort: fix
directory rename on top of source of other rename/delete, 2025-08-06),
but the cause is different. Unlike that case, there are no
rename-to-self situations arising in this case, and new to this case it
can only be triggered during a replay operation on the 2nd or later
commit being replayed, never on the first merge in the sequence.
To trigger, the repository needs:
* an upstream which:
* renames a file to a different directory, e.g.
old/file -> new/file
* leaves other files remaining in the original directory (so that
e.g. "old/" still exists upstream even though file has been
removed from it and placed elsewhere)
* a topic branch being rebased where:
* a commit in the sequence:
* modifies old/file
* a subsequent commit in the sequence being replayed:
* does NOT touch *anything* under new/
* does NOT touch old/file
* DOES modify other paths under old/
* does NOT have any relevant renames that we need to detect
_anywhere_ elsewhere in the tree (meaning this interacts
interestingly with both directory renames and cached renames)
In such a case, the assertion will trigger. The fix turns out to be
surprisingly simple. I have a very vague recollection that I actually
considered whether to add such an if-check years ago when I added the
very similar one for oldinfo in 1b6b902d95a5 (merge-ort:
process_renames() now needs more defensiveness, 2021-01-19), but I think
I couldn't figure out a possible way to trigger it and was worried at
the time that if I didn't know how to trigger it then I wasn't so sure
that simply skipping it was correct. Waiting did give me a chance to
put more thorough tests and checks into place for the rename-to-self
cases a few months back, which I might not have found as easily
otherwise. Anyway, put the check in place now and add a test that
demonstrates the fix.
Note that this bug, as demonstrated by the conditions listed above,
runs at the intersection of relevant renames, trivial directory
resolutions, and cached renames. All three of those optimizations are
ones that unfortunately make the code (and testcases!) a bit more
complex, and threading all three makes it a bit more so. However, the
testcase isn't crazy enough that I'd expect no one to ever hit it in
practice, and was confused why we didn't see it before. After some
digging, I discovered that merge.directoryRenames=false is a workaround
to this bug, and GitHub used that setting until recently (it was a
"temporary" match-what-libgit2-does piece of code that lasted years
longer than intended). Since the conditions I gave above for triggering
this bug rule out the possibility of there being directory renames, one
might assume that it shouldn't matter whether you try to detect such
renames if there aren't any. However, due to commit a16e8efe5c2b
(merge-ort: fix merge.directoryRenames=false, 2025-03-13), the heavy
hammer used there means that merge.directoryRenames=false ALSO turns off
rename caching, which is critical to triggering the bug. This becomes
a bit more than an aside since...
Re-reading that old commit, a16e8efe5c2b (merge-ort: fix
merge.directoryRenames=false, 2025-03-13), it appears that the solution
to this latest bug might have been at least a partial alternative
solution to that old commit. And it may have been an improved
alternative (or at least help implement one), since it may be able to
avoid the heavy-handed disabling of rename cache. That might be an
interesting future thing to investigate, but is not critical for the
current fix. However, since I spent time digging it all up, at least
leave a small comment tweak breadcrumb to help some future reader
(myself or others) who wants to dig further to connect the dots a little
quicker.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While developing commit a16e8efe5c2b (merge-ort: fix
merge.directoryRenames=false, 2025-03-13), I was testing things out and
had an extra condition on one of the if-blocks that I occasionally
swapped between '&& 0' and '&& 1' to see the effects of the changes. I
forgot to remove it before submitting and it wasn't caught in review.
Remove it now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A comment at the top of t6429 mentions why the test doesn't exercise git
rebase or git cherry-pick. However, it claims that it uses `test-tool
fast-rebase`. That was true when the comment was written, but commit
f920b0289ba3 (replay: introduce new builtin, 2023-11-24) changed it to
use git replay without updating this comment.
We could potentially just strike this second comment, since git replay
is a bona fide built-in, but perhaps the explanation about why it focuses
on git replay is still useful. Update the comment to make it accurate
again.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Scalar was made a canonical part of Git in 7b5c93c6c68 (scalar:
include in standard Git build & installation, 2022-09-02), it was added
to all relevant Makefile targets except for the `strip` target.
Let's correct that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call xmkstemp_mode() instead of duplicating its error handling code.
This switches the implementation from the system's mkstemp(3) to our own
git_mkstemp_mode(), which works just as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff algorithm used in 'git-blame(1)' is set to 'myers',
without the possibility to change it aside from the `--minimal` option.
There has been long-standing interest in changing the default diff
algorithm to "histogram", and Git 3.0 was floated as a possible occasion
for taking some steps towards that:
https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/
As a preparation for this move, it is worth making sure that the diff
algorithm is configurable where useful.
Make it configurable in the `git-blame(1)` command by introducing the
`--diff-algorithm` option and make honor the `diff.algorithm` config
variable. Keep Myers diff as the default.
Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience
and histogram diffs, not for the minimal one. This means that when
reseting the diff algorithm to the default one, one needs to separately
clear the bit for the minimal diff. There are places in the code that fail
to do that: merge-ort.c and builtin/merge-file.c.
Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate
clearing of this bit in the places where it hasn't been forgotten.
Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We replaced deprecated macos-13 with macos-14 image in GitHub
Actions CI, but we forgot that the image is for arm64. We have
been seeing a lot of test failures ever since. Switch to arm64
binary for Perforce tests.
* jc/ci-use-arm64-p4-on-macos:
Use Perforce arm64 binary on macOS CI jobs
In a following commit, we are going to check commit signatures, but we
won't have a commit yet, only a commit buffer, and we are going to
discard this commit buffer if the signature is invalid. So it would be
wasteful to create a commit that we might discard, just to be able to
check a commit signature.
It would be simpler instead to be able to check commit signatures
using only a commit buffer instead of a commit.
To be able to do that, let's extract some code from the
check_commit_signature() function into a new verify_commit_buffer()
function, and then let's make check_commit_signature() call
verify_commit_buffer().
Note that this doesn't fundamentally change how
check_commit_signature() works. It used to call parse_signed_commit()
which calls repo_get_commit_buffer(), parse_buffer_signed_by_header()
and repo_unuse_commit_buffer(). Now these 3 functions are called
directly by verify_commit_buffer().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit we are going to finalize commit buffers with or
without signatures in order to check the signatures and possibly drop
them.
To do so easily and without duplication, let's refactor the current
code that finalizes commit buffers into a new finalize_commit_buffer()
function.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The output table from "git repo structure" is misaligned when displaying
UTF-8 characters (e.g., non-ASCII glyphs). E.g.:
| 仓库结构 | 值 |
| -------------- | ---- |
| * 引用 | |
| * 计数 | 67 |
The previous implementation used simple width formatting with printf()
which didn't properly handle multi-byte UTF-8 characters, causing
misaligned table columns when displaying repository structure
information.
This change modifies the stats_table_print_structure function to use
strbuf_utf8_align() instead of basic printf width specifiers. This
ensures proper column alignment regardless of the character encoding of
the content being displayed.
Also add test cases for strbuf_utf8_align(), a function newly introduced
in "builtin/repo.c".
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file "builtin/repo.c" uses utf8_strwidth() to calculate the display
width of UTF-8 characters in a table, but the resulting output is still
misaligned. Add test cases for both utf8_strwidth and utf8_strnwidth to
verify that they correctly compute the display width for UTF-8
characters.
Also updated the build configuration in Makefile and meson.build to
include the new test suite in the build process.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous step replaced deprecated macos-13 image with macos-14
image on GitHub Actions CI. While x86-64 binaries can work there,
because macos-14 images are arm64 based (we could replace it with
macos-14-large that is x86-64), it makes more sense to use arm64
binary there. Without this change, we have been getting unusually
higher rate of failures from random macOS CI jobs railing to run
t98xx series of tests.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In add_to_cache(), we treat any directories as submodules, and complain
if we can't resolve their HEAD. This call to resolve_gitlink_ref() was
added by f937bc2f86 (add: error appropriately on repository with no
commits, 2019-04-09), with the goal of improving the error message for
empty repositories.
But we already resolve the submodule HEAD in index_path(), which is
where we find the actual oid we're going to use. Resolving it again here
introduces some downsides:
1. It's more work, since we have to open up the submodule repository's
files twice.
2. There are call paths that get to index_path() without going through
add_to_cache(). For instance, we'd want a similar informative
message if "git diff empty" finds that it can't resolve the
submodule's HEAD. (In theory we can also get there through
update-index, but AFAICT it refuses to consider directories as
submodules at all, and just complains about them).
3. The resolution in index_path() catches more errors that we don't
handle here. In particular, it will validate that the object format
for the submodule matches that of the superproject. This isn't a
bug, since our call in add_to_cache() throws away the oid it gets
without looking at it. But it certainly caused confusion for me
when looking at where the object-format check should go.
So instead of resolving the submodule HEAD in add_to_cache(), let's just
teach the call in index_path() to actually produce an error message
(which it already does for other cases). That's probably what f937bc2f86
should have done in the first place, and it gives us a single point of
resolution when adding a submodule to the index.
The resulting output is slightly more verbose, as we propagate the error
up the call stack, but I think that's OK (and again, matches many other
errors we get when indexing fails).
I've left the text of the error message as-is, though it is perhaps
overly specific. There are many reasons that resolving the submodule
HEAD might fail, though outside of corruption or system errors it is
probably most likely that the submodule HEAD is simply on an unborn
branch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The design of the hash algorithm transition plan is that objects stored
must be entirely in one algorithm since we lack any way to indicate a
mix of algorithms. This also includes submodules, but we have
traditionally not enforced this, which leads to various problems when
trying to clone or check out the the submodule from the remote.
Since this cannot work in the general case, restrict adding a submodule
of a different algorithm to the index. Add tests for git add and git
submodule add that these are rejected.
Note that we cannot check this in git fsck because the malformed
submodule is stored in the tree as an object ID which is either
truncated (when a SHA-256 submodule is added to a SHA-1 repository) or
padded with zeros (when a SHA-1 submodule is added to a SHA-256
repository). We cannot detect even the latter case because someone
could have an actual submodule that actually ends in 24 zeros, which
would be a false positive.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`--branch` and `--long` refer to git-status(1) options but they don’t tell us
what `short-format` and `long-format` are, respectively. And `--null`
mentions “status” but does not link to the command.
Refer to git-config(1) on `--branch` like `--short` does.
`long-format` is the git-status(1) output. So we can just say that
directly.
Replace “status” with a `linkgit` on `--null`.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-credential-osxkeychain skips storing a credential if its "get"
action sets "state[]=osxkeychain:seen=1". This behavior was introduced
in e1ab45b2 (osxkeychain: state to skip unnecessary store operations,
2024-05-15), which appeared in v2.46.
However, this state[] persists even if a credential returned by
"git-credential-osxkeychain get" is invalid and a subsequent helper's
"get" operation returns a valid credential. Another subsequent helper
(such as [1]) may expect git-credential-osxkeychain to store the valid
credential, but the "store" operation is incorrectly skipped because it
only checks "state[]=osxkeychain:seen=1".
To solve this issue, "state[]=osxkeychain:seen" needs to contain enough
information to identify whether the current "store" input matches the
output from the previous "get" operation (and not a credential from
another helper).
Set "state[]=osxkeychain:seen" to a value encoding the credential output
by "get", and compare it with a value encoding the credential input by
"store".
[1]: https://github.com/hickford/git-credential-oauth
Reported-by: Petter Sælen <petter@saelen.eu>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now "git diff --check" and "git apply --whitespace=warn/fix" learned
incomplete line is a whitespace error, enable them for this project
to prevent patches to add new incomplete lines to our source to both
code and documentation files.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduced via aea86cf00f (The nineteenth batch, 2025-10-14).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git diff" to highlight "\ No newline at end of file" message
as a whitespace error when incomplete-line whitespace error class is
in effect. Thanks to the previous refactoring of complete rewrite
code path, we can do this at a single place.
Unlike whitespace errors in the payload where we need to annotate in
line, possibly using colors, the line that has whitespace problems,
we have a dedicated line already that can serve as the error
message, so paint it as a whitespace error message.
Also teach "git diff --check" to notice incomplete lines as
whitespace errors and report when incomplete-line whitespace error
class is in effect.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final line of a file that lacks the terminating newline at its
end is called an incomplete line. In general they are frowned upon
for many reasons (imagine concatenating two files with "cat A B" and
what happens when A ends in an incomplete line, for example), and
text-oriented tools often mishandle such a line.
Implement checks in "git apply" for incomplete lines, which is off
by default for backward compatibility's sake, so that "git apply
--whitespace={fix,warn,error}" can notice, warn against, and fix
them.
As one of the new test shows, if you modify contents on an
incomplete line in the original and leave the resulting line
incomplete, it is still considered a whitespace error, the reasoning
being that "you'd better fix it while at it if you are making a
change on an incomplete line anyway", which may controversial.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reserve a few more bits in the diff flags word to be used for future
whitespace rules. Add WS_INCOMPLETE_LINE without implementing the
behaviour (yet).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A patch file represents the incomplete line at the end of the file
with two lines, one that is the usual "context" with " " as the
first letter, "added" with "+" as the first letter, or "removed"
with "-" as the first letter that shows the content of the line,
plus an extra "\ No newline at the end of file" line that comes
immediately after it.
Ever since the apply machinery was written, the "git apply"
machinery parses "\ No newline at the end of file" line
independently, without even knowing what line the incomplete-ness
applies to, simply because it does not even remember what the
previous line was.
This poses a problem if we want to check and warn on an incomplete
line. Revamp the code that parses a fragment, to actually drop the
'\n' at the end of the incoming patch file that terminates a line,
so that check_whitespace() calls made from the code path actually
sees an incomplete as incomplete.
Note that the result of this parsing is not directly used by the
code path that applies the patch. apply_one_fragment() function
already checks if each of the patch text it handles is followed by a
line that begins with a backslash to drop the newline at the end of
the current line it is looking at. In a sense, this patch harmonizes
the behaviour of the parsing side to what is already done in the
application side.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff_symbol based output framework uses one DIFF_SYMBOL_* enum
value per the kind of output lines of "git diff", which corresponds
to one output line from the xdiff machinery used internally. Most
notably, DIFF_SYMBOL_PLUS and DIFF_SYMBOL_MINUS that correspond to
"+" and "-" lines are designed to always take a complete line, even
if the output from xdiff machinery may produce "\ No newline at the
end of file" immediately after them.
But this is not true in the rewrite-diff codepath, which completely
bypasses the xdiff machinery. Since the code path feeds the bytes
directly from the payload to the output routines, the output layer
has to deal with an incomplete line with DIFF_SYMBOL_PLUS and
DIFF_SYMBOL_MINUS, which never would see an incomplete line in the
normal code paths. This lack of final newline is compensated by an
ugly hack for a fabricated DIFF_SYMBOL_NO_LF_EOF token to inject an
extra newline to the output to simulate output coming from the xdiff
machinery.
Revamp the way the complete-rewrite code path feeds the lines to the
output layer by treating the last line of the pre/post image when it
is an incomplete line specially.
This lets us remove the DIFF_SYMBOL_NO_LF_EOF hack and use the usual
DIFF_SYMBOL_CONTEXT_INCOMPLETE code path, which will later learn how
to handle whitespace errors.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Everybody else, except for emit_rewrite_lines(), calls the
emit_callback data ecbdata. Make sure we call the same thing by
the same name for consistency.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a helper function that reacts to "\ No newline at the end of
file" in preparation for unifying the incomplete line handling in
the code path that handles xdiff output and the code path that
bypasses xdiff and produces a complete-rewrite patch.
Currently the output from the DIFF_SYMBOL_CONTEXT_INCOMPLETE case
still (ab)uses the same code as what is used for context lines, but
that would change in a later step where we introduce support to treat
an incomplete line as a whitespace error.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "\ No newline at the end of the file" can come after any of the
"-" (deleted preimage line), " " (unchanged line), or "+" (added
postimage line). In later steps in this series, we will start
treating a change that makes a file to end in an incomplete line
as a whitespace error, and we would need to know what the previous
line was when we react to "\ No newline" in the diff output. If
the previous line was a context (i.e., unchanged) line, the file
lacked the final newline before the change, and the change did not
touch that line and left it still incomplete, so we do not want to
warn in such a case.
Teach fn_out_consume() function to keep track of what the previous
line was, and prepare an otherwise empty switch statement to let us
react differently to "\ No newline" based on that.
Note that there is an existing curiosity (read: likely to be a bug)
in the code that increments line number in the preimage file every
time it sees a line with "\ No newline" on it, regardless of what
the previous line was. I left it as-is, because it does not affect
the main theme of this series, and more importantly, I do not think
it matters, as these numbers are used only to compare them with
blank_at_eof_in_{pre,post}image to issue a warning when we see more
empty line was added at the end, but by definition, after we see
"\ No newline at the end of the file" for an added line, we will not
see an added line for the file.
An independent audit to ensure that this curious increment can be
safely removed would make a good #leftoverbits clean-up (we may even
find some code that decrements this counter or over-increments the
other quantity this counter is compared with that compensates the
effect of this curious increment that hides a bug, in which case we
may also need to remove them).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The suppress-blank-empty feature abused the CONTEXT_INCOMPLETE
symbol that was meant to be used only for "\ No newline at the end
of file" code path.
The intent of the feature was to turn a context line we receive from
xdiff machinery (which always uses ' ' for context lines, even an
empty one) and spit it out as a truly empty line.
Perform such a conversion very locally at where a line from xdiff
that begins with ' ' is handled for output; there are many checks
before the control reaches such place that checks the first letter
of the diff output line to see if it is a context line, and having
to check for '\n' and treat it as a special case is error prone.
In order to catch similar hacks in the future, make sure the code
path that is meant for "\ No newline" case checks the first byte is
indeed a backslash.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the simple rule: if you need {} in one arm of the if/else
if/else... cascade, have {} in all of them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A comment in diff.c claimed that bits up to 12th (counting from 0th)
are whitespace rules, and 13th thru 15th are for new/old/context,
but it turns out it was miscounting. Correct them, and clarify
where the whitespace rule bits come from in the comment. Extend bit
assignment comments to cover bits used for color-moved, which
weren't described.
Also update the way these bit constants are defined to use (1 << N)
notation, instead of octal constants, as it tends to make it easier
to notice a breakage like this.
Sprinkle a few blank lines between logically distinct groups of CPP
macro definitions to make them easier to read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git very often uses the terms "object", "reference", or "index" in its
documentation.
However, it's hard to find a clear explanation of these terms and how
they relate to each other in the documentation. The closest candidates
currently are:
1. `gitglossary`. This makes a good effort, but it's an alphabetically
ordered dictionary and a dictionary is not a good way to learn
concepts. You have to jump around too much and it's not possible to
present the concepts in the order that they should be explained.
2. `gitcore-tutorial`. This explains how to use the "core" Git commands.
This is a nice document to have, but it's not necessary to learn how
`update-index` works to understand Git's data model, and we should
not be requiring users to learn how to use the "plumbing" commands
if they want to learn what the term "index" or "object" means.
3. `gitrepository-layout`. This is a great resource, but it includes a
lot of information about configuration and internal implementation
details which are not related to the data model. It also does
not explain how commits work.
The result of this is that Git users (even users who have been using
Git for 15+ years) struggle to read the documentation because they don't
know what the core terms mean, and it's not possible to add links
to help them learn more.
Add an explanation of Git's data model. Some choices I've made in
deciding what "core data model" means:
1. Omit pseudorefs like `FETCH_HEAD`, because it's not clear to me
if those are intended to be user facing or if they're more like
internal implementation details.
2. Don't talk about submodules other than by mentioning how they
relate to trees. This is because Git has a lot of special features,
and explaining how they all work exhaustively could quickly go
down a rabbit hole which would make this document less useful for
understanding Git's core behaviour.
3. Don't discuss the structure of a commit message
(first line, trailers etc).
4. Don't mention configuration.
5. Don't mention the `.git` directory, to avoid getting too much into
implementation details
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git last-modified" was optimized by narrowing the set of paths to
follow as it dug deeper in the history.
* tc/last-modified-active-paths-optimization:
last-modified: implement faster algorithm
Given a set of attribute macros like:
[attr]a1 a2
[attr]a2 a3
...
[attr]a300000 -text
file a1
expanding the attributes for "file" requires expanding "a1" to "a2",
"a2" to "a3", and so on until hitting a non-macro expansion ("-text", in
this case). We implement this via recursion: fill_one() calls
macroexpand_one(), which then recurses back to fill_one(). As a result,
very deep macro chains like the one above can run out of stack space and
cause us to segfault.
The required stack space is fairly small; I needed on the order of
200,000 entries to get a segfault on Linux. So it's unlikely anybody
would hit this accidentally, leaving only malicious inputs. There you
can easily construct a repo which will segfault on clone (we look at
attributes during the checkout step, but you'd see the same trying to do
other operations, like diff in a bare repo). It's mostly harmless, since
anybody constructing such a repo is only preventing victims from cloning
their evil garbage, but it could be a nuisance for hosting sites.
One option to prevent this is to limit the depth of recursion we'll
allow. This is conceptually easy to implement, but it raises other
questions: what should the limit be, and do we need a configuration knob
for it?
The recursion here is simple enough that we can avoid those questions by
just converting it to iteration instead. Rather than iterate over the
states of a match_attr in fill_one(), we'll put them all in a queue, and
the expansion of each can add to the queue rather than recursing. Note
that this is a LIFO queue in order to keep the same depth-first order we
did with the recursive implementation. I've avoided using the word
"stack" in the code because the term is already heavily used to refer to
the stack of .gitattribute files that matches the tree structure of the
repository.
The test uses a limited stack size so we can trigger the problem with a
much smaller input than the one shown above. The value here (3000) is
enough to trigger the issue on my x86_64 Linux machine.
Reported-by: Ben Stav <benstav@miggo.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building "git contacts" script (in contrib/) left the resulting
file unexecutable, which has been corrected.
* dk/make-git-contacts-executable:
perl: also mark git-contacts executable
The build procedure based on meson learned to allow builders to
specify the directory to install HTML documents.
* dk/meson-html-dir:
meson: make GIT_HTML_PATH configurable
Build procedure for Wincred credential helper has been updated.
* tu/credential-wincred-makefile-update:
wincred: align Makefile with other Makefiles in contrib
Ever since 14f9e128 (Define the project whitespace policy,
2008-02-10) added the whitespace rules to .gitattributes, we spelled
the most general rule like so:
* whitespace=!indent,trail,space
in the top-level .gitattributes file. The intent of this line was
described in the commit log message:
- Unless otherwise specified, indent with SP that could be
replaced with HT are not "bad". But SP before HT in the
indent is "bad", and trailing whitespaces are "bad".
It clearly wanted to disable indent-with-non-tab, so !indent is most
likely a misspelt form of '-indent'. Because indent-with-non-tab
has never been enabled by default, by luck this was not causing any
ill effect.
We could either remove "!indent", or spell it "-indent". The
immediate effect would be the same. It would only start to make a
difference when/if we enable indent-with-non-tab by default in
future versions of Git.
Let's take the former option to remove "!indent" from the list. We
would feel the effect first-hand ourselves before anybody else if we
ever decide to change the built-in default whitespace rules, which
would be hidden from us if we decide to rewrite it to "-indent"
instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Detecting renames and copies improves diff's output. This effort is
wasted if we don't show any. Disable detection in that case.
This actually fixes the error code when using the options --cached,
--find-copies-harder, --no-ext-diff and --quiet together:
run_diff_index() indirectly calls diff-lib.c::show_modified(), which
queues even non-modified entries using diff_change() because we need
them for copy detection. diff_change() sets flags.has_changes, though,
which causes diff_can_quit_early() to declare we're done after seeing
only the very first entry -- way too soon.
Using --cached, --find-copies-harder and --quiet together without
--no-ext-diff was not affected even before, as it causes the flag
flags.diff_from_contents to be set, which disables the optimization
in a different way.
Reported-by: D. Ben Knoble <ben.knoble@gmail.com>
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-maintenance(1)' command provides tooling to run maintenance
tasks over Git repositories. The 'run' subcommand, as the name suggests,
runs the maintenance tasks. When used with the '--auto' flag, it uses
heuristics to determine if the required thresholds are met for running
said maintenance tasks.
There is however a lack of insight into these heuristics. Meaning, the
checks are linked to the execution.
Add a new 'is-needed' subcommand to 'git-maintenance(1)' which allows
users to simply check if it is needed to run maintenance without
performing it.
This subcommand can check if it is needed to run maintenance without
actually running it. Ideally it should be used with the '--auto' flag,
which would allow users to check if the thresholds required are met. The
subcommand also supports the '--task' flag which can be used to check
specific maintenance tasks.
While adding the respective tests in 't/t7900-maintenance.sh', remove a
duplicate of the test: 'worktree-prune task with --auto honors
maintenance.worktree-prune.auto'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-maintenance(1)' command supports an '--auto' flag. Usage of the
flag ensures to run maintenance tasks only if certain thresholds are
met. The heuristic is defined on a task level, wherein each task defines
an 'auto_condition', which states if the task should be run.
The 'pack-refs' task is hard-coded to return 1 as:
1. There was never a way to check if the reference backend needs to be
optimized without actually performing the optimization.
2. We can pass in the '--auto' flag to 'git-pack-refs(1)' which would
optimize based on heuristics.
The previous commit added a `refs_optimize_required()` function, which
can be used to check if a reference backend required optimization. Use
this within `pack_refs_condition()`.
This allows us to add a 'git maintenance is-needed' subcommand which can
notify the user if maintenance is needed without actually performing the
optimization. Without this change, the reference backend would always
state that optimization is needed.
Since we import 'revision.h', we need to remove the definition for
'SEEN' which is duplicated in the included header.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow users of the refs namespace to check if the reference backend
requires optimization, add a new field `optimize_required` field to
`struct ref_storage_be`. This field is of type `optimize_required_fn`
which is also introduced in this commit.
Modify the debug, files, packed and reftable backend to implement this
field. A following commit will expose this via 'git pack-refs' and 'git
refs optimize'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable backend performs auto-compaction as part of its regular
flow, which is required to keep the number of tables part of a stack at
bay. This allows it to stay optimized.
Compaction can also be triggered voluntarily by the user via the 'git
pack-refs' or the 'git refs optimize' command. However, currently there
is no way for the user to check if optimization is required without
actually performing it.
Extract out the heuristics logic from 'reftable_stack_auto_compact()'
into an internal function 'update_segment_if_compaction_required()'.
Then use this to add and expose `reftable_stack_compaction_required()`
which will allow users to check if the reftable backend can be
optimized.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `stack_table_sizes_for_compaction()` function returns individual
sizes of each reftable table. This function is only called by
`reftable_stack_auto_compact()` to decide which tables need to be
compacted, if any.
Modify the function to directly return the segments, which avoids the
extra step of receiving the sizes only to pass it to
`suggest_compaction_segment()`.
A future commit will also add functionality for checking whether
auto-compaction is necessary without performing it. This change allows
code re-usability in that context.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refreshes the Irish translation for Git 2.52, including new strings and
consistency improvements. Verified with `git-po-helper check`.
Signed-off-by: Aindriú Mac Giolla Eoin <aindriu80@gmail.com>
A recently added configuration variable and command line option
syntax ":(optional)" for values that are of filename type
inconsistently behaved on an empty file (configuration took it
happily, while the command line option pretended as if it did not
exist), which has been corrected.
* dk/parseopt-optional-filename-fixes:
parseopt: remove unreachable code
parseopt: restore const qualifier to parsed filename
config: use boolean type for a simple flag
parseopt: use boolean type for a simple flag
doc: clarify command equivalence comment
parseopt: fix :(optional) at command line to only ignore missing files
Messages from fast-import/export are now marked for i18n.
* cc/fast-import-export-i18n-cleanup:
gpg-interface: mark a string for translation
fast-import: mark strings for translation
fast-export: mark strings for translation
gpg-interface: use left shift to define GPG_VERIFY_*
gpg-interface: simplify ssh fingerprint parsing
Our Bencher dashboards [1] have recently alerted us about a bunch of
performance regressions when writing references, specifically with the
reftable backend. There is a 3x regression when writing many refs with
preexisting refs in the reftable format, and a 10x regression when
migrating refs between backends in either of the formats.
Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled
object IDs for invalid tags, 2025-10-23). The gist of the commit is that
we may end up storing peeled objects in both reftables and packed-refs
for corrupted tags, where the claimed tagged object type is different
than the actual tagged object type. This will then cause us to create
the `struct object *` with a wrong type, as well, and obviously nothing
good comes out of that.
The fix for this issue was to introduce a new flag to `peel_object()`
that causes us to verify the tagged object's type before writing it into
the refdb -- if the tag is corrupt, we skip writing the peeled value.
To verify whether the peeled value is correct we have to look up the
object type via the ODB and compare the actual type with the claimed
type, and that additional object lookup is costly.
This also explains why we see the regression only when writing refs with
the reftable backend, but we see the regression with both backends when
migrating refs:
- The reftable backend knows to store peeled values in the new table
immediately, so it has to try and peel each ref it's about to write
to the transaction. So the performance regression is visible for all
writes.
- The files backend only stores peeled values when writing the
packed-refs file, so it wouldn't hit the performance regression for
normal writes. But on ref migrations we know to write all new values
into the packed-refs file immediately, and that's why we see the
regression for both backends there.
Taking a step back though reveals an oddity in the new verification
logic: we not only verify the _tagged_ object's type, but we also verify
the type of the tag itself. But this isn't really needed, as we wouldn't
hit the bug in such a case anyway, as we only hit the issue with corrupt
tags claiming an invalid type for the tagged object.
The consequence of this is that we now started to look up the target
object of every single reference we're about to write, regardless of
whether it even is a tag or not. And that is of course quite costly.
Fix the issue by only verifying the type of the tagged objects. This
means that we of course still have a performance hit for actual tags.
But this only happens for writes anyway, and I'd claim it's preferable
to not store corrupted data in the refdb than to be fast here. Rename
the flag accordingly to clarify that we only verify the tagged object's
type.
This fix brings performance back to previous levels:
Benchmark 1: baseline
Time (mean ± σ): 46.0 ms ± 0.4 ms [User: 40.0 ms, System: 5.7 ms]
Range (min … max): 45.0 ms … 47.1 ms 54 runs
Benchmark 2: regression
Time (mean ± σ): 140.2 ms ± 1.3 ms [User: 77.5 ms, System: 60.5 ms]
Range (min … max): 138.0 ms … 142.7 ms 20 runs
Benchmark 3: fix
Time (mean ± σ): 46.2 ms ± 0.4 ms [User: 40.2 ms, System: 5.7 ms]
Range (min … max): 45.0 ms … 47.3 ms 55 runs
Summary
update-ref: baseline
1.00 ± 0.01 times faster than fix
3.05 ± 0.04 times faster than regression
[1]: https://bencher.dev/perf/git/plots
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/ref-peeled-tags:
t7004: do not chdir around in the main process
ref-filter: fix stale parsed objects
ref-filter: parse objects on demand
ref-filter: detect broken tags when dereferencing them
refs: don't store peeled object IDs for invalid tags
object: add flag to `peel_object()` to verify object type
refs: drop infrastructure to peel via iterators
refs: drop `current_ref_iter` hack
builtin/show-ref: convert to use `reference_get_peeled_oid()`
ref-filter: propagate peeled object ID
upload-pack: convert to use `reference_get_peeled_oid()`
refs: expose peeled object ID via the iterator
refs: refactor reference status flags
refs: fully reset `struct ref_iterator::ref` on iteration
refs: introduce `.ref` field for the base iterator
refs: introduce wrapper struct for `each_ref_fn`
If a file is renamed between commits and an external diff is started
through gitk on the original or the renamed file name,
gitk is unable to open the renamed file in the external diff editor.
It fails to fetch the renamed file from git, because it fetches it
using its original path in contrast to using the renamed path of the
file.
Detect the rename and open the external diff with the original and
the renamed file instead of no file (fetch the renamed file path and
name from git) no matter if the original or the renamed file is
selected in gitk.
Signed-off-by: Tobias Boesch <tobias.boesch@miele.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Makefile-based builds can configure Git's internal HTML_PATH by defining
htmldir, which is useful for packagers that put documentation in
different locations. Gentoo, for example, uses version-suffixed
directories like ${prefix}/share/doc/git-2.51 and puts the HTML
documentation in an 'html' subdirectory of the same.
Propagate the same configuration knob to Meson-based builds so that
"git --html-path" on such systems can be configured to output the
correct directory.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When installing git-contacts with Meson via -Dcontrib=contacts, the default
Perl generation fails to mark it executable. As a result, "git contacts"
reports "'contacts' is not a git command."
Unlike generate-script.sh, we aren't testing the basename here; so, glob
the script name in the case arm to match wherever the input comes from.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Replace $(LOADLIBES) because it is deprecated since long and it is
used nowhere else in the git project.
* Use $(gitexecdir) instead of $(libexecdir) because config.mak defines
$(libexecdir) as $(prefix)/libexec, not as $(prefix)/libexec/git-core.
* Similar to other Makefiles, let install target rule create
$(gitexecdir) to make sure the directory exists before copying the
executable and also let it respect $(DESTDIR).
* Shuffle the lines for the default settings to align them with the
other Makefiles in contrib/credential.
* Define .PHONY for all special targets (all, install, clean).
Signed-off-by: Thomas Uhle <thomas.uhle@mailbox.tu-dresden.de>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the documentation to clearly describe how the server responds when a
client sends an invalid or malformed `want` line during the HTTP protocol
exchange. The server includes the offending object name in its error message.
Signed-off-by: Queen Ediri Jessa <qjessa662@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a file is selected in the file list, the diff window scrolls to the
corresponding section. The administrative data needed for this purpose
is extracted from the 'rename from', 'rename to', and 'copy to' lines.
Escaped file names are unescaped for this purpose. However, the lines
shown in the diff window are left in the escaped form. This is not very
pleasing. Replace the escaped form by the unescaped form.
Add a section to treat the 'copy from' case.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
When 5de460a2cfdd (gitk: Refactor per-line part of getblobdiffline and
its support) moved the body of a loop into a separate function, several
'continue' statements were changed to 'return'. But one instance was
missed. Fix it now.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
The version of macos image used in GitHub CI has been updated to
macos-14, as the macos-13 that we have been using got deprecated.
* jc/ci-use-macos-14:
GitHub CI: macos-13 images are no more
The help text and manual page of "git bisect" command have been
made consistent with each other.
* rz/t0450-bisect-doc-update:
bisect: update usage and docs to match each other
Add a configuration variable to control the default behavior of git replay
for updating references. This allows users who prefer the traditional
pipeline output to set it once in their config instead of passing
--ref-action=print with every command.
The config variable uses string values that mirror the behavior modes:
* replay.refAction = update (default): atomic ref updates
* replay.refAction = print: output commands for pipeline
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git replay command currently outputs update commands that can be
piped to update-ref to achieve a rebase, e.g.
git replay --onto main topic1..topic2 | git update-ref --stdin
This separation had advantages for three special cases:
* it made testing easy (when state isn't modified from one step to
the next, you don't need to make temporary branches or have undo
commands, or try to track the changes)
* it provided a natural can-it-rebase-cleanly (and what would it
rebase to) capability without automatically updating refs, similar
to a --dry-run
* it provided a natural low-level tool for the suite of hash-object,
mktree, commit-tree, mktag, merge-tree, and update-ref, allowing
users to have another building block for experimentation and making
new tools
However, it should be noted that all three of these are somewhat
special cases; users, whether on the client or server side, would
almost certainly find it more ergonomic to simply have the updating
of refs be the default.
For server-side operations in particular, the pipeline architecture
creates process coordination overhead. Server implementations that need
to perform rebases atomically must maintain additional code to:
1. Spawn and manage a pipeline between git-replay and git-update-ref
2. Coordinate stdout/stderr streams across the pipe boundary
3. Handle partial failure states if the pipeline breaks mid-execution
4. Parse and validate the update-ref command output
Change the default behavior to update refs directly, and atomically (at
least to the extent supported by the refs backend in use). This
eliminates the process coordination overhead for the common case.
For users needing the traditional pipeline workflow, add a new
--ref-action=<mode> option that preserves the original behavior:
git replay --ref-action=print --onto main topic1..topic2 | git update-ref --stdin
The mode can be:
* update (default): Update refs directly using an atomic transaction
* print: Output update-ref commands for pipeline use
Test suite changes:
All existing tests that expected command output now use
--ref-action=print to preserve their original behavior. This keeps
the tests valid while allowing them to verify that the pipeline workflow
still works correctly.
New tests were added to verify:
- Default atomic behavior (no output, refs updated directly)
- Bare repository support (server-side use case)
- Equivalence between traditional pipeline and atomic updates
- Real atomicity using a lock file to verify all-or-nothing guarantee
- Test isolation using test_when_finished to clean up state
- Reflog messages include replay mode and target
A following commit will add a replay.refAction configuration
option for users who prefer the traditional pipeline output as their
default behavior.
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding the --ref-action option, convert option
validation to use die_for_incompatible_opt2(). This helper provides
standardized error messages for mutually exclusive options.
The following commit introduces --ref-action which will be incompatible
with certain other options. Using die_for_incompatible_opt2() now means
that commit can cleanly add its validation using the same pattern,
keeping the validation logic consistent and maintainable.
This also aligns git-replay's option handling with how other Git commands
manage option conflicts, using the established die_for_incompatible_opt*()
helper family.
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As this image was deprecated on Sep 22nd, and will be dropped on Dec
4th, replace these jobs to use macos-14 images instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At this point in the code after running skip_prefix() on the
variable and receiving the result in the same variable, the contents
of the variable can never be NULL. The function either (1) updates
the variable to point at a later part of the string it originally
pointed at, or (2) leaves it intact if the string does not have the
prefix. (1) will never make the variable NULL, and (2) cannot be
the source of NULL, because the variable cannot be NULL before
calling skip_prefix(), which would die immediately by dereferencing
the NULL pointer in that case.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was unintentionally dropped in ccfcaf399f (parseopt: values of
pathname type can be prefixed with :(optional), 2025-09-28). Notably,
continue dropping the const qualifier when free'ing value; see
4049b9cfc0 (fix const issues with some functions, 2007-10-16) or
83838d5c1b (cast variable in call to free() in builtin/diff.c and
submodule.c, 2011-11-06) for more details on why.
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation of command parsing for :(optional) includes a terse
comment; expand it to be clearer to readers.
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike the configuration option magic, the parseopt code also ignores
empty files: compare implementations from ccfcaf399f (parseopt: values
of pathname type can be prefixed with :(optional), 2025-09-28) and
749d6d166d (config: values of pathname type can be prefixed with
:(optional), 2025-09-28).
Unify the 2 by not ignoring empty files, which is less surprising and
the intended semantics from the first patch for config.
Suggested-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patterns used in the .gitignore files use backslash in the way
documented for fnmatch(3); document as such to reduce confusion.
* jk/doc-backslash-in-exclude:
doc: document backslash in gitignore patterns
The "debug" ref-backend was missing a method implementation, which
has been corrected.
* xr/ref-debug-remove-on-disk:
refs: add missing remove_on_disk implementation for debug backend
The "MyFirstContribution" tutorial tells the reader how to send out
their patches; the section gained a hint to verify the message
reached the mailing list.
* qj/doc-my1stcontrib-email-verify:
MyFirstContribution: add note on confirming patches
Tests did not set up GNUPGHOME correctly, which is fixed but some
flaky tests are exposed in t1016, which needs to be addressed
before this topic can move forward.
* tz/test-prepare-gnupghome:
t/lib-gpg: call prepare_gnupghome() in GPG2 prereq
t/lib-gpg: add prepare_gnupghome() to create GNUPGHOME dir
Contributed credential helpers (obviously in contrib/) now have "cd
$there && make install" target.
* tu/credential-install:
contrib/credential: add install target
* kn/refs-optim-cleanup:
t/pack-refs-tests: move the 'test_done' to callees
refs: rename 'pack_refs_opts' to 'refs_optimize_opts'
refs: move to using the '.optimize' functions
* ps/ref-peeled-tags: (23 commits)
t7004: do not chdir around in the main process
ref-filter: fix stale parsed objects
ref-filter: parse objects on demand
ref-filter: detect broken tags when dereferencing them
refs: don't store peeled object IDs for invalid tags
object: add flag to `peel_object()` to verify object type
refs: drop infrastructure to peel via iterators
refs: drop `current_ref_iter` hack
builtin/show-ref: convert to use `reference_get_peeled_oid()`
ref-filter: propagate peeled object ID
upload-pack: convert to use `reference_get_peeled_oid()`
refs: expose peeled object ID via the iterator
refs: refactor reference status flags
refs: fully reset `struct ref_iterator::ref` on iteration
refs: introduce `.ref` field for the base iterator
refs: introduce wrapper struct for `each_ref_fn`
builtin/repo: add progress meter for structure stats
builtin/repo: add keyvalue and nul format for structure stats
builtin/repo: add object counts in structure output
builtin/repo: introduce structure subcommand
...
In ac0bad0af4 (t0601: refactor tests to be shareable, 2025-09-19), we
refactored 't/t0601-reffiles-pack-refs.sh' to move all of the tests to
't/pack-refs-tests.sh', which became a common test suite which was also
used by 't/t1463-refs-optimize.sh'.
This also moved the 'test_done' directive to 't/pack-refs-tests.sh'.
Which inhibits additional tests from being added to either of the tests.
Let's move the directive out to both the tests, so that we can add
additional specific tests to them. Also the test flow logic shouldn't be
part of tests which can be embedded in other test scripts.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit removed all references to 'pack_refs()' within
the refs subsystem. Continue this cleanup by also renaming
'pack_refs_opts' to 'refs_optimize_opts' and the respective flags
accordingly. Keeping the naming consistent will make the code easier to
maintain.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct ref_store` variable exposes two ways to optimize a reftable
backend:
1. pack_refs
2. optimize
The former was specific to the 'files' + 'packed' refs backend. The
latter is more generic and covers all backends. While the naming is
different, both of these functions perform the same functionality.
Consolidate this code to only maintain the 'optimize' functions. Do this
by modifying the backends so that they exclusively implement the
`optimize` callback, only. All users of the refs subsystem already use
the 'optimize' function so there is no changes needed on the callee
side. Finally, cleanup all references to the 'pack_refs' field of the
structure and code around it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/ref-peeled-tags: (92 commits)
t7004: do not chdir around in the main process
ref-filter: fix stale parsed objects
ref-filter: parse objects on demand
ref-filter: detect broken tags when dereferencing them
refs: don't store peeled object IDs for invalid tags
object: add flag to `peel_object()` to verify object type
refs: drop infrastructure to peel via iterators
refs: drop `current_ref_iter` hack
builtin/show-ref: convert to use `reference_get_peeled_oid()`
ref-filter: propagate peeled object ID
upload-pack: convert to use `reference_get_peeled_oid()`
refs: expose peeled object ID via the iterator
refs: refactor reference status flags
refs: fully reset `struct ref_iterator::ref` on iteration
refs: introduce `.ref` field for the base iterator
refs: introduce wrapper struct for `each_ref_fn`
builtin/repo: add progress meter for structure stats
builtin/repo: add keyvalue and nul format for structure stats
builtin/repo: add object counts in structure output
builtin/repo: introduce structure subcommand
...
Move down to no-contains subdirectory inside a subshell, just like
the previous step that created and used it does.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 054f5f457e (ref-filter: parse objects on demand, 2025-10-23) we have
started to skip parsing some objects in case we don't need to access
their values in the first place. This was done by introducing a new
member `struct expand_data::maybe_object` that gets populated on demand
via `get_or_parse_object()`.
This has led to a regression though where the object now gets reused
because we don't reset it properly. The `oi` structure is declared in
global scope, and there is no single place where we reset it before
invoking `get_object()`. The consequence is that the `maybe_object`
member doesn't get reset across calls, so subsequent calls will end up
reusing the same object.
This is only an issue for a subset of retrieved values, as not all of
the infrastructure ends up calling `get_or_parse_object()`. So the
effect is limited, which is probably why the issue wasn't detected
earlier.
Fix the issue by resetting `maybe_object` in `get_object()`.
Reported-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When formatting an arbitrary object we parse that object regardless of
whether or not we actually need any parsed data. In fact, many of the
atoms we have don't require any.
Refactor the code so that we parse the data on demand when we see an
atom that wants to access the objects. This leads to a small speedup,
for example in the Chromium repository with around 40000 refs:
Benchmark 1: for-each-ref --format='%(raw)' (HEAD~)
Time (mean ± σ): 388.7 ms ± 1.1 ms [User: 322.2 ms, System: 65.0 ms]
Range (min … max): 387.3 ms … 390.8 ms 10 runs
Benchmark 2: for-each-ref --format='%(raw)' (HEAD)
Time (mean ± σ): 344.7 ms ± 0.7 ms [User: 287.8 ms, System: 55.1 ms]
Range (min … max): 343.9 ms … 345.7 ms 10 runs
Summary
for-each-ref --format='%(raw)' (HEAD) ran
1.13 ± 0.00 times faster than for-each-ref --format='%(raw)' (HEAD~)
With this change, we now spend ~90% of the time decompressing objects,
which is almost as good as it gets regarding git-for-each-ref(1)'s own
infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users can ask git-for-each-ref(1) to peel tags and return information of
the tagged object by adding an asterisk to the format, like for example
"%(*$objectname)". If so, git-for-each-ref(1) peels that object to the
first non-tag object and then returns its values.
As mentioned in preceding commits, it can happen that the tagged object
type and the claimed object type differ, effectively resulting in a
corrupt tag. git-for-each-ref(1) would notice this mismatch, print an
error and then bail out when trying to peel the tag.
But we only notice this corruption in some very specific edge cases!
While we have a test in "t/for-each-ref-tests.sh" that verifies the
above scenario, this test is specifically crafted to detect the issue at
hand. Namely, we create two tags:
- One tag points to a specific object with the correct type.
- The other tag points to the *same* object with a different type.
The fact that both tags point to the same object is important here:
`peel_object()` wouldn't notice the corruption if the tagged objects
were different.
The root cause is that `peel_object()` calls `lookup_${type}()`
eventually, where the type is the same type declared in the tag object.
Consequently, when we have two tags pointing to the same object but with
different declared types we'll call two different lookup functions. The
first lookup will store the object with an unverified type A, whereas
the second lookup will try to look up the object with a different
unverified type B. And it is only now that we notice the discrepancy in
object types, even though type A could've already been the wrong type.
Fix the issue by verifying the object type in `populate_value()`. With
this change we'll also notice type mismatches when only dereferencing a
tag once.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the "files" and "reftable" backend store peeled object IDs for
references that point to tags:
- The "files" backend stores the value when packing refs, where each
peeled object ID is prefixed with "^".
- The "reftable" backend stores the value whenever writing a new
reference that points to a tag via a special ref record type.
Both of these backends use `peel_object()` to find the peeled object ID.
But as explained in the preceding commit, that function does not detect
the case where the tag's tagged object and its claimed type mismatch.
The consequence of storing these bogus peeled object IDs is that we're
less likely to detect such corruption in other parts of Git.
git-for-each-ref(1) for example does not notice anymore that the tag is
broken when using "--format=%(*objectname)" to dereference tags.
One could claim that this is good, because it still allows us to mostly
use the tag as intended. But the biggest problem here is that we now
have different behaviour for such a broken tag depending on whether or
not we have its peeled value in the refdb.
Fix the issue by verifying the object type when peeling the object. If
that verification fails we simply skip storing the peeled value in
either of the reference formats.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When peeling a tag to a non-tag object we repeatedly call
`parse_object()` on the tagged object until we find the first object
that isn't a tag. While this feels sensible at first, there is a big
catch here: `parse_object()` doesn't actually verify the type of the
tagged object.
The relevant code path here eventually ends up in `parse_tag_buffer()`.
Here, we parse the various fields of the tag, including the "type". Once
we've figured out the type and the tagged object ID, we call one of the
`lookup_${type}()` functions for whatever type we have found. There is
two possible outcomes in the successful case:
1. The object is already part of our cached objects. In that case we
double-check whether the type we're trying to look up matches the
type that was cached.
2. The object is _not_ part of our cached objects. In that case, we
simply create a new object with the expected type, but we don't
parse that object.
In the first case we might notice type mismatches, but only in the case
where our cache has the object with the correct type. In the second
case, we'll blindly assume that the type is correct and then go with it.
We'll only notice that the type might be wrong when we try to parse the
object at a later point.
Now arguably, we could change `parse_tag_buffer()` to verify the tagged
object's type for us. But that would have the effect that such a tag
cannot be parsed at all anymore, and we have a small bunch of tests for
exactly this case that assert we still can open such tags. So this
change does not feel like something we can retroactively tighten, even
though one shouldn't ever hit such corrupted tags.
Instead, add a new `flags` field to `peel_object()` that allows the
caller to opt in to strict object verification. This will be wired up at
a subset of callsites over the next few commits.
Note that this change also inlines `deref_tag_noverify()`. There's only
been two callsites of that function, the one we're changing and one in
our test helpers. The latter callsite can trivially use `deref_tag()`
instead, so by inlining the function we avoid having to pass down the
flag.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the peeled object ID gets propagated via the `struct reference`
there is no need anymore to call into the reference iterator itself to
dereference an object. Remove this infrastructure.
Most of the changes are straight-forward deletions of code. There is one
exception though in `refs/packed-backend.c::write_with_updates()`. Here
we stop peeling the iterator and instead just pass the peeled object ID
of that iterator directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preceding commits we have refactored all callers of
`peel_iterated_oid()` to instead use `reference_get_peeled_oid()`. This
allows us to thus get rid of the former function.
Getting rid of that function is nice, but even nicer is that this also
allows us to get rid of the `current_ref_iter` hack. This global
variable tracked the currently-active ref iterator so that we can use it
to peel an object ID. Now that the peeled object ID is propagated via
`struct reference` though we don't have to depend on this hack anymore,
which makes for a more robust and easier-to-understand infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-show-ref(1) command has multiple different modes:
- It knows to show all references matching a pattern.
- It knows to list all references that are an exact match to whatever
the user has provided.
- It knows to check for reference existence.
The first two commands use mostly the same infrastructure to print the
references via `show_one()`. But while the former mode uses a proper
iterator and thus has a `struct reference` available in its context, the
latter calls `refs_read_ref()` and thus doesn't. Consequently, we cannot
easily use `reference_get_peeled_oid()` to print the peeled value.
Adapt the code so that we manually construct a `struct reference` when
verifying refs. We wouldn't ever have the peeled value available anyway
as we're not using an iterator here, so we can simply plug in the values
we _do_ have.
With this change we now have a `struct reference` available at both
callsites of `show_one()` and can thus pass it, which allows us to use
`reference_get_peeled_oid()` instead of `peel_iterated_oid()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When queueing a reference in the "ref-filter" subsystem we end up
creating a new ref array item that contains the reference's info. One
bit of info that we always discard though is the peeled object ID, and
because of that we are forced to use `peel_iterated_oid()`.
Refactor the code to propagate the peeled object ID via the ref array,
if available. This allows us to manually peel tags without having to go
through the object database.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_v0_ref()` callback is invoked from two callsites:
- Once via `send_ref()` which is a callback passed to
`for_each_namespaced_ref_1()` and `refs_head_ref_namespaced()`.
- Once manually to announce capabilities.
When sending references to the client we also send the peeled value of
tags. As we don't have a `struct reference` available in the second
case, we cannot easily peel by calling `reference_get_peeled_oid()`, but
we instead have to depend on on global state via `peel_iterated_oid()`.
We do have a reference available though in the first case, it's only the
second case that keeps us from using `reference_get_peeled_oid()`. But
that second case only announces capabilities anyway, so we're not really
handling a reference at all here.
Adapt that case to construct a reference manually and pass that to
`write_v0_ref()`. Start to use `reference_get_peeled_oid()` now that we
always have a `struct reference` available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the "files" and "reftable" backend are able to store peeled values
for tags in the respective formats. This allows for a more efficient
lookup of the target object of such a tag without having to manually
peel via the object database.
The infrastructure to access these peeled object IDs is somewhat funky
though. When iterating through objects, we store a pointer reference to
the current iterator in a global variable. The callbacks invoked by that
iterator are then expected to call `peel_iterated_oid()`, which checks
whether the globally-stored iterator's current reference refers to the
one handed into that function. If so, we ask the iterator to peel the
object, otherwise we manually peel the object via the object database.
Depending on global state like this is somewhat weird and also quite
fragile.
Introduce a new `struct reference::peeled_oid` field that can be
populated by the reference backends. This field can be accessed via a
new function `reference_get_peeled_oid()` that either uses that value,
if set, or alternatively peels via the ODB. With this change we don't
have to rely on global state anymore, but make the peeled object ID
available to the callback functions directly.
Adjust trivial callers that already have a `struct reference` available.
Remaining callers will be adjusted in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference flags encode information like whether or not a reference
is a symbolic reference or whether it may be broken. This information is
stored in a `int flags` bitfield, which is in conflict with our modern
best practices; we tend to use an unsigned integer to store flags.
Change the type of the field to be `unsigned`. While at it, refactor the
individual flags to be part of an `enum` instead of using preprocessor
defines.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the introduction of the `struct ref_iterator::ref` field it now is
a whole lot easier to introduce new fields that become accessible to the
caller without having to adapt every single callsite. But there's a
downside: when a new field is introduced we always have to adapt all
backends to set that field.
This isn't something we can avoid in the general case: when the new
field is expected to be populated by all backends we of course cannot
avoid doing so. But new fields may be entirely optional, in which case
we'd still have such churn. And furthermore, it is very easy right now
to leak state from a previous iteration into the next iteration.
Address this issue by ensuring that the reference backends all fully
reset the field on every single iteration. This ensures that no state
from previous iterations can leak into the next one. And it ensures that
any newly introduced fields will be zeroed out by default.
Note that we don't have to explicitly adapt the "files" backend, as it
uses the `cache_ref_iterator` internally. Furthermore, other "wrapping"
iterators like for example the `prefix_ref_iterator` copy around the
whole reference, so these don't need to be adapted either.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The base iterator has a couple of fields that tracks the name, target,
object ID and flags for the current reference. Due to this design we
have to create a new `struct reference` whenever we want to hand over
that reference to the callback function, which is tedious and not very
efficient.
Convert the structure to instead contain a `struct reference` as member.
This member is expected to be populated by the implementations of the
iterator and is handed over to the callback directly.
While at it, simplify `should_pack_ref()` to take a `struct reference`
directly instead of passing its respective fields.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `each_ref_fn` callback function type is used across our code base
for several different functions that iterate through reference. There's
a bunch of callbacks implementing this type, which makes any changes to
the callback signature extremely noisy. An example of the required churn
is e8207717f1 (refs: add referent to each_ref_fn, 2024-08-09): adding a
single argument required us to change 48 files.
It was already proposed back then [1] that we might want to introduce a
wrapper structure to alleviate the pain going forward. While this of
course requires the same kind of global refactoring as just introducing
a new parameter, it at least allows us to more change the callback type
afterwards by just extending the wrapper structure.
One counterargument to this refactoring is that it makes the structure
more opaque. While it is obvious which callsites need to be fixed up
when we change the function type, it's not obvious anymore once we use
a structure. That being said, we only have a handful of sites that
actually need to populate this wrapper structure: our ref backends,
"refs/iterator.c" as well as very few sites that invoke the iterator
callback functions directly.
Introduce this wrapper structure so that we can adapt the iterator
interfaces more readily.
[1]: <ZmarVcF5JjsZx0dl@tanuki>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two different ways to write an object into the database:
- We either provide the full buffer and write the object all at once.
- Or we provide an input stream that has a `read()` function so that
we can chunk the object.
The latter is especially used for large objects, where it may be too
expensive to hold the complete object in memory all at once.
While we already have `odb_write_object()` at the ODB-layer, we don't
have an equivalent for streaming an object. Introduce a new function
`odb_write_object_stream()` to address this gap so that callers don't
have to be aware of the inner workings of how to stream an object to
disk with a specific object source.
Rename `stream_loose_object()` to `odb_source_loose_write_stream()` to
clarify its scope. This matches our modern best practices around how to
name functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `write_object_file()` to `odb_source_loose_write_object()` so
that it becomes clear that this is tied to a specific loose object
source. This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing an object that already exists in our object database we
skip the write and instead only update mtimes of the object, either in
its packed or loose object format. This logic is wholly contained in
"object-file.c", but that file is really only concerned with loose
objects. So it does not really make sense that it also contains the
logic to freshen a packed object.
Introduce a new `odb_freshen_object()` function that sits on the object
database level and two functions `packfile_store_freshen_object()` and
`odb_source_loose_freshen_object()`. Like this, the format-specific
functions can be part of their respective subsystems, while the backend
agnostic function to freshen an object sits at the object database
layer.
Note that this change also moves the logic that iterates through object
sources from the object source layer into the object database layer.
This change is intentional: object sources should ideally only have to
worry about themselves, and coordination of different sources should be
handled on the object database level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_loose_object()` to `odb_source_loose_has_object()` so that
it becomes clear that this is tied to a specific loose object source.
This matches our modern naming schema for functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading an object via `loose_object_info()` or `map_loose_object()`
we hand in the whole repository. We then iterate through each of the
object sources to figure out whether that source has the object in
question.
This logic is reversing responsibility though: a specific backend should
only care about one specific source, where the object sources themselves
are then managed by the object database.
Refactor the code accordingly by passing an object source to both of
these functions instead. The different sources are then handled by
either `do_oid_object_info_extended()`, which sits on the object
database level, and by `open_istream_loose()`. The latter function
arguably is still at the wrong level, but this will be cleaned up at a
later point in time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loose object map is used to map from the repository's canonical
object hash to the compatibility hash. As the name indicates, this map
is only used for loose objects, and as such it is tied to a specific
loose object source.
Same as with preceding commits, move this map into the loose object
source accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two different situations where we have to clear the cache of
loose objects:
- When freeing the loose object source itself to avoid memory leaks.
- When repreparing the loose object source so that any potentially-
stale data is getting evicted from the cache.
The former is already handled by `odb_source_loose_free()`. But the
latter case is still done manually by in `odb_reprepare()`, so we are
leaking internals into that code.
Introduce a new `odb_source_loose_reprepare()` function as an equivalent
to `packfile_store_prepare()` to hide these implementation details.
Furthermore, while at it, rename the function `odb_clear_loose_cache()`
to `odb_source_loose_clear()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our loose objects use a cache that (optionally) stores all objects for
each of the opened sharding directories. This cache is located in the
`struct odb_source`, but now that we have `struct odb_source_loose` it
makes sense to move it into the latter structure so that all state that
relates to loose objects is entirely self-contained.
Do so. While at it, rename corresponding functions to have a prefix that
relates to `struct odb_source_loose`.
Note that despite this prefix, the functions still accept a `struct
odb_source` as input. This is done intentionally: once we introduce
pluggable object databases, we will continue to accept this struct but
then do a cast inside these functions to `struct odb_source_loose`. This
design is similar to how we do it for our ref backends.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, all state that relates to loose objects is held directly by
the `struct odb_source`. Introduce a new `struct odb_source_loose` to
hold the state instead so that it is entirely self-contained.
This structure will eventually morph into the backend for accessing
loose objects. As such, this is part of the refactorings to introduce
pluggable object databases.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fetch_if_missing` global variable is declared in "object-file.h"
but defined in "odb.c". The variable relates to the whole object
database instead of only loose objects, so move the declaration into
"odb.h" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `free_object_directory()` and `free_object_directories()`
are responsible for freeing a single object source or all object sources
connected to an object database, respectively. The associated structure
has been renamed from `struct object_directory` to `struct odb_source`
in a1e2581a1e (object-store: rename `object_directory` to `odb_source`,
2025-07-01) though, so the names are somewhat stale nowadays.
Rename them to mention the new struct name instead. Furthermore, while
at it, adapt them to our modern naming schema where we first have the
subject followed by a verb.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have three different locations where we create a new ODB source.
Deduplicate the logic via a new `odb_source_new()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding an alternate to the object database we first check whether
or not the path is usable. A path is usable if:
- It actually exists.
- We don't have it in our object sources yet.
While the former check is trivial enough, the latter part is somewhat
subtle and prone for bugs. This is because the function doesn't only
check whether or not the given path is usable. But if it _is_ usable, we
also store that path in the map of object sources immediately.
The tricky part here is that the path that gets stored in the map is
_not_ copied. Instead, we rely on the fact that subsequent code uses
`strbuf_detach()` to store the exact same allocated memory in the
created object source. Consequently, the memory is owned by the source
but _also_ stored in the map. This subtlety is easy to miss, so if one
decides to refactor this code one can easily end up breaking this
mechanism.
Make the relationship more explicit by not storing the path as part of
`alt_odb_usable()`. Instead, store the path after we have created the
source so that we can use the source's path pointer directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current implementation of git-last-modified(1) works by doing a
revision walk, and inspecting the diff at each level of that walk to
annotate entries remaining in the hashmap of paths. In other words, if
the diff at some level touches a path which has not yet been associated
with a commit, then that commit becomes associated with the path.
While a perfectly reasonable implementation, it can perform poorly in
either one of two scenarios:
1. There are many entries of interest, in which case there is simply
a lot of work to do.
2. Or, there are (even a few) entries which have not been updated in a
long time, and so we must walk through a lot of history in order to
find a commit that touches that path.
This patch rewrites the last-modified implementation that addresses the
second point. The idea behind the algorithm is to propagate a set of
'active' paths (a path is 'active' if it does not yet belong to a
commit) up to parents and do a truncated revision walk.
The walk is truncated because it does not produce a revision for every
change in the original pathspec, but rather only for active paths.
More specifically, consider a priority queue of commits sorted by
generation number. First, enqueue the set of boundary commits with all
paths in the original spec marked as interesting.
Then, while the queue is not empty, do the following:
1. Pop an element, say, 'c', off of the queue, making sure that 'c'
isn't reachable by anything in the '--not' set.
2. For each parent 'p' (with index 'parent_i') of 'c', do the
following:
a. Compute the diff between 'c' and 'p'.
b. Pass any active paths that are TREESAME from 'c' to 'p'.
c. If 'p' has any active paths, push it onto the queue.
3. Any path that remains active on 'c' is associated to that commit.
This ends up being equivalent to doing something like 'git log -1 --
$path' for each path simultaneously. But, it allows us to go much faster
than the original implementation by limiting the number of diffs we
compute, since we can avoid parts of history that would have been
considered by the revision walk in the original implementation, but are
known to be uninteresting to us because we have already marked all paths
in that area to be inactive.
To avoid computing many first-parent diffs, add another trick on top of
this and check if all paths active in 'c' are DEFINITELY NOT in c's
Bloom filter. Since the commit-graph only stores first-parent diffs in
the Bloom filters, we can only apply this trick to first-parent diffs.
Comparing the performance of this new algorithm shows about a 2.5x
improvement on git.git:
Benchmark 1: master no bloom
Time (mean ± σ): 2.868 s ± 0.023 s [User: 2.811 s, System: 0.051 s]
Range (min … max): 2.847 s … 2.926 s 10 runs
Benchmark 2: master with bloom
Time (mean ± σ): 949.9 ms ± 15.2 ms [User: 907.6 ms, System: 39.5 ms]
Range (min … max): 933.3 ms … 971.2 ms 10 runs
Benchmark 3: HEAD no bloom
Time (mean ± σ): 782.0 ms ± 6.3 ms [User: 740.7 ms, System: 39.2 ms]
Range (min … max): 776.4 ms … 798.2 ms 10 runs
Benchmark 4: HEAD with bloom
Time (mean ± σ): 307.1 ms ± 1.7 ms [User: 276.4 ms, System: 29.9 ms]
Range (min … max): 303.7 ms … 309.5 ms 10 runs
Summary
HEAD with bloom ran
2.55 ± 0.02 times faster than HEAD no bloom
3.09 ± 0.05 times faster than master with bloom
9.34 ± 0.09 times faster than master no bloom
In short, the existing implementation is comparably fast *with* Bloom
filters as the new implementation is *without* Bloom filters. So, most
repositories should get a dramatic speed-up by just deploying this (even
without computing Bloom filters), and all repositories should get faster
still when computing Bloom filters.
When comparing a more extreme example of
`git last-modified -- COPYING t`, the difference is even 5 times better:
Benchmark 1: master
Time (mean ± σ): 4.372 s ± 0.057 s [User: 4.286 s, System: 0.062 s]
Range (min … max): 4.308 s … 4.509 s 10 runs
Benchmark 2: HEAD
Time (mean ± σ): 826.3 ms ± 22.3 ms [User: 784.1 ms, System: 39.2 ms]
Range (min … max): 810.6 ms … 881.2 ms 10 runs
Summary
HEAD ran
5.29 ± 0.16 times faster than master
As an added benefit, results are more consistent now. For example
implementation in 'master' gives:
$ git log --max-count=1 --format=%H -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de
$ git last-modified -- pkt-line.h
15df15fe07ef66b51302bb77e393f3c5502629de pkt-line.h
$ git last-modified | grep pkt-line.h
5b49c1af03e600c286f63d9d9c9fb01403230b9f pkt-line.h
With the changes in this patch the results of git-last-modified(1)
always match those of `git log --max-count=1`.
One thing to note though, the results might be outputted in a different
order than before. This is not considerd to be an issue because nowhere
is documented the order is guaranteed.
Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
[jc: tweaked use of xcalloc() to unbreak coccicheck]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finishing touches to fixes to the recent regression in "git diff -w
--quiet" and anything that needs to internally generate patch to
see if it turns empty.
* jk/diff-patch-dry-run-cleanup:
diff: simplify run_external_diff() quiet logic
diff: drop dry-run redirection to /dev/null
diff: replace diff_options.dry_run flag with NULL file
diff: drop save/restore of color_moved in dry-run mode
diff: send external diff output to diff_options.file
"git maintenance" command learns the "geometric" strategy where it
avoids doing maintenance tasks that rebuilds everything from
scratch.
* ps/maintenance-geometric:
t7900: fix a flaky test due to git-repack always regenerating MIDX
builtin/maintenance: introduce "geometric" strategy
builtin/maintenance: make "gc" strategy accessible
builtin/maintenance: extend "maintenance.strategy" to manual maintenance
builtin/maintenance: run maintenance tasks depending on type
builtin/maintenance: improve readability of strategies
builtin/maintenance: don't silently ignore invalid strategy
builtin/maintenance: make the geometric factor configurable
builtin/maintenance: introduce "geometric-repack" task
builtin/gc: make `too_many_loose_objects()` reusable without GC config
builtin/gc: remove global `repack` variable
The wildmatch code had a corner case bug that mistakenly makes
"foo**/bar" match with "foobar", which has been corrected.
* jk/match-pathname-fix:
match_pathname(): give fnmatch one char of prefix context
match_pathname(): reorder prefix-match check
The 'q'(uit) command in "git add -p" has been improved to quit
without doing any meaningless work before leaving, and giving EOF
(typically control-D) to the prompt is made to behave the same way.
* rs/add-patch-quit:
add-patch: quit on EOF
add-patch: quit without skipping undecided hunks
"git bisect" command did not react correctly to "git bisect help"
and "git bisect unknown", which has been corrected.
* rz/bisect-help-unknown:
bisect: fix handling of `help` and invalid subcommands
"git shortlog" knows "--committer" and "--author" options, which
the command line completion (in contrib/) did not handle well,
which has been corrected.
* kf/log-shortlog-completion-fix:
completion: complete some 'git log' options
Regression fixes for a topic that has already been merged.
* ly/diff-name-only-with-diff-from-content:
diff: stop output garbled message in dry run mode
Two slightly different ways to get at "all the packfiles" in API
has been cleaned up.
* ps/remove-packfile-store-get-packs:
packfile: rename `packfile_store_get_all_packs()`
packfile: introduce macro to iterate through packs
packfile: drop `packfile_store_get_packs()`
builtin/grep: simplify how we preload packs
builtin/gc: convert to use `packfile_store_get_all_packs()`
object-name: convert to use `packfile_store_get_all_packs()`
strbuf_split*() to split a string into multiple strbufs is often a
wrong API to use. A few uses of it have been removed by
simplifying the code.
* ob/gpg-interface-cleanup:
gpg-interface: do not use misdesigned strbuf_split*()
gpg-interface: do not use misdesigned strbuf_split*()
"Symlink symref" has been added to the list of things that will
disappear at Git 3.0 boundary.
* ps/symlink-symref-deprecation:
refs/files: deprecate writing symrefs as symbolic links
A new configuration variable commitGraph.changedPaths allows to
turn "--changed-paths" on by default for "git commit-graph".
* ey/commit-graph-changed-paths-config:
commit-graph: add new config for changed-paths & recommend it in scalar
We track packfiles via two different lists:
- `struct packfile_store::packs` is a list that sorts local packs
first. In addition, these packs are sorted so that younger packs are
sorted towards the front.
- `struct packfile_store::mru` is a list that sorts packs so that
most-recently used packs are at the front.
The reasoning behind the ordering in the `packs` list is that younger
objects stored in the local object store tend to be accessed more
frequently, and that is certainly true for some cases. But there are
going to be lots of cases where that isn't true. Especially when
traversing history it is likely that one needs to access many older
objects, and due to our housekeeping it is very likely that almost all
of those older objects will be contained in one large pack that is
oldest.
So whether or not the ordering makes sense really depends on the use
case at hand. A flexible approach like our MRU list addresses that need,
as it will sort packs towards the front that are accessed all the time.
Intuitively, this approach is thus able to satisfy more use cases more
efficiently.
This reasoning casts some doubt on whether or not it really makes sense
to track packs via two different lists. It causes confusion, and it is
not clear whether there are use cases where the `packs` list really is
such an obvious choice.
Merge these two lists into one most-recently-used list.
Note that there is one important edge case: `for_each_packed_object()`
uses the MRU list to iterate through packs, and then it lists each
object in those packs. This would have the effect that we now sort the
current pack towards the front, thus modifying the list of packfiles we
are iterating over, with the consequence that we'll see an infinite
loop. This edge case is worked around by introducing a new field that
allows us to skip updating the MRU.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing the packfile store we know to also prepare the MRU list
of packfiles with all packs that are currently loaded in the store via
`packfile_store_prepare_mru()`. So we know that the list of packs in the
MRU list should match the list of packs in the non-MRU list.
But there are some direct or indirect callsites that add a packfile to
the store via `packfile_store_add_pack()` without adding the pack to the
MRU. And while functions that access the MRU (e.g. `find_pack_entry()`)
know to call `packfile_store_prepare()`, which knows to prepare the MRU
via `packfile_store_prepare_mru()`, that operation will be turned into a
no-op because the packfile store is already prepared. So this will not
cause us to add the packfile to the MRU, and consequently we won't be
able to find the packfile in our MRU list.
There are only a handful of callers outside of "packfile.c" that add a
packfile to the store:
- "builtin/fast-import.c" adds multiple packs of imported objects, but
it knows to look up objects via `packfile_store_get_packs()`. This
function does not use the MRU, so we're good.
- "builtin/index-pack.c" adds the indexed pack to the store in case it
needs to perform consistency checks on its objects.
- "http.c" adds the fetched pack to the store so that we can access
its objects.
In all of these cases we actually want to access the contained objects.
And luckily, reading these objects works as expected:
1. We eventually end up in `do_oid_object_info_extended()`.
2. Calling `find_pack_entry()` fails because the MRU list doesn't
contain the newly added packfile.
3. The callers don't pass `OBJECT_INFO_QUICK`, so we end up
repreparing the object database. This will also cause us to
reprepare the MRU list.
4. We now retry reading the object via `find_pack_entry()`, and now we
succeed because the MRU list got populated.
This logic feels quite fragile: we intentionally add the packfile to the
store, but we then ultimately rely on repreparing the entire store only
to make the packfile accessible. While we do the correct thing in
`do_oid_object_info_extended()`, other sites that access the MRU may not
know to reprepare.
But besides being fragile it's also a waste of resources: repreparing
the object database requires us to re-read the alternates file and
discard any caches.
Refactor the code so that we unconditionally add packfiles to the MRU
when adding them to a packfile store. This makes the logic less fragile
and ensures that we don't have to reprepare the store to make the pack
accessible.
Note that this does not allow us to drop `packfile_store_prepare_mru()`
just yet: while the MRU list is already populated with all packs now,
the order in which we add these packs is indeterministic for most of the
part. So by first calling `sort_pack()` on the other packfile list and
then re-preparing the MRU list we inherit its sorting.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the list of packs into the packfile store. This follows the same
logic as in a previous commit, where we moved the most-recently-used
list of packs, as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `has_sha1_pack_kept_or_nonlocal()` takes an object ID and
then searches through packed objects to figure out whether the object
exists in a kept or non-local pack. As a performance optimization we
remember the packfile that contains a given object ID so that the next
call to the function first checks that same packfile again.
The way this is written is rather hard to follow though, as the caching
mechanism is intertwined with the loop that iterates through the packs.
Consequently, we need to do some gymnastics to re-start the iteration if
the cached pack does not contain the objects.
Refactor this so that we check the cached packfile at the beginning. We
don't have to re-verify whether the packfile meets the properties as we
have already verified those when storing the pack in `last_found` in the
first place. So all we need to do is to use `find_pack_entry_one()` to
check whether the pack contains the object ID, and to skip the cached
pack in the loop so that we don't search it twice.
Furthermore, stop using the `(void *)1` sentinel value and instead use a
simple `NULL` pointer to indicate that we don't have a last-found pack
yet.
This refactoring significantly simplifies the logic and makes it much
easier to follow.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When approximating the number of objects in a repository we only take
into account two data sources, the multi-pack index and the packfile
indices, as both of these data structures allow us to easily figure out
how many objects they contain.
But the way we currently approximate the number of objects is broken in
presence of a multi-pack index. This is due to two separate reasons:
- We have recently introduced initial infrastructure for incremental
multi-pack indices. Starting with that series, `num_objects` only
counts the number of objects of a specific layer of the MIDX chain,
so we do not take into account objects from parent layers.
This issue is fixed by adding `num_objects_in_base`, which contains
the sum of all objects in previous layers.
- When using the multi-pack index we may count objects contained in
packfiles twice: once via the multi-pack index, but then we again
count them via the packfile itself.
This issue is fixed by skipping any packfiles that have an MIDX.
Overall, given that we _always_ count the packs, we can only end up
overestimating the number of objects, and the overestimation is limited
to a factor of two at most.
The consequences of those issues are very limited though, as we only
approximate object counts in a small number of cases:
- When writing a commit-graph we use the approximate object count to
display the upper limit of a progress display.
- In `repo_find_unique_abbrev_r()` we use it to specify a lower limit
of how many hex digits we want to abbreviate to. Given that we use
power-of-two here to derive the lower limit we may end up with an
abbreviated hash that is one digit longer than required.
- In `estimate_repack_memory()` we may end up overestimating how much
memory a repack needs to pack objects. Conseuqently, we may end up
dropping some packfiles from a repack.
None of these are really game-changing. But it's nice to fix those
issues regardless.
While at it, convert the code to use `repo_for_each_pack()`.
Furthermore, use `odb_prepare_alternates()` instead of explicitly
preparing the packfile store. We really only want to prepare the object
database sources, and `get_multi_pack_index()` already knows to prepare
the packfile store for us.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb HTTP protocol directly fetches packfiles from the remote server
and temporarily stores them in a list of packfiles. Those packfiles are
not yet added to the repository's packfile store until we finalize the
whole fetch.
Refactor the code to instead use a `struct packfile_list` to store those
packs. This prepares us for a subsequent change where the `->next`
pointer of `struct packed_git` will go away.
Note that this refactoring creates some temporary duplication of code,
as we now have both `packfile_list_find_oid()` and `find_oid_pack()`.
The latter function will be removed in a subsequent commit though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Packfiles have two lists associated to them:
- A list that keeps track of packfiles in the order that they were
added to a packfile store.
- A list that keeps track of packfiles in most-recently-used order so
that packfiles that are more likely to contain a specific object are
ordered towards the front.
Both of these lists are hosted by `struct packed_git` itself, So to
identify all packfiles in a repository you simply need to grab the first
packfile and then iterate the `->next` pointers or the MRU list. This
pattern has the problem that all packfiles are part of the same list,
regardless of whether or not they belong to the same object source.
With the upcoming pluggable object database effort this needs to change:
packfiles should be contained by a single object source, and reading an
object from any such packfile should use that source to look up the
object. Consequently, we need to break up the global lists of packfiles
into per-object-source lists.
A first step towards this goal is to move those lists out of `struct
packed_git` and into the packfile store. While the packfile store is
currently sitting on the `struct object_database` level, the intent is
to push it down one level into the `struct odb_source` in a subsequent
patch series.
Introduce a new `struct packfile_list` that is used to manage lists of
packfiles and use it to store the list of most-recently-used packfiles
in `struct packfile_store`. For now, the new list type is only used in a
single spot, but we'll expand its usage in subsequent patches.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To allow fast lookups of a packfile by name we use a hashmap that has
the packfile name as key and the pack itself as value. But while this is
the perfect use case for a `strmap`, we instead use `struct hashmap` and
store the hashmap entry in the packfile itself.
Simplify the code by using a `strmap` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous commits have marked a number of error or warning messages in
"builtin/fast-export.c" and "builtin/fast-import.c" for translation.
As "gpg-interface.c" code is used by the fast-export and fast-import
code, we should make sure that error or warning messages are also all
marked for translation in "gpg-interface.c".
To ensure that, let's mark for translation an error message in a
die() function.
With this, all the error and warning messages emitted by fast-export
and fast-import can be properly translated.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some error or warning messages in "builtin/fast-import.c" are marked
for translation, but many are not.
To be more consistent and provide a better experience to people using a
translated version, let's mark all the remaining error or warning
messages for translation.
While at it, let's make the following small changes:
- replace "GIT" or "git" in a few error messages to just "Git",
- replace "Expected from command, got %s" to "expected 'from'
command, got '%s'", which makes it clearer that "from" is a command
and should not be translated,
- downcase error and warning messages that start with an uppercase,
- fix test cases in "t9300-fast-import.sh" that broke because an
error or warning message was downcased,
- split error and warning messages that are too long,
- adjust the indentation of some arguments of the error functions.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some error or warning messages in "builtin/fast-export.c" are marked
for translation, but many are not.
To be more consistent and provide a better experience to people using a
translated version, let's mark all the remaining error or warning
messages for translation.
While at it:
- improve how some arguments to some error functions are indented,
- remove "Error:" at the start of an error message,
- downcase error and warning messages that start with an uppercase.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "gpg-interface.h", the definitions of the GPG_VERIFY_* boolean flags
are currently using 1, 2 and 4 while we often prefer the bitwise left
shift operator, `<<`, for that purpose to make it clearer that they are
boolean.
Let's use the left shift operator here too. Let's also fix an indent
issue with "4" while at it.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "gpg-interface.c", the 'parse_ssh_output()' function takes a
'struct signature_check *sigc' argument and populates many members of
this 'sigc' using information parsed from 'sigc->output' which
contains the ouput of an `ssh-keygen -Y ...` command that was used to
verify an SSH signature.
When it populates 'sigc->fingerprint' though, it uses
`xstrdup(strstr(line, "key ") + 4)` while `strstr(line, "key ")` has
already been computed a few lines above and is already available in
the `key` variable.
Let's simplify this.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean-up "git repack" machinery to prepare for incremental update
of midx files.
* tb/incremental-midx-part-3.1: (49 commits)
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
builtin/repack.c: reorder `remove_redundant_bitmaps()`
repack: keep track of MIDX pack names using existing_packs
builtin/repack.c: use a string_list for 'midx_pack_names'
builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
builtin/repack.c: remove ref snapshotting from builtin
repack: remove pack_geometry API from the builtin
...
We read the input into a strbuf, so we must free it. Without this, t1016
complains in SANITIZE=leak mode.
The bug was introduced in 7673ecd2dc (t1016-compatObjectFormat: add
tests to verify the conversion between objects, 2023-10-01). But nobody
seems to have noticed, probably because CI did not run these tests until
the fix in 6cd8369ef3 (t/lib-gpg: call prepare_gnupghome() in GPG2
prereq, 2024-07-03).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because gitignore patterns are passed to fnmatch, the handling of
backslashes is the same as it is there: it can be used to escape
metacharacters. We do reference fnmatch(3) for more details, but it may
be friendlier to point out this implication explicitly (especially for
people who want to know about backslash handling and search the
documentation for that word). There are also two cases that I've seen
some other backslash-escaping systems handle differently, so let's
describe those:
1. A backslash before any character treats that character literally,
even if it's not otherwise a meta-character. As opposed to
including the backslash itself (like "foo\bar" in shell expands to
"foo\bar") or forbidding it ("foo\zar" is required to produce a
diagnostic in C).
2. A backslash at the end of the string is an invalid pattern (and not
a literal backslash).
This second one in particular was a point of confusion between our
implementation and the one in JGit. Our wildmatch behavior matches what
POSIX specifies for fnmatch, so the code and documentation are in line.
But let's add a test to cover this case. Note that the behavior here
differs between wildmatch itself (which is what gitignore will use) and
pathspec matching (which will only turn to wildmatch if a literal match
fails). So we match "foo\" to "foo\" in pathspecs, but not via
gitignore.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strategy in t1016-compatObjectFormat is to build two trees with
identical commits, one tree encoded in sha1 the other tree encoded
in sha256 and to use the compatibility code to test and see if
the two trees are identical.
GPG signatures include the current time as part of the signature.
To make gpg deterministic I forced the use of gpg --faked-system-time.
Unfortunately I did not look closely enough.
By default gpg still allows time to move forward with --faked-system-time.
So in those rare instances when the system is heavily loaded and gpg runs
slower than other times, signatures over the exact same data differ
due to timestamps with a minuscule difference.
Reading through the gpg documentation with a close eye, time can be
frozen by including an exclamation point at the end of the argument to
--faked-system-time.
Add the exclamation point so gpg really runs with a fixed notion of time,
resulting in the exact same data having identical gpg signatures.
That is enough that I can run "t1016-compatObjectFormat.sh --stress"
and I don't see any failures.
It is possible a future change to gpg will make replay protection more
robust and not provide a way to allow two separate runs of gpg to
produce exactly the same signature for exactly the same data. If that
happens a deeper comparison of the two repositories will need to be
performed. A comparison that simply verifies the signatures and
compares the data for equality. For now that is a lot of work
for no gain so I am just documenting the possibility.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the usage string of `git bisect` and documentation to match each
other. While at it, also:
1. Move the synopsis of `git bisect` subcommands to the synopsis
section, so that the test `t0450-txt-doc-vs-help.sh` can pass.
2. Document the `git bisect next` subcommand, which exists in the code
but is missing from the documentation.
See also: [1].
[1]: https://lore.kernel.org/git/3DA38465-7636-4EEF-B074-53E4628F5355@gmail.com/
Suggested-by: Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The placeholder markup is underscore (_), not backtick (`) as well.
The inline-verbatim markup (backticks) handle interior formatting. This
means in this case that it applies HTML `<code>` to the underscores and
`<em>` to the placeholder.
That is the effect, anyway; we can see from the rest of 042d6f34 (doc:
git-checkout: clarify `-b` and `-B`, 2025-09-10) that this was probably
an unintended mix-up.
Acked-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CI improvements to handle the recent Rust integration better.
* ps/ci-rust:
rust: support for Windows
ci: verify minimum supported Rust version
ci: check for common Rust mistakes via Clippy
rust/varint: add safety comments
ci: check formatting of our Rust code
ci: deduplicate calls to `apt-get update`
"git fast-import" is taught to handle signed tags, just like it
recently learned to handle signed commits, in different ways.
* cc/fast-import-strip-signed-tags:
fast-import: add '--signed-tags=<mode>' option
fast-export: handle all kinds of tag signatures
t9350: properly count annotated tags
lib-gpg: allow tests with GPGSM or GPGSSH prereq first
doc: git-tag: stop focusing on GPG signed tags
"git sparse-checkout" subcommand learned a new "clean" action to
prune otherwise unused working-tree files that are outside the
areas of interest.
* ds/sparse-checkout-clean:
sparse-index: improve advice message instructions
t: expand tests around sparse merges and clean
sparse-index: point users to new 'clean' action
sparse-checkout: add --verbose option to 'clean'
dir: add generic "walk all files" helper
sparse-checkout: match some 'clean' behavior
sparse-checkout: add basics of 'clean' command
sparse-checkout: remove use of the_repository
* ps/remove-packfile-store-get-packs: (55 commits)
packfile: rename `packfile_store_get_all_packs()`
packfile: introduce macro to iterate through packs
packfile: drop `packfile_store_get_packs()`
builtin/grep: simplify how we preload packs
builtin/gc: convert to use `packfile_store_get_all_packs()`
object-name: convert to use `packfile_store_get_all_packs()`
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
...
When a supposedly no-op "git repack" runs across a second boundary,
because the command always touches the MIDX file and updates its
timestamp, "ls -l $GIT_DIR/objects/pack/" before and after the
operation can change, which causes such a test to fail. Only
compare the *.pack files in the directory before and after the
operation to work around this flakyness.
Arguably, git-repack(1) should learn to not rewrite the MIDX in case
we know it is already up-to-date. But this is not a new problem
introduced via the new geometric maintenance task, so for now it
should be good enough to paper over the issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: taken from diff to v4 from v3 that was already merged to 'next']
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a note after the `git send-email` section explaining how
contributors can confirm that their patches reached the mailing
list by checking https://lore.kernel.org/git/. This helps
contributors verify that their emails were successfully delivered.
Signed-off-by: Queen Ediri Jessa <qjessa662@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The debug ref backend (refs_be_debug) was missing the remove_on_disk
function pointer, which caused a segmentation fault when running
'GIT_TRACE_REFS=1 git refs migrate --ref-format=reftable' commands.
Signed-off-by: Xinyu Ruan <r200981113@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CI update.
* js/ci-github-actions-update:
build(deps): bump actions/github-script from 7 to 8
build(deps): bump actions/setup-python from 5 to 6
build(deps): bump actions/checkout from 4 to 5
build(deps): bump actions/download-artifact from 4 to 5
Recent OpenSSH creates the Unix domain socket to communicate with
ssh-agent under $HOME instead of /tmp, which causes our test to
fail doe to overly long pathname in our test environment, which has
been worked around by using "ssh-agent -T".
* ps/t7528-ssh-agent-uds-workaround:
t7528: work around ETOOMANY in OpenSSH 10.1 and newer
The "--short" option of "git status" that meant output for humans
and "-z" option to show NUL delimited output format did not mix
well, and colored some but not all things. The command has been
updated to color all elements consistently in such a case.
* jk/status-z-short-fix:
status: make coloring of "-z --short" consistent
An earlier addition to "git diff --no-index A B" to limit the
output with pathspec after the two directories misbehaved when
these directories were given with a trailing slash, which has been
corrected.
* jk/diff-no-index-with-pathspec-fix:
diff --no-index: fix logic for paths ending in '/'
Windows "real-time monitoring" interferes with the execution of
tests and affects negatively in both correctness and performance,
which has been disabled in Gitlab CI.
* ps/gitlab-ci-disable-windows-monitoring:
gitlab-ci: disable realtime monitoring to unbreak Windows jobs
The code to squelch output from "git diff -w --name-status"
etc. for paths that "git diff -w -p" would have stayed silent
leaked output from dry-run patch generation, which has been
corrected.
* jc/diff-from-contents-fix:
diff: make sure the other caller of diff_flush_patch_quietly() is silent
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.
* jk/diff-from-contents-fix:
diff: restore redirection to /dev/null for diff_from_contents
If we reach the end of the input, e.g. because the user pressed ctrl-D
on Linux, there is no point in showing any more prompts, as we won't get
any reply. Do the same as option 'q' would: Quit.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In match_pathname(), which we use for matching .gitignore and
.gitattribute patterns, we are comparing paths with fnmatch patterns
(actually our extended wildmatch, which will be important). There's an
extra optimization there: we pre-compute the number of non-wildcard
characters at the beginning of the pattern and do an fspathncmp() on
that prefix.
That lets us avoid fnmatch entirely on patterns without wildcards, and
shrinks the amount of work we hand off to fnmatch. For a pattern like
"foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
prefix and just pass "*.txt" and "bar.txt" to fnmatch().
But this misses a subtle corner case. In fnmatch(), we'll think
"bar.txt" is the start of the path, but it's not. This doesn't matter
for the pattern above, but consider the wildmatch pattern "foo**/bar"
and the path "foobar". These two should not match, because there is no
file named "bar", and the "**" applies only to the containing directory
name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
"bar", which it does consider a match, because "**/" can match zero
directories.
We can solve this by giving fnmatch a bit more context. As long as it
has one byte of the matched prefix, then it will know that "bar" is not
the start of the path. In this example it would get "o**/bar" and
"obar", and realize that they cannot match.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As an optimization, we use fspathncmp() to match a prefix of the pattern
that does not contain any wildcards, and then pass the remainder to
fnmatch(). If it has matched the whole thing, we can return early.
Let's shift this early-return check to before we tweak the pattern and
name strings. That will gives us more flexibility with that tweaking.
It might also save a few instructions, but I couldn't measure any
improvement in doing so (and I wouldn't be surprised if an optimizing
compiler could figure that out itself).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an install target rule to the Makefiles in contrib/credential in the
same manner as in other Makefiles in contrib such as for contacts or
subtree.
Signed-off-by: Thomas Uhle <thomas.uhle@mailbox.tu-dresden.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Option q implies d, i.e., it marks any undecided hunks towards the
bottom of the hunk array as skipped. This is unnecessary; later code
treats undecided and skipped hunks the same: The only functions that
use UNDECIDED_HUNK and SKIP_HUNK are patch_update_file() itself (but
not after its big for loop) and its helpers get_first_undecided() and
display_hunks().
Streamline the handling of option q by quitting immediately.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent OpenSSH creates the Unix domain socket to communicate with
ssh-agent under $HOME instead of /tmp, which causes our test to
fail doe to overly long pathname in our test environment, which has
been worked around by using "ssh-agent -T".
* ps/t7528-ssh-agent-uds-workaround:
t7528: work around ETOOMANY in OpenSSH 10.1 and newer
Build procedure for a few credential helpers (in contrib/) have
been updated.
* tu/credential-makefile-updates:
contrib/credential: harmonize Makefiles
The "--short" option of "git status" that meant output for humans
and "-z" option to show NUL delimited output format did not mix
well, and colored some but not all things. The command has been
updated to color all elements consistently in such a case.
* jk/status-z-short-fix:
status: make coloring of "-z --short" consistent
We have two different repacking strategies in Git:
- The "gc" strategy uses git-gc(1).
- The "incremental" strategy uses multi-pack indices and `git
multi-pack-index repack` to merge together smaller packfiles as
determined by a specific batch size.
The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:
- The "gc" strategy performs regular all-into-one repacks. Furthermore
it is rather inflexible, as it is not easily possible for a user to
enable or disable specific subtasks.
- The "incremental" strategy is not a full replacement for the "gc"
strategy as it doesn't know to prune stale data.
So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.
Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.
One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.
This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.
Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.
If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:
- It is impossible to use the default "gc" strategy for a specific
repository when the strategy was globally set to a different strategy.
- It is not possible to use git-gc(1) for scheduled maintenance.
Address these issues by making making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.
In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.
So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.
Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.
Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy beforehand. Instead, repository maintenance would have
continued to use git-gc(1) by default.
But luckily, we can match that behaviour by:
- Keeping all current tasks of the incremental strategy as
`MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
run during manual maintenance.
- Configuring the "gc" task so that it is invoked during manual
maintenance.
Like this, the user shouldn't observe any difference in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We basically have three different ways to execute repository
maintenance:
1. Manual maintenance via `git maintenance run`.
2. Automatic maintenance via `git maintenance run --auto`.
3. Scheduled maintenance via `git maintenance run --schedule=`.
At the moment, maintenance strategies only have an effect for the last
type of maintenance. This is about to change in subsequent commits, but
to do so we need to be able to skip some tasks depending on how exactly
maintenance was invoked.
Introduce a new maintenance type that discern between manual (1 & 2) and
scheduled (3) maintenance. Convert the `enabled` field into a bitset so
that it becomes possible to specifiy which tasks exactly should run in a
specific context.
The types picked for existing strategies match the status quo:
- The default strategy is only ever executed as part of a manual
maintenance run. It is not possible to use it for scheduled
maintenance.
- The incremental strategy is only ever executed as part of a
scheduled maintenance run. It is not possible to use it for manual
maintenance.
The strategies will be tweaked in subsequent commits to make use of this
new infrastructure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our maintenance strategies are essentially a large array of structures,
where each of the tasks can be enabled and scheduled individually. With
the current layout though all the configuration sits on the same nesting
layer, which makes it a bit hard to discern which initialized fields
belong to what task.
Improve readability of the individual tasks by using nested designated
initializers instead.
Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.
Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.
Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.
There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.
To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has surpassed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To decide whether or not a repository needs to be repacked we estimate
the number of loose objects. If the number exceeds a certain threshold
we perform the repack, otherwise we don't.
This is done via `too_many_loose_objects()`, which takes as parameter
the `struct gc_config`. This configuration is only used to determine the
threshold. In a subsequent commit we'll add another caller of this
function that wants to pass a different limit than the one stored in
that structure.
Refactor the function accordingly so that we only take the limit as
parameter instead of the whole structure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global `repack` variable is used to store all command line arguments
that we eventually want to pass to git-repack(1). It is being appended
to from multiple different functions, which makes it hard to follow the
logic. Besides being hard to follow, it also makes it unnecessarily hard
to reuse this infrastructure in new code.
Refactor the code so that we store this variable on the stack and pass
a pointer to it around as needed. This is done so that we can reuse
`add_repack_all_options()` in a subsequent commit.
The refactoring itself is straight-forward. One function that deserves
attention though is `need_to_gc()`: this function determines whether or
not we need to execute garbage collection for `git gc --auto`, but also
for `git maintenance run --auto`. But besides figuring out whether we
have to perform GC, the function also sets up the `repack` arguments.
For `git gc --auto` it's trivial to adapt, as we already have the
on-stack variable at our fingertips. But for the maintenance condition
it's less obvious what to do.
As it turns out, we can just use another temporary variable there that
we then immediately discard. If we need to perform GC we execute a child
git-gc(1) process to repack objects for us, and that process will have
to recompute the arguments anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd sometimes end up in run_external_diff() to do a dry-run diff (e.g.,
to find content-level changes for --quiet). We recognize this quiet mode
by seeing the lack of DIFF_FORMAT_PATCH in the output format.
But since introducing an explicit dry-run check via 3ed5d8bd73 (diff:
stop output garbled message in dry run mode, 2025-10-20), this logic can
never trigger. We can only get to this function by calling
diff_flush_patch(), and that comes from only two places:
1. A dry-run flush comes from diff_flush_patch_quietly(), which is
always in dry-run mode (so the other half of our "||" is true
anyway).
2. A regular flush comes from diff_flush_patch_all_file_pairs(),
which is only called when output_format has DIFF_FORMAT_PATCH in
it.
So we can simplify our "quiet" condition to just checking dry-run mode
(which used to be a specific flag, but recently became just a NULL
"file" pointer). And since it's so simple, we can just do that inline.
This makes the logic about o->file more obvious, since we handle the
NULL and non-stdout cases next to each other.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As an added protection against dry-run diffs accidentally producing
output, we redirect diff_options.file to /dev/null. But as of the
previous patch, this now does nothing, since dry-run diffs are
implemented by setting "file" to NULL.
So we can drop this extra code with no change in behavior. This is
effectively a revert of 623f7af284 (diff: restore redirection to
/dev/null for diff_from_contents, 2025-10-17) and 3da4413dbc (diff: make
sure the other caller of diff_flush_patch_quietly() is silent,
2025-10-22), but:
1. We get a conflict because we already dropped the color_moved
handling in an earlier patch. But we just resolve the conflicts to
"theirs" (removing all of the code).
2. We retain the test from 623f7af284.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We introduced a dry_run flag to diff_options in b55e6d36eb (diff: ensure
consistent diff behavior with ignore options, 2025-08-08), with the idea
that the lower-level diff code could skip output when it is set.
As we saw with the bugs fixed by 3ed5d8bd73 (diff: stop output garbled
message in dry run mode, 2025-10-20), it is easy to miss spots. In the
end, we located all of them by checking where diff_options.file is used.
That suggests another possible approach: we can replace the dry_run
boolean with a NULL pointer for "file", as we know that using "file" in
dry_run mode would always be an error. This turns any missed spots from
producing extra output[1] into a segfault. Which is less forgiving, but
that is the point: this is indicative of a programming error, and
complaining loudly and immediately is good.
[1] We protect ourselves against garbled output as a separate step,
courtesy of 623f7af284 (diff: restore redirection to /dev/null for
diff_from_contents, 2025-10-17). So in that sense this patch can
only introduce user-visible errors (since any "bugs" were going to
/dev/null before), but the idea is to catch them rather than quietly
send garbage to /dev/null.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running a dry-run content-level diff to check whether a "--quiet"
diff has any changes, we have always unset the color_moved variable
since the feature was added in 2e2d5ac184 (diff.c: color moved lines
differently, 2017-06-30). The reasoning is not given explicitly there,
but presumably the idea is that since color_moved requires a lot of
extra computation to match lines but does not actually affect the
found_changes flag, we want to skip it.
Later, in 3da4413dbc (diff: make sure the other caller of
diff_flush_patch_quietly() is silent, 2025-10-22) we copied the same
idea for other dry-run diffs.
But neither spot actually needs to reset this flag at all, because
diff_flush_patch() will not ever compute color_moved. Nor could it, as
it is only looking at a single file-pair, and we detect moves across
files. So color_moved is checked only when we are actually doing real
DIFF_FORMAT_PATCH output, and call diff_flush_patch_all_file_pairs().
So we can get rid of these extra lines to save and restore the
color_moved flag without changing the behavior at all. (Note that there
is no "restore" to drop for the second caller, as we know at that point
we are not generating any output and can just leave the feature
disabled).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diff output usually goes to the process stdout, but it can be redirected
with the "--output" option. We store this in the "file" pointer of
diff_options, and all of the diff code should write there instead of to
stdout.
But there's one spot we missed: running an external diff cmd. We don't
redirect its output at all, so it just defaults to the stdout of the
parent process. We should instead point its stdout at our output file.
There are a few caveats to watch out for when doing so:
- The stdout field takes a descriptor, not a FILE pointer. We can pull
out the descriptor with fileno().
- The run-command API always closes the stdout descriptor we pass to
it. So we must duplicate it (otherwise we break the FILE pointer,
since it now points to a closed descriptor).
- We don't need to worry about closing our dup'd descriptor, since the
point is that run-command will do it for us (even in the case of an
error). But we do need to make sure we skip the dup() if we set
no_stdout (because then run-command will not look at it at all).
- When the output is going to stdout, it would not be wrong to dup()
the descriptor, but we don't need to. We can skip that extra work
with a simple pointer comparison.
- It seems like you'd need to fflush() the descriptor before handing
off a copy to the child process to prevent out-of-order writes. But
that was true even before this patch! It works because run-command
always calls fflush(NULL) before running the child.
The new test shows the breakage (and fix). The need for duplicating the
descriptor doesn't need a new test; that is covered by the later test
"GIT_EXTERNAL_DIFF with more than one changed files".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by just appending in the loop and sorting at
the end.
The number of merge bases is usually small, so don't expect speedups in
normal repositories. It has no limit, though. The added perf test
shows a nice improvement when dealing with 16384 merge bases:
Test v2.51.1 HEAD
-----------------------------------------------------------------
6010.2: git merge-base 0.55(0.54+0.00) 0.03(0.02+0.00) -94.5%
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to squelch output from "git diff -w --name-status"
etc. for paths that "git diff -w -p" would have stayed silent
leaked output from dry-run patch generation, which has been
corrected.
* jc/diff-from-contents-fix:
diff: make sure the other caller of diff_flush_patch_quietly() is silent
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.
* jk/diff-from-contents-fix:
diff: restore redirection to /dev/null for diff_from_contents
In t7528 we spawn an SSH agent to verify that we can sign a commit via
it. This test has started to fail on some machines:
+++ ssh-agent
unix_listener_tmp: path "/home/pks/Development/git/build/test-output/trash directory.t7528-signed-commit-ssh/.ssh/agent/s.UTulegefEg.agent.UrPHumMXPq" too long for Unix domain socket
main: Couldn't prepare agent socket
As it turns out this is caused by a change in OpenSSH 10.1 [1]:
* ssh-agent(1), sshd(8): move agent listener sockets from /tmp to
under ~/.ssh/agent for both ssh-agent(1) and forwarded sockets
in sshd(8).
Instead of creating the socket in "/tmp", OpenSSH now creates the socket
in our home directory. And as the home directory gets modified to be
located in our test output directory we end up with paths that are
somewhat long. But Linux has a rather short limit of 108 characters for
socket paths, and other systems have even lower limits, so it is very
easy now to exceed the limit and run into the above error.
Work around the issue by using `ssh-agent -T`, which instructs it to
use the old behaviour and create the socket in "/tmp" again. This switch
has only been introduced with 10.1 though, so for older versions we have
to fall back to not using it. That's fine though, as older versions know
to put the socket into "/tmp" already.
An alternative approach would be to abbreviate the socket name itself so
that we create it as e.g. "sshsock" in the trash directory. But taking
the above example we'd still end up with a path that is 91 characters
long. So we wouldn't really have a lot of headroom, and it is quite
likely that some developers would see the issue on their machines.
[1]: https://www.openssh.com/txt/release-10.1
Reported-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In get_default_ssh_signing_key(), the default ssh signing key is
retrieved in `key_stdout` buf, which is then split using
strbuf_split_max() into up to two strbufs at a new line and the first
strbuf is returned as a `char *`and not a strbuf.
This makes the function lack the use of strbuf API as no edits are
performed on the split tokens.
Simplify the process of retrieving and returning the desired line by
using strchr() to isolate the line and xmemdupz() to return a copy of the
line. This removes the roundabout way of splitting the string into
strbufs, just to return the line.
Reported-by: Junio Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In get_ssh_finger_print(), the output of the `ssh-keygen` command is
put into `fingerprint_stdout` strbuf. The string in `fingerprint_stdout`
is then split into up to 3 strbufs using strbuf_split_max(). However they
are not modified after the split thereby not making use of the strbuf API
as the fingerprint token is merely returned as a char * and not a strbuf.
Hence they do not need to be strbufs.
Simplify the process of retrieving and returning the desired token by
using strchr() to isolate the token and xmemdupz() to return a copy of the
token. This removes the roundabout way of splitting the string into
strbufs just to return the token.
Reported-by: Junio Hamano <gitster@pobox.com>
Helped-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: Olamide Caleb Bello <belkid98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, b55e6d36 (diff: ensure consistent diff behavior with
ignore options, 2025-08-08) introduced "dry-run" mode to the
diff machinery so that content-based diff filtering (like
ignoring space changes or those that match -I<regex>) can first
try to produce a patch without emitting any output to see if
under the given diff filtering condition we would get any output
lines, and a new helper function diff_flush_patch_quietly() was
introduced to use the mode to see an individual filepair needs
to be shown.
However, the solution was not complete. When files are deleted,
file modes change, or there are unmerged entries in the index,
dry-run mode still produces output because we overlooked these
conditions, and as a result, dry-run mode was not quiet.
To fix this, return early in emit_diff_symbol_from_struct() if
we are in dry-run mode. This function will be called by all the
emit functions to output the results. Returning early can avoid
diff output when files are deleted or file modes are changed.
Stop print message in dry-run mode if we have unmerged entries
in index. Discard output of external diff tool in dry-run mode.
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier, we added is a protection for the loop that computes "git
diff --quiet -w" to ensure calls to the diff_flush_patch_quietly()
helper stays quiet. Do the same for another loop that deals with
options like "--name-status" to make calls to the same helper.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation updates.
* je/doc-pull:
doc: git-pull: clarify how to exit a conflicted merge
doc: git-pull: delete the example
doc: git-pull: clarify options for integrating remote branch
doc: git-pull: move <repository> and <refspec> params
The beginning of SHA1-SHA256 interoperability work.
* bc/sha1-256-interop-01:
t1010: use BROKEN_OBJECTS prerequisite
t: allow specifying compatibility hash
fsck: consider gpgsig headers expected in tags
rev-parse: allow printing compatibility hash
docs: add documentation for loose objects
docs: improve ambiguous areas of pack format documentation
docs: reflect actual double signature for tags
docs: update offset order for pack index v3
docs: update pack index v3 format
CI update.
* js/ci-github-actions-update:
build(deps): bump actions/github-script from 7 to 8
build(deps): bump actions/setup-python from 5 to 6
build(deps): bump actions/checkout from 4 to 5
build(deps): bump actions/download-artifact from 4 to 5
As documented in git-bisect(1), `git bisect help` should display usage
information. However, since the migration of `git bisect` to a full
builtin command in 73fce29427 (Turn `git bisect` into a full built-in,
2022-11-10), this behavior was broken. Running `git bisect help` would,
instead of showing usage, either fail silently if already in a bisect
session, or otherwise trigger an interactive autostart prompt asking "Do
you want me to do it for you [Y/n]?".
Similarly, since df63421be9 (bisect--helper: handle states directly,
2022-11-10), running invalid subcommands like `git bisect foobar` also
led to the same behavior.
This occurred because `help` and other unrecognized subcommands were
being unconditionally passed to `bisect_state`, which then called
`bisect_autostart`, triggering the interactive prompt.
Fix this by:
1. Adding explicit handling for the `help` subcommand to show usage;
2. Validating that unrecognized commands are actually valid state
commands before calling `bisect_state`;
3. Showing an error with usage for truly invalid commands.
This ensures that `git bisect help` displays the usage as documented,
and invalid commands fail cleanly without entering interactive mode.
Alternate terms are still handled correctly through
`check_and_set_terms`.
Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changed-path Bloom filters feature has proven stable and reliable
over several years of use, delivering significant performance
improvement for file history computation in large monorepos. Currently
a user can opt-in to writing the changed-path Bloom filters using the
"--changed-paths" option to "git commit-graph write". The filters will
be persisted until the user drops the filters using the
"--no-changed-paths" option. For this functionality, refer to 0087a87ba8
(commit-graph: persist existence of changed-paths, 2020-07-01).
Large monorepos using Git's background maintenance to build and update
commit-graph files could use an easy switch to enable this feature
without a foreground computation. In this commit, we're proposing a new
config option "commitGraph.changedPaths":
* If "true", "git commit-graph write" will write Bloom filters,
equivalent to passing "--changed-paths";
* If "false" or "unset", Bloom filters will be written during "git
commit-graph write" only if the filters already exist in the current
commit-graph file. This matches the default behaviour of "git
commit-graph write" without any "--[no-]changed-paths" option. Note
"false" can disable a previous "true" config value but doesn't imply
"--no-changed-paths".
This config will always respect the precedence of command line option
"--[no-]changed-paths".
We also set this new config as optional recommended config in scalar to
turn on this feature for large repos.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Emily Yang <emilyyang.git@gmail.com>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/incremental-midx-part-3.1: (49 commits)
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
builtin/repack.c: reorder `remove_redundant_bitmaps()`
repack: keep track of MIDX pack names using existing_packs
builtin/repack.c: use a string_list for 'midx_pack_names'
builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
builtin/repack.c: remove ref snapshotting from builtin
repack: remove pack_geometry API from the builtin
...
When using the structure subcommand for git-repo(1), evaluating a
repository may take some time depending on its shape. Add a progress
meter to provide feedback to the user about what is happening. The
progress meter is enabled by default when the command is executed from a
tty. It can also be explicitly enabled/disabled via the --[no-]progress
option.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All repository structure stats are outputted in a human-friendly table
form. This format is not suitable for machine parsing. Add a --format
option that supports three output modes: `table`, `keyvalue`, and `nul`.
The `table` mode is the default format and prints the same table output
as before.
With the `keyvalue` mode, each line of output contains a key-value pair
of a repository stat. The '=' character is used to delimit between keys
and values. The `nul` mode is similar to `keyvalue`, but key-values are
delimited by a NUL character instead of a newline. Also, instead of a
'=' character to delimit between keys and values, a newline character is
used. This allows stat values to support special characters without
having to cquote them. These two new modes provides output that is more
machine-friendly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The amount of objects in a repository can provide insight regarding its
shape. To surface this information, use the path-walk API to count the
number of reachable objects in the repository by object type. All
regular references are used to determine the reachable set of objects.
The object counts are appended to the same table containing the
reference information.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The structure of a repository's history can have huge impacts on the
performance and health of the repository itself. Currently, Git lacks a
means to surface repository metrics regarding its structure/shape via a
single command. Acquiring this information requires users to be familiar
with the relevant data points and the various Git commands required to
surface them. To fill this gap, supplemental tools such as git-sizer(1)
have been developed.
To allow users to more readily identify repository structure related
information, introduce the "structure" subcommand in git-repo(1). The
goal of this subcommand is to eventually provide similar functionality
to git-sizer(1), but natively in Git.
The initial version of this command only iterates through all references
in the repository and tracks the count of branches, tags, remote refs,
and other reference types. The corresponding information is displayed in
a human-friendly table formatted in a very similar manner to
git-sizer(1). The width of each table column is adjusted automatically
to satisfy the requirements of the widest row contained.
Subsequent commits will surface additional relevant data points to
output and also provide other more machine-friendly output formats.
Based-on-patch-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When filtering refs, `ref_kind_from_refname()` is used to determine the
ref type. In a subsequent commit, this same logic is reused when
counting refs by type. Export the function to prepare for this change.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting up `struct ref_filter` for filter_refs(), the
`name_patterns` field must point to an array of pattern strings even if
no patterns are required. To improve this interface, treat a NULL
`name_patterns` field the same as when it points to an empty array.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Subcommand functions are often prefixed with `cmd_` to denote that they
are an entrypoint. Rename repo_info() to cmd_repo_info() accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Show option P in the prompt and explain it properly on a dedicated line
in online help and documentation.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/incremental-midx-part-3.1: (64 commits)
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
builtin/repack.c: reorder `remove_redundant_bitmaps()`
repack: keep track of MIDX pack names using existing_packs
builtin/repack.c: use a string_list for 'midx_pack_names'
builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
builtin/repack.c: remove ref snapshotting from builtin
repack: remove pack_geometry API from the builtin
...
Update these Makefiles to be in line with other Makefiles from contrib
such as for contacts or subtree by making the following changes:
* Make the default settings after including config.mak.autogen and
config.mak.
* Add the missing $(CPPFLAGS) to the compiler command as well as the
missing $(CFLAGS) to the linker command.
* Use a pattern rule for compilation instead of a dedicated rule for
each compile unit.
* Get rid of $(MAIN), $(SRCS) and $(OBJS) and simply use their values
such as git-credential-libsecret and git-credential-libsecret.o.
* Strip @ from $(RM) to let the clean target rule be verbose.
* Define .PHONY for all special targets (all, clean).
Signed-off-by: Thomas Uhle <thomas.uhle@mailbox.tu-dresden.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1. '--exclude=' option to 'git log' and 'git shortlog' are missing. Add the
option to __git_log_shortlog_options.
2. The `--committer` option in `git log` requires a pattern, such as
`--committer=ba`, but in `git shortlog`, specifying a pattern results in
an error: “error: option `committer' takes no value.” Handle them as
separate options for completion rather than a shared one.
Signed-off-by: KIYOTA Fumiya <aimluck.kiyota@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an on-disk sparse index is expanded to a full one, it could be
due to some worktree state that requires looking at file entries
hidden within sparse tree entries. This can be avoided if the
worktree is cleaned up and some other issues related to the index
state are resolved.
Expand the advice message to include all of these cases, since 'git
sparse-checkout clean' is not currently capable of handling all
cases.
In the future, we may improve the behavior of 'git sparse-checkout
clean' to handle all of the cases.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, the MSYS layer translates absolute path names generated by
a shell script from the POSIX style /c/dir/file to the Windows style
C:/dir/file form that is understood by git.exe. This happens only when
the absolute path stands on its own as a program argument or a value of
an environment variable.
The earlier commits 749d6d166d (config: values of pathname type can be
prefixed with :(optional), 2025-09-28) and ccfcaf399f (parseopt: values
of pathname type can be prefixed with :(optional), 2025-09-28) added
test cases where ":(optional)" is inserted before an absolute path.
$PWD is used to construct the absolute paths, which gives the POSIX
form, and the result is ":(optional)/c/dir/template". Such command line
arguments are no longer recognized as absolute paths and do not undergo
translation.
Existing test cases that expect that the specified file does not exist
are not incorrect (after all, git.exe will not find /c/dir/template).
Yet, they are conceptually incorrect. That the use of $PWD is erroneous
is revealed by a test case that expects that the optional file exists.
Since no such test case is present, add one. Use "$(pwd)" to generate
the absolute paths, so that the command line arguments become
":(optional)C:/dir/template".
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "git status -z --short", the marker on modified index
entries (e.g., "M") is colorized, but the "??" marker for untracked
entries is not. Let's fix the "??" entries to show color here.
At first glance you might think that neither should be colorized, as
usually one would use "-z" to get machine-readable output. But this is a
tricky and unusual case. We have two output formats, "--short" and
"--porcelain" which are substantially similar, but differ in that
"--short" is for humans who want something short and "--porcelain" is
for machines. And "-z" by itself, without any other output option, does
default to "--porcelain", so "git status -z" will not colorize anything.
But if you explicitly ask for "-z" and "--short" together, then that is
asking for the human-readable output, but separated by NULs. This is
unlikely to be useful directly, but could for example be used if the
output will be shown to a human outside of the terminal. At any rate,
the current behavior is clearly wrong (since we colorize some things but
not others), and I think colorizing everything is the least-surprising
thing we can do here.
Reported-by: Langbart <Langbart@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier addition to "git diff --no-index A B" to limit the
output with pathspec after the two directories misbehaved when
these directories were given with a trailing slash, which has been
corrected.
* jk/diff-no-index-with-pathspec-fix:
diff --no-index: fix logic for paths ending in '/'
A few more things that patch authors can do to help maintainer to
keep track of their topics better.
* tb/doc-submitting-patches:
SubmittingPatches: guidance for multi-series efforts
SubmittingPatches: extend release-notes experiment to topic names
The code in "git add -p" and friends to iterate over hunks was
riddled with bugs, which has been corrected.
* rs/add-patch-options-fix:
add-patch: reset "permitted" at loop start
add-patch: let options a and d roll over like y and n
add-patch: let options k and K roll over like j and J
add-patch: let options y, n, j, and e roll over to next undecided
add-patch: document that option J rolls over
add-patch: improve help for options j, J, k, and K
Instead of three library archives (one for git, one for reftable,
and one for xdiff), roll everything into a single libgit.a archive.
This would help later effort to FFI into Rust.
* en/make-libgit-a:
make: delete REFTABLE_LIB, add reftable to LIB_OBJS
make: delete XDIFF_LIB, add xdiff to LIB_OBJS
In --quiet mode, since we produce only an exit code for "something was
changed" and no actual output, we can often get by with just a
tree-level diff. However, certain options require us to actually look at
the file contents (e.g., if we are ignoring whitespace changes). We have
a flag "diff_from_contents" for that, and if it is set we call
diff_flush() on each path.
To avoid producing any output (since we were asked to be --quiet), we
traditionally just redirected the output to /dev/null. That changed in
b55e6d36eb (diff: ensure consistent diff behavior with ignore options,
2025-08-08), which replaced that with a "dry_run" flag. In theory, with
dry_run set, we should produce no output. But it carries a risk of
regression: if we forget to respect dry_run in any of the output paths,
we'll accidentally produce output.
And indeed, there is at least one such regression in that commit, as it
covered only the case where we actually call into xdiff, and not
creation or deletion diffs, where we manually generate the headers. We
even test this case in t4035, but only with diff-tree, which does not
show the bug by default because it does not require diff_from_contents.
But git-diff does, because it allows external diff programs by default
(so we must dig into each diff filepair to decide if it requires running
an external diff that may declare two distinct blobs to actually be the
same).
We should fix all of those code paths to respect dry_run correctly, but
in the meantime we can protect ourselves more fully by restoring the
redirection to /dev/null. This gives us an extra layer of protection
against regressions dues to other code paths we've missed.
Though the original issue was reported with "git diff" (and due to its
default of --ext-diff), I've used "diff-tree -w" in the new test. It
triggers the same issue, but I think the fact that "-w" implies
diff_from_contents is a bit more obvious, and fits in with the rest of
t4035.
Reported-by: Jake Zimmerman <jake@zimmerman.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Tags and Heads window always opens at a default position and size,
requiring users to reposition it each time. Remember its geometry
between sessions in the config file as `geometry(showrefs)`.
Note that the existing configuration is sourced in proc savestuff
right before new settings are written. This makes the old settings
available as local variables(!) and does not overwrite the current
settings. Since we need access to the global geometry(showrefs), it
is necessary to unset the local variable.
Helped-by: Michael Rappazzo <rappazzo@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This reverts commit b9bee11526ec (gitk: Only restore window size from
~/.gitk, not position, 2008-03-10).
The earlier commit e9937d2a03a4 (Make gitk work reasonably well on
Cygwin, 2007-02-01) reworked the window layout considerably. Much of
this became irrelevant around 2011 after Cygwin gained an X11 server
and switched to a supportable port of the Unix/X11 Tcl/Tk (it is now
on the current 8.6 code base).
Part of the necessary change was to restore the window size across
sessions, but the position was also restored. This raised complaints
on the mailing list[*], because Gitk was opened on the wrong monitor.
b9bee11526ec was the compromise, because it was only the size that
mattered for the Cygwin layout engine to work.
I personally, find it annoying when Gitk pops up on a random location
on the screen, in particular, since many other applications restore
the window positions across sessions, so why not Gitk as well? (I do
not operate multi-monitor setups, so I cannot test the case.)
[*] https://lore.kernel.org/git/47AAA254.2020008@thorn.ws/
Helped-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
In a preceding commit we have removed `packfile_store_get_packs()`. With
this function removed it's somewhat useless to still have the "all"
infix in `packfile_store_get_all_packs()`. Rename the latter to drop
that infix.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of different sites that want to iterate through all
packs of a given `struct packfile_store`. This pattern is somewhat
verbose and repetitive, which makes it somewhat cumbersome.
Introduce a new macro `repo_for_each_pack()` that removes some of the
boilerplate.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have removed all remaining callers of
`packfile_store_get_packs()`, the function is thus unused now. Remove
it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using multiple threads in git-grep(1) we eagerly preload both the
gitmodules file as well as the packfiles so that the threads won't race
with one another to initialize these data structures.
For packfiles, this is done by calling `packfile_store_get_packs()`,
which first loads our packfiles and then returns a pointer to the first
such packfile. This pointer is ignored though, as all we really care
about is that `packfile_store_prepare()` was called.
Historically, that function was file-local to "packfile.c", but that
changed with 4188332569 (packfile: move `get_multi_pack_index()` into
"midx.c", 2025-09-02). We can thus simplify the code by calling that
function directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running maintenance tasks via git-maintenance(1) we have a couple
of auto-conditions that check whether or not a specific task should be
running. One such check is for incremental repacks, which essentially
use `git multi-pack-index repack` to repack a set of smaller packfiles
into one larger packfile.
The auto-condition for this task checks how many packfiles there are
that aren't indexed by any multi-pack index. If there is a sufficient
number then we execute the above command to combine those into a single
pack and add that pack to the MIDX.
As we don't care about MIDX'd packs we use `packfile_store_get_packs()`,
which knows to not load any packs that are indexed by a MIDX. But as
explained in the preceding commit, we want to get rid of that function.
We already handle packfiles that have a MIDX by the very nature of this
function, as we explicitly count non-MIDX'd packs. As such, we can
trivially switch over to use `packfile_store_get_all_packs()` instead.
Do so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When searching for abbreviated or when trying to disambiguate object IDs
we do this in two steps:
1. We search through the multi-pack index.
2. We search through all packfiles not part of any multi-pack index.
The second step uses `packfile_store_get_packs()`, which knows to skip
loading any packfiles that are indexed by an MIDX; this is exactly what
we want.
But that function is somewhat problematic, as its behaviour is stateful
and is influenced by `packfile_store_get_all_packs()`. This function
basically does the same as `packfile_store_get_packs()`, but in addition
it also loads all packfiles indexed by an MIDX. The problem here is that
both of these functions act on the same linked list of packfiles, and
thus depending on whether or not `get_all_packs()` was called the result
returned by `get_packs()` will be different. Consequently, all callers
of `get_packs()` need to be prepared to see MIDX'd packs even though
these should in theory be excluded.
This interface is confusing and thus potentially dangerous, which is why
we're converting all callers of `get_packs()` to use `get_all_packs()`
instead.
Do so for the above functions in "object-name.c". As explained, we
already know to skip any MIDX'd packs in both `find_abbrev_len_packed()`
and `find_short_packed_object()`, so it's fine to start loading MIDX'd
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/incremental-midx-part-3.1: (64 commits)
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
builtin/repack.c: reorder `remove_redundant_bitmaps()`
repack: keep track of MIDX pack names using existing_packs
builtin/repack.c: use a string_list for 'midx_pack_names'
builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
builtin/repack.c: remove ref snapshotting from builtin
repack: remove pack_geometry API from the builtin
...
Commit 5040f9f164 ("doc: add technical design doc for large object
promisors", 2025-02-18) added the large object promisors document
as a technical document (with a '.txt' extension). The merge commit
2c6fd30198 ("Merge branch 'cc/lop-remote'", 2025-03-05) seems to
have renamed the file with an '.adoc' extension.
Despite the '.adoc' extension, this document was not being formatted
by asciidoc(tor) as part of the docs build. In order to do so, add
the document to the make and meson build files.
Having added the document to the build, asciidoc and asciidoctor find
(slightly different) problems with the syntax of the input document.
The first set of warnings (only issued by asciidoc) relate to some
'section title out of sequence: expected level 3, got level 4'. This
document uses 'setext' style of section headers, using a series of
underline characters, where the character used denotes the level of
the title. From document title to level 5 (see [1]), these characters
are =, -, ~, ^, +. This does not seem to fit the error message, which
implies that those characters denote levels 0 -> 4. Replacing the headings
underlined with '+' by the '^' character eliminates these warnings.
The second set of warnings (only issued by asciidoctor) relate to some
headings which seem to use both arabic and roman numerals as part of
a single 'list' sequence. This elicited either 'unterminated listing
block' or (for example) 'list item index: expected I, got II' warnings.
In order not to mix arabic and roman numerals, remove the numeral from
the '0) Non goals' heading. Similarly, the remaining roman numeral
entries had the ')' removed and turned into regular headings with I, II,
III ... at the beginning.
[1] https://asciidoctor.org/docs/asciidoc-recommended-practices/
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The formatting markup syntax used in this document (markdown?) is not
interpreted correctly by asciidoc or asciidoctor. The main problem is
the use of a '## ' prefix markup for some sub-headings, along with the
use of '```' code markup and some missing literal blocks.
In order to improve the (html) document formatting:
- replace the '## ' prefix sub-title syntax with the '~~' underlining
syntax for the relevant sub-headings.
- replace the '```' code markup, which causes asciidoc(tor) to simply
remove the marked up text, with a literal block '----' markup.
- the second ascii diagram, in the 'Merging commit-graph files'
section, is not rendered correctly by asciidoctor (asciidoc is fine)
so enclose it in a '....' block.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both asciidoc and asciidoctor issue warnings about 'list item index:
expected n got n-1' for n=1->7 on lines 928, 931, 951, 974, 980, 1033
and 1049. In asciidoc, numbered lists must start at one, whereas this
file has a list starting at zero. Also, asciidoc and asciidoctor warn
about 'section title out of sequence: expected level 1, got level 2'
on line 17. (asciidoc only complains about the first instance of this,
while asciidoctor complains about them all, on lines 95, 258, 303, 316,
545, 612, 752, 824, 895, 923 and 1053). These warnings stem from the
section titles not being correctly nested within a document/chapter
title.
In order to address the first set of warnings, simply renumber the list
from one to seven, rather than zero to six. Fortunately, this does not
require altering additional text, since the enumeration of 'Known Bugs'
is not referred to anywhere else in the document.
In order to address the second set of warnings, change the section title
syntax from '=== title ===' to '== title ==', effectively reducing the
nesting level of the title by one. Also, some apparent (sub-)titles are
not marked up with sub-title syntax, so add some '=== ' prefix(s) to the
relevant headings.
In addition to the warnings, address some other formatting issues:
- the use of heavily nested unordered lists is not reflected in the
output (making the file totally unreadable) because each level of
nesting requires a different syntax. (i.e. replace '*' with '**'
for the second level, '*' with '***' for the third level, etc.)
- make use of literal blocks and manual indentation to get asciidoc
and asciidoctor to display even remotely similar output.
- make use of labelled lists, in some places, to get a similar looking
output to the input, for both asciidoc and asciidoctor.
- replace the trailing space in: `git grep ${SEARCH_TERM} OLDREV `
otherwise the entire line in which that appears is removed from
the output.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both asciidoc and ascidoctor issue warnings about 'list item index:
expected n got n-1' for n=1->9 on lines 13, 15, 17, 20, 23, 25, 29,
31 and 33. In asciidoc, numbered lists must start at one, whereas this
file has a list starting at zero. Also, asciidoc and asciidoctor warn
about 'section title out of sequence: expected level 1, got level 2'
on line 38. (asciidoc only complains about the first instance of this,
while asciidoctor complains about them all, on lines 94, 141, 142,
184, 185, 257, 288, 289, 290, 397, 424, 485, 486 and 487). These
warnings stem from the section titles not being correctly nested within
a document/chapter title.
In order to address the first set of warnings, simply renumber the list
from one to nine, rather than zero to eight. This also requires altering
the text which refers to the section numbers, including other section
titles.
In order to address the second set of warnings, change the section title
syntax from '=== title ===' to '== title ==', effectively reducing the
nesting level of the title by one. Also, some of the titles are given
over multiple lines (they are very long), with an title '===' prefix
on each line. This leads to them being treated as separate sections
with no body text (as you can see from the line numbers given for the
asciidoctor warnings, above). So, for these titles, turn them into a
single (long) line of text.
In addition to the warnings, address some other formatting issues:
- the ascii branch diagrams didn't format correctly on asciidoctor
so include them in a literal block.
- several blocks of text were intended to be formatted 'as is' but
were not included in a literal block.
- in section 8, format the (A)->(D) in the text description as a
literal with `` marks, since (C) is rendered as a copyright
symbol in html otherwise.
- in section 9, a sub-list of two items is not formatted as such.
change the '*' introducer to '**' to correct the sub-list format.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the past several dozen commits, we have moved a large amount of
functionality out of the repack builtin and into other files like
repack.c, repack-cruft.c, repack-filtered.c, repack-midx.c, and
repack-promisor.c.
These files specify the minimal set of `#include`s that they need to
compile successfully, but we did not change the set of `#include`s in
the repack builtin itself.
Now that the code movement is complete, let's clean up that set of
`#include`s and trim down the builtin to include the minimal amount of
external headers necessary to compile.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an identical fashion as the previous commit, move the function
`write_cruft_pack()` into its own compilation unit, and make the
function visible through the repack.h API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as in previous commits, move the function
`write_filtered_pack()` out of the builtin and into its own compilation
unit.
This function is now part of the repack.h API, but implemented in its
own "repack-filtered.c" unit as it is a separate component from other
kinds of repacking operations.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "pack_kept_objects" variable is defined as static to the repack
builtin, but is inherently related to the pack-objects arguments that
the builtin uses when generating new packs.
Move that field into the "struct pack_objects_args", and shuffle around
where we append the corresponding command-line option when preparing a
pack-objects process. Specifically:
- `write_cruft_pack()` always wants to pass "--honor-pack-keep", so
explicitly set the `pack_kept_objects` field to "0" when initializing
the `write_pack_opts` struct before calling `write_cruft_pack()`.
- `write_filtered_pack()` no longer needs to handle writing the
command-line option "--honor-pack-keep" when preparing a pack-objects
process, since its call to `prepare_pack_objects()` will have already
taken care of that.
`write_filtered_pack()` also reads the `pack_kept_objects` field to
determine whether to write the existing kept packs with a leading "^"
character, so update that to read through the `po_args` pointer
instead.
- `cmd_repack()` also no longer has to write the "--honor-pack-keep"
flag explicitly, since this is also handled via its call to
`prepare_pack_objects()`.
Since there is a default value for "pack_kept_objects" that relies on
whether or not we are writing a bitmap (and not writing a MIDX), extract
a default initializer for `struct pack_objects_args` that keeps this
conditional default behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as the previous commit(s), now that the function
`finish_pack_objects_cmd()` has no explicit dependencies within the
repack builtin, let's extract it.
This prepares us to extract the remaining two functions within the
repack builtin that explicitly write packfiles, which are
`write_cruft_pack()` and `write_filtered_pack()`, which will be done in
the future commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare to move the `finish_pack_objects_cmd()` function out of the
builtin and into the repack.h API, there are a couple of things we need
to do first:
- First, let's take advantage of `write_pack_opts_is_local()` function
introduced in the previous commit instead of passing "local"
explicitly.
- Let's also avoid referring to the static 'packtmp' field within
builtin/repack.c by instead accessing it through the write_pack_opts
argument.
There are three callers which need to adjust themselves in order to
account for this change. The callers which reside in write_cruft_pack()
and write_filtered_pack() both already have an "opts" in scope, so they
can pass it through transparently.
The other call (at the bottom of `cmd_repack()`) needs to initialize its
own write_pack_opts to pass the necessary fields over to the direct call
to `finish_pack_objects_cmd()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commit, the functions `write_cruft_pack()` and
`write_filtered_pack()` both compute a "local" variable via the exact
same mechanism:
const char *scratch;
int local = skip_prefix(opts->destination, opts->packdir, &scratch);
Not only does this cause us to repeat the same pair of lines, it also
introduces an unnecessary "scratch" variable that is common between both
functions.
Instead of repeating ourselves, let's extract that functionality into a
new function in the repack.h API called "write_pack_opts_is_local()".
That function takes a pointer to a "struct write_pack_opts" (which has
as fields both "destination" and "packdir"), and can encapsulate the
dangling "scratch" field.
Extract that function and make it visible within the repack.h API, and
use it within both `write_cruft_pack()` and `write_filtered_pack()`.
While we're at it, match our modern conventions by returning a "bool"
instead of "int", and use `starts_with()` instead of `skip_prefix()` to
avoid storing the dummy "scratch" variable.
The remaining duplication (that is, that both `write_cruft_pack()` and
`write_filtered_pack()` still both call `write_pack_opts_is_local()`)
will be addressed in the following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both callers within the repack builtin which call functions that take a
'write_pack_opts' structure have the following pattern:
struct write_pack_opts opts = {
.packdir = packdir,
.packtmp = packtmp,
.pack_prefix = find_pack_prefix(packdir, packtmp),
/* ... */
};
int ret = write_some_kind_of_pack(&opts, /* ... */);
, but both "packdir" and "packtmp" are fields within the write_pack_opts
struct itself!
Instead of also computing the pack_prefix ahead of time, let's have the
callees compute it themselves by moving `find_pack_prefix()` out of the
repack builtin, and have it take a write_pack_opts pointer instead of
the "packdir" and "packtmp" fields directly.
This avoids the callers having to do some prep work that is common
between the two of them, but also avoids the potential pitfall of
accidentally writing:
.pack_prefix = find_pack_prefix(packtmp, packdir),
(which is well-typed) when the caller meant to instead write:
.pack_prefix = find_pack_prefix(packdir, packtmp),
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the changes made in the previous commit to
`write_filtered_pack()`, teach `write_cruft_pack()` to take a
`write_pack_opts` struct and use that where possible.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are various functions within the 'repack' builtin which are
responsible for writing different kinds of packs. They include:
- `static int write_filtered_pack(...)`
- `static int write_cruft_pack(...)`
as well as the function `finish_pack_objects_cmd()`, which is
responsible for finalizing a new pack write, and recording the checksum
of its contents in the 'names' list.
Both of these `write_` functions have a few things in common. They both
take a pointer to the 'pack_objects_args' struct, as well as a pair of
character pointers for `destination` and `pack_prefix`.
Instead of repeating those arguments for each function, let's extract an
options struct called "write_pack_opts" which has these three parameters
as member fields. While we're at it, add fields for "packdir," and
"packtmp", both of which are static variables within the builtin, and
need to be read from within these two functions.
This will shorten the list of parameters that callers have to provide to
`write_filtered_pack()`, avoid ambiguity when passing multiple variables
of the same type, and provide a unified interface for the two functions
mentioned earlier.
(Note that "pack_prefix" can be derived on the fly as a function of
"packdir" and "packtmp", making it unnecessary to store "pack_prefix"
explicitly. This commit ignores that potential cleanup in the name of
doing as few things as possible, but a later commit will make that
change.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have sufficiently cleaned up the write_midx_included_packs()
function, we can move it (along with the struct repack_write_midx_opts)
out of the builtin, and into the repack.h header.
Since this function (and the static ones that it depends on) are
MIDX-specific details of the repacking process, move them to the
repack-midx.c compilation unit instead of the general repack.c one.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To write a MIDX at the end of a repack operation, 'git repack' presently
computes the set of packs to write into the MIDX, before invoking
`write_midx_included_packs()` with a `string_list` containing those
packs.
The logic for computing which packs are supposed to appear in the
resulting MIDX is within `midx_included_packs()`, where it is aware of
details like which cruft pack(s) were written/combined, if/how we did a
geometric repack, etc.
Computing this list ourselves before providing it to the sole function
to make use of that list `write_midx_included_packs()` is somewhat
awkward. In the future, repack will learn how to write incremental
MIDXs, which will use a very different pack selection routine.
Instead of doing something like:
struct string_list included_packs = STRING_LIST_INIT_DUP;
if (incremental) {
midx_incremental_included_packs(&included_packs, ...):
write_midx_incremental_included_packs(&included_packs, ...);
} else {
midx_included_packs(&included_packs, ...):
write_midx_included_packs(&included_packs, ...);
}
in the future, let's have each function that writes a MIDX be
responsible for itself computing the list of included packs. Inline the
declaration and initialization of `included_packs` into the
`write_midx_included_packs()` function itself, and repeat that pattern
in the future when we introduce new ways to write MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of passing individual parameters (in this case, "existing",
"names", and "geometry") to `midx_included_packs()`, pass a pointer to a
`repack_write_midx_opts` structure instead.
Besides reducing the number of parameters necessary to call the
`midx_included_packs` function, this refactoring sets us up nicely to
inline the call to `midx_included_packs()` into
`write_midx_included_packs()`, thus making the caller (in this case,
`cmd_repack()`) oblivious to the set of packs being written into the
MIDX.
In order to do this, `repack_write_midx_opts` has to keep track of the
set of existing packs, so add an additional field to point to that set.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After writing a new MIDX, the repack command removes any bitmaps
belonging to packs which were written into the MIDX.
This is currently done in a separate function outside of
`write_midx_included_packs()`, which forces the caller to keep track of
the set of packs written into the MIDX.
Prepare to no longer require the caller to keep track of such
information by inlining the clean-up into `write_midx_included_packs()`.
Future commits will make the caller oblivious to the set of packs
included in the MIDX altogether.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next commit will inline the call to `remove_redundant_bitmaps()`
into `write_midx_included_packs()`. Reorder these two functions to avoid
a forward declaration to `remove_redundant_bitmaps()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of storing the list of MIDX pack names separately, let's inline
it into the existing_packs struct, further reducing the number of
parameters we have to pass around.
This amounts to adding a new string_list to the existing_packs struct,
and populating it via `existing_packs_collect()`. This is fairly
straightforward to do, since we are already looping over all packs, all
we need to do is:
if (p->multi_pack_index)
string_list_append(&existing->midx_packs, pack_basename(p));
Note, however, that this check *must* come before other conditions where
we discard and do not keep track of a pack, including the condition "if
(!p->pack_local)" immediately below. This is because the existing
routine which collects MIDX pack names does so blindly, and does not
discard, for example, non-local packs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a new MIDX, repack must determine whether or not there are
any packs in the MIDX it is replacing (if one exists) that are not
somehow represented in the new MIDX (e.g., either by preserving the pack
verbatim, or rolling it up as part of a geometric repack, etc.).
In order to do this, it keeps track of a list of pack names from the
MIDX present in the repository at the start of the repack operation.
Since we manipulate and close the object store, we cannot rely on the
repository's in-core representation of the MIDX, since this is subject
to change and/or go away.
When this behavior was introduced in 5ee86c273b (repack: exclude cruft
pack(s) from the MIDX where possible, 2025-06-23), we maintained an
array of character pointers instead of using a convenience API, such as
string-list.h.
Store the list of MIDX pack names in a string_list, thereby reducing the
number of parameters we have to pass to `midx_has_unknown_packs()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function 'write_midx_included_packs()', which is responsible for
writing a new MIDX with a given set of included packs, currently takes a
list of six arguments.
In order to extract this function out of the builtin, we have to pass
in a few additional parameters, like 'midx_must_contain_cruft' and
'packdir', which are currently declared as static variables within the
builtin/repack.c compilation unit.
Instead of adding additional parameters to `write_midx_included_packs()`
extract out an "opts" struct that names these parameters, and pass a
pointer to that, making it less cumbersome to add additional parameters.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a MIDX, 'git repack' takes a snapshot of the repository's
references and writes the result out to a file, which it then passes to
'git multi-pack-index write' via the '--refs-snapshot'.
This is done in order to make bitmap selections with respect to what we
are packing, thus avoiding a race where an incoming reference update
causes us to try and write a bitmap for a commit not present in the
MIDX.
Extract this functionality out into a new repack-midx.c compilation
unit, and expose the necessary functions via the repack.h API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack_geometry API is fully factored and isolated from the
rest of the builtin, declare it within repack.h and move its
implementation to "repack-geometry.c" as a separate component.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For similar reasons as the preceding commit, pass the "packdir" variable
directly to `pack_geometry_remove_redundant()` as a parameter to the
function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to move pack_geometry-related APIs to their own compilation unit
by passing in the static "pack_kept_objects" variable directly as a
parameter to the 'pack_geometry_init()' function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename functions which work with 'struct pack_geometry' to begin with
"pack_geometry_". While we're at it, change `free_pack_geometry()` to
instead be named `pack_geometry_release()` to match our conventions, and
make clear that that function frees the contents of the struct, not the
memory allocated to hold the struct itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have properly factored the portion of the builtin which is
responsible for repacking promisor objects, we can move that function
(and associated dependencies) out of the builtin entirely.
Similar to previous extractions, this function is declared in repack.h,
but implemented in a separate repack-promisor.c file. This is done to
separate promisor-specific repacking functionality from generic repack
utilities (like "existing_packs", and "generated_pack" APIs).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous commit(s), pass the "packtmp" variable
to "repack_promisor_objects()" as an explicit parameter of the function,
preparing us to move this function in a following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have factored the "generated_pack" API, we can move it to
repack.ch, further slimming down builtin/repack.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Repeat what was done in the preceding commit for the
`generated_pack_install()` function, which needs both "packdir" and
"packtmp".
(As an aside, it is somewhat unfortunate that the final three parameters
to this function are all "const char *", making errors like passing
"packdir" and "packtmp" in the wrong order easy. We could define a new
structure here, but that may be too heavy-handed.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous commits, this function needs to know the
temporary pack prefix, which it currently accesses through the static
"packtmp" variable within builtin/repack.c.
Pass it explicitly as a function parameter to facilitate moving this
function out of builtin/repack.c entirely.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once all new packs are known to exist, 'repack' installs their contents
from their temporary location into their permanent one. This is a
semi-involved procedure for each pack, since for each extension (e.g.,
".idx", ".pack", ".mtimes", and so on) we have to either:
- adjust the filemode of the temporary file before renaming it into
place, or
- die() if we are missing a non-optional extension, or
- unlink() any existing file for extensions that we did not generate
(e.g., if a non-cruft pack we generated was identical to, say, a
cruft pack which existed at the beginning of the process, we have to
remove the ".mtimes" file).
Extract this procedure into its own function, and call it
"generated_pack_install"(). This will set us up for pulling this
function out of the builtin entirely and making it part of the repack.h
API, which will be done in a future commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name "generated_pack_data" is somewhat redundant, since the contents
of the struct *is* the data associated with the generated pack.
Rename the structure to just "generated_pack", resulting in less awkward
function names, like "generated_pack_has_ext()" which is preferable to
"generated_pack_data_has_ext()".
Rename a few related functions to align with the convention that
functions to do with a struct "S" should be prefixed with "S_".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repack builtin defines an API for keeping track of which packs
were found in the repository at the beginning of the repack operation.
This is used to classify what state a pack was in (kept, non-kept, or
cruft), and is also used to mark which packs to delete (or keep) at the
end of a repack operation.
Now that the prerequisite refactoring is complete, this API is isolated
enough that it can be moved out to repack.[ch] and removed from the
builtin entirely.
As a result, some of its functions become static within repack.c,
cleaning up the visible API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of spots that cause warnings within the
existing_packs API without DISABLE_SIGN_COMPARE_WARNINGS under
DEVELOPER=1 mode.
In both cases, we have int values that are being compared against size_t
ones. Neither of these two cases are incorrect, and the cast is
completely OK in practice. But both are unnecessary, since:
- in existing_packs_mark_for_deletion_1(), 'hexsz' should be defined as
a size_t anyway, since algop->hexsz is.
- in existing_packs_collect(), 'i' should be defined as a size_t since
it is counting up to the value of a string_list's 'nr' field.
(This patch is a little bit of noise, but I would rather see us squelch
these warnings ahead of moving the existing_packs API into a separate
compilation unit to avoid having to define DISABLE_SIGN_COMPARE_WARNINGS
in repack.c.)
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/repack.c defines a static "packdir" to instruct pack-objects on
where to write any new packfiles. This is also the directory scanned
when removing any packfiles which were made redundant by the latest
repack.
Prepare to move the "existing_packs_remove_redundant" function to its
own compilation unit by passing in this information as a parameter to
that function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract "remove_redundant_pack()" as generic repack-related
functionality by moving its implementation to the repack.[ch]
compilation unit.
This is a prerequisite to moving the "existing_packs" API, which is one
of the callers of this function. (The remaining caller in the pack
geometry code will eventually move to its own compilation unit as well,
and will likewise rely on this function.)
While moving it over, prefix the function name with "repack_" to
indicate that it belongs to the repack-subsystem.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename many of the 'struct existing_packs'-related functions according
to the convention introduced in and described by 541204aabe
(Documentation: document naming schema for structs and their functions,
2024-07-30).
Note that some functions which operate over an individual entry in the
list of existing packs are prefixed with "existing_pack_" instead of the
plural form.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the 'prepare_pack_objects' function no longer refers to
external, static variables, move it out to repack.h as generic
functionality.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The static variable 'delta_base_offset' determines whether or not we
pass the "--delta-base-offset" command-line argument when spawning
pack-objects as a child process. Its introduction dates back to when
repack was rewritten in C, all the way back in a1bbc6c017 (repack:
rewrite the shell script in C, 2013-09-15).
'struct pack_objects_args' was introduced much later on in 4571324b99
(builtin/repack.c: allow configuring cruft pack generation, 2022-05-20),
but did not move the 'delta_base_offset' variable.
Since the 'delta_base_offset' is a property of an individual
pack-objects command, re-introduce that variable as a member of 'struct
pack_objects_args', which will enable further code movement in the
subsequent commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will remove 'delta_base_offset' as a static variable
within builtin/repack.c, and reintroduce it as a member of the 'struct
pack_objects_args'.
As a result, the repack_config callback will need to have both the
cruft- and non-cruft 'struct pack_objects_args's in scope. Introduce a
new 'struct repack_config_ctx' to allow the callee to provide both
pointers to the callback.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the years, builtin/repack.c has turned into a grab-bag of
functionality powering the 'git repack' builtin. Among its many
capabilities, it:
- can build and spawn 'git pack-objects' commands, which in turn
generate new packs
- has infrastructure to manage the set of existing packs in a
repository
- has infrastructure to split a sequence of packs into a geometric
progression based on object size
- can manage both generating and combining cruft packs together
- can write new MIDXs
to name a few.
As a result, this builtin has accumulated a lot of code, making adding
new functionality difficult. In the future, 'repack' will learn how to
manage a chain of incremental MIDXs, adding yet more functionality into
the builtin.
As a prerequisite step, let's first move some of the functionality in
the builtin into its own repack.[ch].
This will be done over the course of many steps, since there are many
individual components, some of which will end up in other, yet-to-exist
compilation units of their own. Some of the code movement here is also
non-trivial, so performing it in individual steps will make it easier to
verify.
Let's start by migrating 'struct pack_objects_args' (and the related
corresponding pack_objects_args_release() function) into repack.h, and
teach both the Makefile and Meson how to build the new compilation unit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous commits, we started passing either repository or
git_hash_algo pointers around to various spots within builtin/repack.c
to reduce our dependency on the_repository in the hope of undef'ing
USE_THE_REPOSITORY_VARIABLE.
This commit takes us as far as we can (easily) go in that direction by
removing the only use of a convenience function that only exists when
USE_THE_REPOSITORY_VARIABLE is defined.
Unfortunately, the only other such function is "is_bare_repository()",
which is less than straightforward to convert into, say,
"repo_is_bare()", the latter of the two accepting a repository pointer.
Punt on that for now, and declare this commit as the stopping point for
our efforts in the direction of undef'ing USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous commits, avoid referring directly to
"the_hash_algo" in builtin/repack.c::finish_pack_objects_cmd() and
instead accept one as a parameter to the function.
Since this function has a number of callers throughout the builtin, the
diff is a little noisier than previous commits. However, each hunk is
limited to passing the hash_algo parameter from a repository pointer
that is already in scope.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as the previous commits, avoid referring directly to
"the_hash_algo" within builtin/repack.c::repack_promisor_objects().
Since there is already a repository pointer in scope, use its hash_algo
value instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as the previous commit, avoid referring directly to
"the_hash_algo" within builtin/repack.c::write_oid().
Unlike the previous commit, we are within a callback function, so must
introduce a new struct to pass additional data through its "data"
pointer.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "mark_packs_for_deletion_1" function uses "the_hash_algo->hexsz" to
isolate a pack's checksum before deleting it to avoid deleting a newly
written pack having the same checksum (that is, some generated pack
wound up identical to an existing pack).
Avoid this by passing down a "struct git_hash_algo" pointer, and refer to
the hash algorithm through it instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass a "struct repository" pointer to the 'repack_promisor_objects()'
function to avoid using "the_repository".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'remove_redundant_pack()' function uses "the_repository" to obtain,
and optionally remove, the repository's MIDX. Instead of relying on
"the_repository", pass around a "struct repository *" parameter through
its callers, and use that instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid using "the_repository" in various MIDX-related ref snapshotting
functions.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a number of spots within builtin/repack.c which refer to
"the_repository", and either make use of the "existing packs" API
or otherwise have a 'struct existing_packs *' in scope.
Add a "repo" member to "struct existing_packs" and use that instead of
"the_repository" in such locations.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce builtin/repack.c's reliance on `the_repository` by using the
currently-UNUSED "repo" parameter within cmd_repack().
The following commits will continue to reduce the usage of
the_repository in other places within builtin/repack.c.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Carry over the fixups from 8c3d7c5f (RelNotes: minor fixups before
2.51.1, 2025-10-15).
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update old-style shell path checks to use the modern test
helpers 'test_path_is_file' and 'test_path_is_dir' for improved
runtime diagnosis.
Signed-off-by: Solly <solobarine@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback:
- One user is confused about why `git reset --merge`
(why not just `git reset`?). Handle this by mentioning
`git merge --abort` and `git reset --abort` instead, which have a
more obvious meaning.
- 2 users want to know what "In older versions of Git" means exactly
(in versions older than 1.7.0). Handle this by removing the warning
since it was added 15 years ago (in 3f8fc184c0e2c)
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: this example is confusing because it implies that
`git pull` will run `git merge` by default, but the default is
`--ff-only`.
We could instead show an example of a fast-forward merge, but that may
not add a lot since fast-forward merges are relatively simple. This lets
us keep the description short.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback:
- One user is confused about the current default ("I was convinced that
the git default was still to merge on pull")
- One user is confused about why "git fetch" isn't mentioned earlier
- One user says they always forget what the arguments to `git pull` are
and that it's not immediately obvious that `--no-rebase` means "merge"
- One user wants `--ff-only` to be mentioned
Resolve this by listing the options for integrating the the remote
branch. This should help users figure out at a glance which one they
want to do, and make it clearer that --ff-only is the default.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback:
- it's confusing that we use both <branch> and <refspec> to refer to the
second argument
- one user is not clear about what `refs/heads/*:refs/remotes/origin/*`
is meant to be an example of ("is it like a path?")
The DESCRIPTION section is also doing a lot right now: it's trying to
describe both how the <repository> and <refspec> arguments work (which
is pretty complex, as seen in the DEFAULT BEHAVIOUR section)
as well as how `git pull` calls `git fetch` and merge/rebase/etc
depending on the arguments.
Handle this by moving the description of the <repository> and <refspec>
arguments to the OPTIONS section, so that we can focus on the
merge/rebase/etc behaviour in the DESCRIPTION section, and refer folks
to the later sections for details.
Use the term "upstream" instead of 'the "remote" and "merge"
configuration for the current branch' since users are more likely to
know what an "upstream" is.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation mark-up fix.
* ja/doc-markup-attached-paragraph-fix:
doc: fix indentation of refStorage item in git-config(1)
doc: change the markup of paragraphs following a nested list item
Clarify the "--merge-base" command line option in "git merge-tree".
* en/doc-merge-tree-describe-merge-base:
Documentation/git-merge-tree.adoc: clarify the --merge-base option
Deal more gracefully with directory / file conflicts when the files
backend is used for ref storage, by failing only the ones that are
involved in the conflict while allowing others.
* kn/refs-files-case-insensitive:
refs/files: handle D/F conflicts during locking
refs/files: handle F/D conflicts in case-insensitive FS
refs/files: use correct error type when lock exists
refs/files: catch conflicts on case-insensitive file-systems
"git rebase -i" failed to clean-up the commit log message when the
command commits the final one in a chain of "fixup" commands, which
has been corrected.
* pw/rebase-i-cleanup-fix:
sequencer: remove VERBATIM_MSG flag
rebase -i: respect commit.cleanup when picking fixups
Some among "git add -p" and friends ignored color.diff and/or
color.ui configuration variables, which is an old regression, which
has been corrected.
* jk/add-i-color:
contrib/diff-highlight: mention interactive.diffFilter
add-interactive: manually fall back color config to color.ui
add-interactive: respect color.diff for diff coloring
stash: pass --no-color to diff plumbing child processes
A corner case bug in "git log -L..." has been corrected.
* sg/line-log-boundary-fixes:
line-log: show all line ranges touched by the same diff range
line-log: fix assertion error
A broken or malicious "git fetch" can say that it has the same
object for many many times, and the upload-pack serving it can
exhaust memory storing them redundantly, which has been corrected.
* ps/upload-pack-oom-protection:
upload-pack: don't ACK non-commits repeatedly in protocol v2
t5530: modernize tests
Fixes multiple crashes around midx write-out codepaths.
* ds/midx-write-fixes:
midx-write: simplify error cases
midx-write: reenable signed comparison errors
midx-write: use uint32_t for preferred_pack_idx
midx-write: use cleanup when incremental midx fails
midx-write: put failing response value back
midx-write: only load initialized packs
"git repack --path-walk" lost objects in some corner cases, which
has been corrected.
cf. <CABPp-BHFxxGrqKc0m==TjQNjDGdO=H5Rf6EFsf2nfE1=TuraOQ@mail.gmail.com>
* ds/path-walk-repack-fix:
path-walk: create initializer for path lists
path-walk: fix setup of pending objects
Under a race against another process that is repacking the
repository, especially a partially cloned one, "git fetch" may
mistakenly think some objects we do have are missing, which has
been corrected.
* jk/fetch-check-graph-objects-fix:
fetch-pack: re-scan when double-checking graph objects
Various options to "git diff" that makes comparison ignore certain
aspects of the differences (like "space changes are ignored",
"differences in lines that match these regular expressions are
ignored") did not work well with "--name-only" and friends.
* ly/diff-name-only-with-diff-from-content:
diff: ensure consistent diff behavior with ignore options
"git diff --no-index" run inside a subdirectory under control of a
Git repository operated at the top of the working tree and stripped
the prefix from the output, and oddballs like "-" (stdin) did not
work correctly because of it. Correct the set-up by undoing what
the set-up sequence did to cwd and prefix.
* jc/diff-no-index-in-subdir:
diff: --no-index should ignore the worktree
Various bugs about rename handling in "ort" merge strategy have
been fixed.
* en/ort-rename-fixes:
merge-ort: fix directory rename on top of source of other rename/delete
merge-ort: fix incorrect file handling
merge-ort: clarify the interning of strings in opt->priv->path
t6423: fix missed staging of file in testcases 12i,12j,12k
t6423: document two bugs with rename-to-self testcases
merge-ort: drop unnecessary temporary in check_for_directory_rename()
merge-ort: update comments to modern testfile location
"git push" had a code path that led to BUG() but it should have
been a die(), as it is a response to a usual but invalid end-user
action to attempt pushing an object that does not exist.
cf. <xmqqo6spiyqp.fsf@gitster.g>
* dl/push-missing-object-error:
remote.c: convert if-else ladder to switch
remote.c: remove BUG in show_push_unqualified_ref_name_error()
t5516: remove surrounding empty lines in test bodies
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
During interactive rebase, using 'drop' on a merge commit lead to
an error, which was incorrect.
* js/rebase-i-allow-drop-on-a-merge:
rebase -i: permit 'drop' of a merge commit
Grammar and typo fixes. Also change “work it around” to “work around”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "files" backend has the ability to store symbolic refs as symbolic
links, which can be configured via "core.preferSymlinkRefs". This
feature stems back from the early days: the initial implementation of
symbolic refs used symlinks exclusively. The symref format was only
introduced in 9b143c6e15 (Teach update-ref about a symbolic ref stored
in a textfile., 2005-09-25) and made the default in 9f0bb90d16
(core.prefersymlinkrefs: use symlinks for .git/HEAD, 2006-05-02).
This is all about 20 years ago, and there are no known reasons nowadays
why one would want to use symlinks instead of symrefs. Mark the feature
for deprecation in Git 3.0.
Note that this only deprecates _writing_ symrefs as symbolic links.
Reading such symrefs is still supported for now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The initial patch series that introduced Rust into the core of Git only
cared about macOS and Linux. This specifically leaves out Windows, which
indeed fails to build right now due to two issues:
- The Rust runtime requires `GetUserProfileDirectoryW()`, but we don't
link against "userenv.dll".
- The path of the Rust library built on Windows is different than on
most other systems systems.
Fix both of these issues to support Windows.
Note that this commit fixes the Meson-based job in GitHub's CI. Meson
auto-detects the availability of Rust, and as the Windows runner has
Rust installed by default it already enabled Rust support there. But due
to the above issues that job fails consistently.
Install Rust on GitLab CI, as well, to improve test coverage there.
Based-on-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Based-on-patch-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the current state of our Rust code base we don't really have any
requirements for the minimum supported Rust version yet, as we don't use
any features introduced by a recent version of Rust. Consequently, we
have decided that we want to aim for a rather old version and edition of
Rust, where the hope is that using an old version will make alternatives
like gccrs viable earlier for compiling Git.
But while we specify the Rust edition, we don't yet specify a Rust
version. And even if we did, the Rust version would only be enforced for
our own code, but not for any of our dependencies.
We don't yet have any dependencies at the current point in time. But
let's add some safeguards by specifying the minimum supported Rust
version and using cargo-msrv(1) to verify that this version can be
satisfied for all of our dependencies.
Note that we fix the version of cargo-msrv(1) at v0.18.1. This is the
latest release supported by Ubuntu's Rust version.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a CI check that uses Clippy to perform checks for common
mistakes and suggested code improvements. Clippy is the official static
analyser of the Rust project and thus the de-facto standard.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `decode_varint()` and `encode_varint()` functions in our Rust crate
are reimplementations of the respective C functions. As such, we are
naturally forced to use the same interface in both Rust and C, which
makes use of raw pointers. The consequence is that the code needs to be
marked as unsafe in Rust.
It is common practice in Rust to provide safety documentation for every
block that is marked as unsafe. This common practice is also enforced by
Clippy, Rust's static analyser. We don't have Clippy wired up yet, and
we could of course just disable this check. But we're about to wire it
up, and it is reasonable to always enforce documentation for unsafe
blocks.
Add such safety comments to already squelch those warnings now. While at
it, also document the functions' behaviour.
Helped-by: "brian m. carlson" <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a CI check that verifies that our Rust code is well-formatted.
This check uses `cargo fmt`, which is a wrapper around rustfmt(1) that
executes formatting for all Rust source files. rustfmt(1) itself is the
de-facto standard for formatting code in the Rust ecosystem.
The rustfmt(1) tool allows to tweak the final format in theory. In
practice though, the Rust ecosystem has aligned on style "editions".
These editions only exist to ensure that any potential changes to the
style don't cause reformats to existing code bases. Other than that,
most Rust projects out there accept this default style of a specific
edition.
Let's do the same and use that default style. It may not be anyone's
favorite, but it is consistent and by making it part of our CI we also
enforce it right from the start.
Note that we don't have to pick a specific style edition here, as the
edition is automatically derived from the edition we have specified in
our "Cargo.toml" file.
The implemented script looks somewhat weird as we perfom manual error
handling instead of using something like `set -e`. The intent here is
that subsequent commits will add more checks, and we want to execute all
of these checks regardless of whether or not a previous check failed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When installing dependencies we first check for the distribution that is
in use and then we check for the specific job. In the first step we
already install all dependencies required to build and test Git, whereas
the second step installs a couple of additional dependencies that are
only required to perform job-specific tasks.
In both steps we use `apt-get update` to update our repository sources.
This is unnecessary though: all platforms that use Aptitude would have
already executed this command in the distro-specific step anyway.
Drop the redundant calls.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our CI script requires "sudo" that can be told to preserve
environment, but Ubuntu replaced with "sudo" with an implementation
that lacks the feature. Work this around by reinstalling the
original version.
* ps/ci-avoid-broken-sudo-on-ubuntu:
ci: fix broken jobs on Ubuntu 25.10 caused by switch to sudo-rs(1)
Adjust to the way newer versions of cURL selectivel enables tracing
options, so that our tests can continue to work.
* jk/curl-global-trace-components:
curl: add support for curl_global_trace() components
Fix missing single-quote pairs in a documentation page.
* kh/doc-interpret-trailers-markup-fix:
doc: interpret-trailers: close all pairs of single quotes
Makefile tried to run multiple "cargo build" which would not work
very well; serialize their execution to work it around.
* da/cargo-serialize:
Makefile: build libgit-rs and libgit-sys serially
The start_delayed_progress() function in the progress eye-candy API
did not clear its internal state, making an initial delay value
larger than 1 second ineffective, which has been corrected.
* js/progress-delay-fix:
progress: pay attention to (customized) delay time
A few places where an size_t value was cast to curl_off_t without
checking has been updated to use the existing helper function.
* js/curl-off-t-fixes:
http-push: avoid new compile error
imap-send: be more careful when casting to `curl_off_t`
http: offer to cast `size_t` to `curl_off_t` safely
Clang-format update to let our control macros formatted the way we
had them traditionally, e.g., "for_each_string_list_item()" without
space before the parentheses.
* jt/clang-format-foreach-wo-space-before-parenthesis:
clang-format: exclude control macros from SpaceBeforeParens
Update the instruction to use of GGG in the MyFirstContribution
document to say that a GitHub PR could be made against `git/git`
instead of `gitgitgadget/git`.
* ds/doc-ggg-pr-fork-clarify:
doc: clarify which remotes can be used with GitGitGadget
The compatObjectFormat extension is used to hide an incomplete
feature that is not yet usable for any purpose other than
developing the feature further. Document it as such to discourage
its use by mere mortals.
* bc/doc-compat-object-format-not-working:
docs: note that extensions.compatobjectformat is incomplete
The "do you still use it?" message given by a command that is
deeply deprecated and allow us to suggest alternatives has been
updated.
* kh/you-still-use-whatchanged-fix:
BreakingChanges: remove claim about whatchanged reports
whatchanged: remove not-even-shorter clause
whatchanged: hint about git-log(1) and aliasing
you-still-use-that??: help the user help themselves
t0014: test shadowing of aliases for a sample of builtins
git: allow alias-shadowing deprecated builtins
git: move seen-alias bookkeeping into handle_alias(...)
git: add `deprecated` category to --list-cmds
Makefile: don’t add whatchanged after it has been removed
Configuration variables that take a pathname as a value
(e.g. blame.ignorerevsfile) can be marked as optional by prefixing
":(optoinal)" before its value.
* jc/optional-path:
parseopt: values of pathname type can be prefixed with :(optional)
config: values of pathname type can be prefixed with :(optional)
t7500: fix GIT_EDITOR shell snippet
t7500: make each piece more independent
"git format-patch --range-diff=... --notes=..." did not drive the
underlying range-diff with correct --notes parameter, ending up
comparing with different set of notes from its main patch output
you would get from "git format-patch --notes=..." for a singleton
patch.
* kh/format-patch-range-diff-notes:
format-patch: handle range-diff on notes correctly for single patches
revision: add rdiff_log_arg to rev_info
range-diff: rename other_arg to log_arg
A lot of code clean-up of xdiff.
Split out of a larger topic.
* en/xdiff-cleanup:
xdiff: change type of xdfile_t.changed from char to bool
xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c
xdiff: rename rchg -> changed in xdfile_t
xdiff: delete chastore from xdfile_t
xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t
xdiff: delete redundant array xdfile_t.ha
xdiff: delete struct diffdata_t
xdiff: delete local variables that alias fields in xrecord_t
xdiff: delete superfluous function xdl_get_rec() in xemit
xdiff: delete unnecessary fields from xrecord_t and xdfile_t
xdiff: delete local variables and initialize/free xdfile_t directly
xdiff: delete static forward declarations in xprepare
Marking a hunk 'selected' in "git add -p" and then splitting made
all the split pieces 'selected'; this has been changed to make them
all 'undecided', which gives better end-user experience.
* pw/add-p-hunk-splitting-fix:
add-patch: update hunk splitability after editing
add -p: mark split hunks as undecided
The "string-list" API function to find where a given string would
be inserted got updated so that it can use unrealistically huge
array index that would only fit in size_t but not int or ssize_t
to achieve unstated goal.
* sj/string-list:
refs: enable sign compare warnings check
string-list: change "string_list_find_insert_index" return type to "size_t"
string-list: replace negative index encoding with "exact_match" parameter
string-list: use bool instead of int for "exact_match"
Documentation for "git log --pretty" options has been updated
to make it easier to translate.
* jn/doc-help-translaing-pretty-options:
doc: do not break sentences into "lego" pieces
The reftable backend learned to sanity check its on-disk data more
carefully.
* kn/reftable-consistency-checks:
refs/reftable: add fsck check for checking the table name
reftable: add code to facilitate consistency checks
fsck: order 'fsck_msg_type' alphabetically
Documentation/fsck-msgids: remove duplicate msg id
reftable: check for trailing newline in 'tables.list'
refs: move consistency check msg to generic layer
refs: remove unused headers
Code clean-up around commit-graph.
* ps/commit-graph-per-object-source:
commit-graph: pass graphs that are to be merged as parameter
commit-graph: return commit graph from `repo_find_commit_pos_in_graph()`
commit-graph: return the prepared commit graph from `prepare_commit_graph()`
revision: drop explicit check for commit graph
blame: drop explicit check for commit graph
Documentation mark-up fix.
* ja/doc-markup-attached-paragraph-fix:
doc: fix indentation of refStorage item in git-config(1)
doc: change the markup of paragraphs following a nested list item
Our CI script requires "sudo" that can be told to preserve
environment, but Ubuntu replaced with "sudo" with an implementation
that lacks the feature. Work this around by reinstalling the
original version.
* ps/ci-avoid-broken-sudo-on-ubuntu:
ci: fix broken jobs on Ubuntu 25.10 caused by switch to sudo-rs(1)
In b0b910e052 (cat-file.c: add batch handling for submodules,
2025-06-02), we began handling submodule entries specially when batching
cat-file like so:
$ echo :sha1collisiondetection | git.compile cat-file --batch-check
855827c583bc30645ba427885caa40c5b81764d2 submodule
Commit b0b910e052 notes that submodules are handled differently than
non-existent objects, which print "<given-name> <type>", since there is
(a) no object to resolve the OID of in the first place, and as commit
b0b910e052 notes, (b) for submodules in particular, it is useful to know
what commit it points at without having to spawn another Git process.
That commit does so by calling report_object_status() and passing in
"oid_to_hex(&data->oid)" for the "obj_name" parameter. This is
unnecessary, however, since report_object_status() will do the same
automatically if given a NULL "obj_name" argument.
That behavior dates back to 6a951937ae (cat-file: add
--batch-all-objects option, 2015-06-22), so rely on that instead of
having the caller open-code that part of report_object_status().
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this command documentation to the modern synopsis style based on
similar work.[1] Concretely:
• Change the Synopsis section from `verse` to a `synopsis` block which
will automatically apply the correct formatting to various elements
(although this Synopsis is very simple)
• Use backticks (`) for code-like things which will also use the correct
formatting for interior placeholders (`<orderfile>`)
• Use inline-verbatim on options listing
† 1: E.g.,
• 026f2e3b (doc: convert git-log to new documentation format,
2025-07-07)
• b983aaab (doc: convert git-switch manpage to new synopsis style,
2025-05-25)
• 16543967 (doc: convert git-mergetool manpage to new synopsis
style, 2025-05-25)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently, eaaddf5791 (fast-import: add '--signed-commits=<mode>'
option, 2025-09-17) added support for controlling how signed commits
are handled by `git fast-import`, but there is no option yet to
decide about signed tags.
To remediate that, let's add a '--signed-tags=<mode>' option to
`git fast-import` too.
With this, both `git fast-export` and `git fast-import` have both
a '--signed-tags=<mode>' and a '--signed-commits=<mode>' supporting
the same <mode>s.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently the handle_tag() function in "builtin/fast-export.c" searches
only for "\n-----BEGIN PGP SIGNATURE-----\n" in the tag message to find
a tag signature.
This doesn't handle all kinds of OpenPGP signatures as some can start
with "-----BEGIN PGP MESSAGE-----" too, and this doesn't handle SSH and
X.509 signatures either as they use "-----BEGIN SSH SIGNATURE-----" and
"-----BEGIN SIGNED MESSAGE-----" respectively.
To handle all these kinds of tag signatures supported by Git, let's use
the parse_signed_buffer() function to properly find signatures in tag
messages.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "t9350-fast-export.sh", these existing tests:
- 'fast-export | fast-import when main is tagged'
- 'cope with tagger-less tags'
are checking the number of annotated tags in the test repo by comparing
it with some hardcoded values.
This could be an issue if some new tests that have some prerequisites
add new annotated tags to the repo before these existing tests. When
the prerequisites would be satisfied, the number of annotated tags
would be different from when some prerequisites would not be satisfied.
As we are going to add new tests that add new annotated tags in a
following commit, let's properly count the number of annotated tag in
the repo by incrementing a counter each time a new annotated tag is
added, and then by comparing the number of annotated tags to the value
of the counter when checking the number of annotated tags.
This is a bit ugly, but it makes it explicit that some tests are
interdependent. Alternative solutions, like moving the new tests to
the end of the script, were considered, but were rejected because they
would instead hide the technical debt and could confuse developers in
the future.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the 'GPG' prereq is lazily tested, `mkdir "$GNUPGHOME"` could
fail if the "$GNUPGHOME" directory already exists. This can happen if
the 'GPGSM' or the 'GPGSSH' prereq has been lazily tested before as they
already create "$GNUPGHOME".
To allow the GPGSM or the GPGSSH prereq to appear before the GPG prereq
in some test scripts, let's refactor the creation and setup of the
"$GNUPGHOME"` directory in a new prepare_gnupghome() function that uses
`mkdir -p "$GNUPGHOME"`.
This will be useful in a following commit.
Unfortunately the new prepare_gnupghome() function cannot be used when
lazily testing the GPG2 prereq, because that would expose existing,
hidden bugs in "t1016-compatObjectFormat.sh", so let's just document
that with a NEEDSWORK comment.
Helped-by: Todd Zullinger <tmz@pobox.com>
Helped-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It looks like the documentation of `git tag` is focused a bit too
much on GPG signed tags.
This starts with the "NAME" section where the command is described
with:
"Create, list, delete or verify a tag object signed with GPG"
while for example `git branch` is described with simply:
"List, create, or delete branches"
This could give the false impression that `git tag` only works with
tag objects, not with lightweight tags, and that tag objects are
always GPG signed.
In the "DESCRIPTION" section, it looks like only "GnuPG signed tag
objects" can be created by the `-s` and `-u <key-id>` options, and it
seems `gpg.program` can only specify a "custom GnuPG binary".
This goes on in the "OPTIONS" section too, especially about the `-s`
and `-u <key-id>` options.
The "CONFIGURATION" section also doesn't talk about how to configure
the command to work with X.509 and SSH signatures.
Let's rework all that to make sure users have a more accurate and
balanced view of what the command can do.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ubuntu 25.10 has been released. One prominent change in this version of
Ubuntu is the switch to some Rust-based utilities. Part of this switch
is also that Ubuntu now defaults to sudo-rs(1).
Unfortunately, this breaks our CI because sudo-rs(1) does not support
the `--preserve-env` flag. Let's revert back to the C-based sudo(1)
implementation to fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
c348192a (t1016: clean up style, 2024-10-22) fixed a coding style
violation that has an extra space between redirection operator ">"
and the redirection target, but at the same time, replaced the use
of "git config" to set a configuration variable to be used by the
remainder of tests with "test_config". The pattern employed here is
that the first set-up test prepares the environment to be used by
subsequent tests, which then use the settings left by this set-up
test to perform their tasks. Using test_config in the first set-up
test means the config setting made by the set-up test is reverted at
the end of the first set-up test, which totally misses the point.
Go back to use "git config" to fix this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify the "--merge-base" command line option in "git merge-tree".
* en/doc-merge-tree-describe-merge-base:
Documentation/git-merge-tree.adoc: clarify the --merge-base option
GitLab CI improvements.
* ps/gitlab-ci-windows-improvements:
t8020: fix test failure due to indeterministic tag sorting
gitlab-ci: upload Meson test logs as JUnit reports
gitlab-ci: drop workaround for Python certificate store on Windows
gitlab-ci: ignore failures to disable realtime monitoring
gitlab-ci: dedup instructions to disable realtime monitoring
Make sure that normal paragraphs in most user-facing docs[1] don’t
use literal blocks. This can easily happen if you try to maintain
indentation in order to continue a block; that might work in
e.g. Markdown variants, but not in AsciiDoc.
The fixes are straightforward, i.e. just deindent the block and maybe
add line continuations. The only exception is git-sparse-checkout(1)
where we also replace indentation used for *intended* literal blocks
with `----`.
† 1: These have not been considered:
• `Documentation/howto/`
• `Documentation/technical/`
• `Documentation/gitprotocol*`
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GPG2 prereq added in 2f36339fa8 (t/lib-gpg: introduce new prereq
GPG2, 2023-06-04) does not create the $GNUPGHOME directory.
Tests which use the GPG2 prereq without previously using the GPG prereq
fail because of the missing directory. This currently affects
t1016-compatObjectFormat.
Ensure $GNUPGHOME is created in the GPG2 prereq.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We create the $GNUPGHOME directory in both the GPG and GPGSSH prereqs.
Replace the redundancy with a function.
Use `mkdir -p` to ensure we do not fail if a test includes more than one
of these prereqs.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With 9842c0c749 (stash: honor stash.index in apply, pop modes,
2025-09-21) merged in a5d4779e6e (Merge branch 'dk/stash-apply-index',
2025-09-29), we did not advertise the connection between the new config
option stash.index and the implicit use of git-stash via --autostash
(which may also be configured). Do so.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When hash compatibility mode is enabled, we cannot write broken objects
because they cannot be mapped into the other hash algorithm. Use the
BROKEN_OBJECTS prerequisite to disable these tests and the writing of
broken objects in this mode.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to specify a compatibility hash for testing interactions for
SHA-256 repositories where we have SHA-1 compatibility enabled. Allow
the user to specify this scenario in the test suite by setting
GIT_TEST_DEFAULT_HASH to "sha256:sha1".
Note that this will get passed into GIT_DEFAULT_HASH, which Git itself
does not presently support. However, we will support this in a future
commit.
Since we'll now want to know the value for a specific version, let's add
the ability to specify either the storage hash (in this case, SHA-256)
or the compatibility hash (SHA-1). We use a different value for the
compatibility hash that will be enabled for all repositories
(test_repo_compat_hash_algo) versus the one that is used individually in
some tests (test_compat_hash_algo), since we want to still run those
individual tests without requiring that the testsuite be run fully in a
compatibility mode.
In some cases, we'll need to adjust our test suite to work in a proper
way with a compatibility hash. For example, in such a case, we'll only
use pack index v3, since v1 and v2 lack support for multiple algorithms.
Since we won't want to write those older formats, we'll need to skip
tests that do so. Let's add a COMPAT_HASH prerequisite for this
purpose.
Finally, in this scenario, we can no longer rely on having broken
objects work since we lack compatibility mappings to rewrite objects in
the repository. Add a prerequisite, BROKEN_OBJECTS, that we define in
terms of COMPAT_HASH and checks to see if creating deliberately broken
objects is possible, so that we can disable these tests if not.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're creating a tag, we want to make sure that gpgsig and
gpgsig-sha256 headers are allowed for the commit. The default fsck
behavior is to ignore the fact that they're left over, but some of our
tests enable strict checking which flags them nonetheless. Add
improved checking for these headers as well as documentation and several
tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, we have a way to print the storage hash, the input hash, and
the output hash, but we lack a way to print the compatibility hash. Add
a new type to --show-object-format, compat, which prints this value.
If no compatibility hash exists, simply print a newline. This is
important to allow users to use multiple options at once while still
getting unambiguous output.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently have no documentation for how loose objects are stored.
Let's add some here so it's easy for people to understand how they
work.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is fair to say that our pack and indexing code is quite complex.
Contributors who wish to work on this code or implementors of other
implementations would benefit from clear, unambiguous documentation
about how our data formats are structured and encoded and what data is
used in the computation of certain values. Unfortunately, some of this
data is missing, which leads to confusion and frustration.
Let's document some of this data to help clarify things. Specify over
what data CRC32 values are computed and also note which CRC32 algorithm
is used, since Wikipedia mentions at least four 32-bit CRC algorithms
and notes that it's possible to use different bit orderings.
In addition, note how we encode objects in the pack. One might be led
to believe that packed objects are always stored with the "<type>
<size>\0" prefix of loose objects, but that is not the case, although
for obvious reasons this data is included in the computation of the
object ID. Explain why this is for the curious reader.
Finally, indicate what the size field of the packed object represents.
Otherwise, a reader might think that the size of a delta is the size of
the full object or that it might contain the offset or object ID,
neither of which are the case. Explain clearly, however, that the
values represent uncompressed sizes to avoid confusion.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for the hash function transition reflects the original
design where the SHA-256 signature would always be placed in a header.
However, due to a missed patch in Git 2.29, we shipped SHA-256 support
such that the signature for the current algorithm is always an in-body
signature and the opposite algorithm is always in a header. Since the
documentation is inaccurate, update it to reflect the correct
information.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current design of pack index v3 has items in two different orders:
sorted shortened object ID order and pack order. The shortened object
IDs and the pack index offset values are in the former order and
everything else is in the latter.
This, however, poses some problems. We have many parts of the packfile
code that expect to find out data about an object knowing only its index
in pack order. With the current design, to find the pack offset after
having looked up the index in pack order, we must then look up the full
object ID and use that to look up the shortened object ID to find the
pack offset, which is inconvenient, inefficient, and leads to poor cache
usage.
Instead, let's change the offset values to be looked up by pack order.
This works better because once we know the pack order offset, we can
find the full object name and its location in the pack with a simple
index into their respective tables. This makes many operations much
more efficient, especially with the functions we already have, and it
avoids the need for the revindex with pack index v3.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our current pack index v3 format uses 4-byte integers to find the
trailer of the file. This effectively means that the file cannot be
much larger than 2^32. While this might at first seem to be okay, we
expect that each object will have at least 64 bytes worth of data, which
means that no more than about 67 million objects can be stored.
Again, this might seem fine, but unfortunately, we know of many users
who attempt to create repos with extremely large numbers of commits to
get a "high score," and we've already seen repositories with at least 55
million commits. In the interests of gracefully handling repositories
even for these well-intentioned but ultimately misguided users, let's
change these lengths to 8 bytes.
For the checksums at the end of the file, we're producing 32-byte
SHA-256 checksums because that's what we already do with pack index v2
and SHA-256. Truncating SHA-256 doesn't pose any actual security
problems other than those related to the reduced size, but our pack
checksum must already be 32 bytes (since SHA-256 packs have 32-byte
checksums) and it simplifies the code to use the existing hashfile logic
for these cases for the index checksum as well.
In addition, even though we may not need cryptographic security for the
index checksum, we'd like to avoid arguments from auditors and such for
organizations that may have compliance or security requirements. Using
the simple, boring choice of the full SHA-256 hash avoids all possible
discussion related to hash truncation and removes impediments for these
organizations.
Note that we do not yet have a pack index v3 implementation in Git, so
it should be fine to change this format. However, such an
implementation has been written for future inclusion following this
format.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `NO_SYMLINK_HEAD` is defined, `create_ref_symlink()` is hard-coded
as `(-1)`, and as a consequence the condition `!create_ref_symlink()`
always evaluates to false, rendering any code guarded by that condition
unreachable.
Therefore, clang is _technically_ correct when it complains about
unreachable code. It does completely miss the fact that this is okay
because on _other_ platforms, where `NO_SYMLINK_HEAD` is not defined,
the code isn't unreachable at all.
Let's use the same trick as in 82e79c63642c (git-compat-util: add
NOT_CONSTANT macro and use it in atfork_prepare(), 2025-03-17) to
appease clang while at the same time keeping the `-Wunreachable` flag
to potentially find _actually_ unreachable code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It allows for more consistent patches that way.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to make them relative to the top-level directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dip our toes a bit to (optionally) use Rust implemented helper
called from our C code.
* ps/rust-balloon:
ci: enable Rust for breaking-changes jobs
ci: convert "pedantic" job into full build with breaking changes
BreakingChanges: announce Rust becoming mandatory
varint: reimplement as test balloon for Rust
varint: use explicit width for integers
help: report on whether or not Rust is enabled
Makefile: introduce infrastructure to build internal Rust library
Makefile: reorder sources after includes
meson: add infrastructure to build internal Rust library
Handling of an empty subdirectory of .git/refs/ in the ref-files
backend has been corrected.
* kn/ref-cache-seek-fix:
refs/ref-cache: fix SEGFAULT when seeking in empty directories
"git reflog write" did not honor the configured user.name/email
which has been corrected.
* ml/reflog-write-committer-info-fix:
builtin/reflog: respect user config in "write" subcommand
Occasionally there are efforts to contribute to the Git project that
span more than one patch series in order to achieve a broader goal. By
convention, the maintainer has typically suffixed the topic names with
"-part-one", or "-part-1" and so on.
Document that convention and suggest some guidance on how to structure
proposed topic names for multi-series efforts.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d255105c99 (SubmittingPatches: release-notes entry experiment,
2024-03-25), we began an experiment to have contributors suggest a topic
description to appear in our RelNotes and "What's cooking?" reports.
Extend that experiment to also welcome suggested topic branch names in
addition to descriptions.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few places where an size_t value was cast to curl_off_t without
checking has been updated to use the existing helper function.
* js/curl-off-t-fixes:
http-push: avoid new compile error
imap-send: be more careful when casting to `curl_off_t`
http: offer to cast `size_t` to `curl_off_t` safely
Clang-format update to let our control macros formatted the way we
had them traditionally, e.g., "for_each_string_list_item()" without
space before the parentheses.
* jt/clang-format-foreach-wo-space-before-parenthesis:
clang-format: exclude control macros from SpaceBeforeParens
Code clean-up around the in-core list of all the pack files and
object database(s).
* ps/packfile-store:
packfile: refactor `get_packed_git_mru()` to work on packfile store
packfile: refactor `get_all_packs()` to work on packfile store
packfile: refactor `get_packed_git()` to work on packfile store
packfile: move `get_multi_pack_index()` into "midx.c"
packfile: introduce function to load and add packfiles
packfile: refactor `install_packed_git()` to work on packfile store
packfile: split up responsibilities of `reprepare_packed_git()`
packfile: refactor `prepare_packed_git()` to work on packfile store
packfile: reorder functions to avoid function declaration
odb: move kept cache into `struct packfile_store`
odb: move MRU list of packfiles into `struct packfile_store`
odb: move packfile map into `struct packfile_store`
odb: move initialization bit into `struct packfile_store`
odb: move list of packfiles into `struct packfile_store`
packfile: introduce a new `struct packfile_store`
* ps/gitlab-ci-windows-improvements:
t8020: fix test failure due to indeterministic tag sorting
gitlab-ci: upload Meson test logs as JUnit reports
gitlab-ci: drop workaround for Python certificate store on Windows
gitlab-ci: ignore failures to disable realtime monitoring
gitlab-ci: dedup instructions to disable realtime monitoring
* ps/rust-balloon:
ci: enable Rust for breaking-changes jobs
ci: convert "pedantic" job into full build with breaking changes
BreakingChanges: announce Rust becoming mandatory
varint: reimplement as test balloon for Rust
varint: use explicit width for integers
help: report on whether or not Rust is enabled
Makefile: introduce infrastructure to build internal Rust library
Makefile: reorder sources after includes
meson: add infrastructure to build internal Rust library
In the previous step, we introduced an optional filename that can be
given to a configuration variable, and nullify the fact that such a
configuration setting even existed if the named path is missing or
empty.
Let's do the same for command line options that name a pathname.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes people want to specify additional configuration data
as "best effort" basis. Maybe commit.template configuration file points
at somewhere in ~/template/ but on a particular system, the file may not
exist and the user may be OK without using the template in such a case.
When the value given to a configuration variable whose type is
pathname wants to signal such an optional file, it can be marked by
prepending ":(optional)" in front of it. Such a setting that is
marked optional would avoid getting the command barf for a missing
file, as an optional configuration setting that names a missing
file is not even seen.
cf. <xmqq5ywehb69.fsf@gitster.g>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2140b140 (commit: error out for missing commit message template,
2011-02-25) defined
GIT_EDITOR="echo hello >\"\$1\""
for these two tests, with the intention that 'hello' would be
written in the given file, but as Phillip Wood points out,
GIT_EDITOR is invoked by shell after getting expanded to
sh -c 'echo hello >"$1" "$@"' 'echo hello >"$1"' path/to/file
which is not what we want.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add glue code in 'refs/reftable-backend.c' which calls the reftable
library to perform the fsck checks. Here we also map the reftable errors
to Git' fsck errors.
Introduce a check to validate table names for a given reftable stack.
Also add 'badReftableTableName' as a corresponding error within Git. The
reftable specification mentions:
It suggested to use
${min_update_index}-${max_update_index}-${random}.ref as a naming
convention.
So treat non-conformant file names as warnings.
While adding the fsck header to 'refs/reftable-backend.c', modify the
list to maintain lexicographical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git refs verify` command is used to run consistency checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the files present in the reftable directory.
Since the reftable library is treated as an independent library we
should ensure that the library code works independently without
knowledge about Git's internals. To do this, add both 'reftable/fsck.c'
and 'reftable/reftable-fsck.h'. Which provide an entry point
'reftable_fsck_check' for running fsck checks over a provided reftable
stack. The callee provides the function with callbacks to handle issue
and information reporting.
The added check, goes over all tables in the reftable stack validates
that they have a valid name. It not, it raises an error.
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The list of 'fsck_msg_type' seem to be alphabetically ordered, but there
are a few small misses. Fix this by sorting the sub-sections of the
list to maintain alphabetical ordering.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `gitmodulesLarge` is repeated twice. Remove the second duplicate.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the reftable format, the 'tables.list' file contains a
newline separated list of tables. While we parse this file, we do not
check or care about the last newline. Tighten the parser in
`parse_names()` to return an appropriate error if the last newline is
missing.
This requires modification to `parse_names()` to now return the error
while accepting the output as a third argument.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The files-backend prints a message before the consistency checks run.
Move this to the generic layer so both the files and reftable backend
can benefit from this message.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the 'refs/' namespace, some of the included header files are not
needed, let's remove them.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 5a12fd2a8c (doc: change the markup of paragraphs following a
nested list item, 2025-09-27) converted the list of items in
config/extensions.adoc into a definition list. This caused a small
regression in the indentation of one item, but only when built with
AsciiDoctor. You can see the problem with:
$ ./doc-diff --asciidoctor 5a12fd2a8c^ 5a12fd2a8c
--- a/c44beea485f0f2feaf460e2ac87fdd5608d63cf0-asciidoctor/home/peff/share/man/man1/git-config.1
+++ b/5a12fd2a8c850df311aa149c9bad87b7cb002abb-asciidoctor/home/peff/share/man/man1/git-config.1
@@ -3128,9 +3128,9 @@ CONFIGURATION FILE
• reftable for the reftable format. This format is
experimental and its internals are subject to change.
- Note that this setting should only be set by git-init(1) or git-
- clone(1). Trying to change it after initialization will not work
- and will produce hard-to-diagnose issues.
+ Note that this setting should only be set by git-init(1) or git-
+ clone(1). Trying to change it after initialization will not work and
+ will produce hard-to-diagnose issues.
relativeWorktrees
If enabled, indicates at least one worktree has been linked with
(along with many other changes which are correctly fixing what
5a12fd2a8c intended to fix). The "Note" paragraph should remain aligned
with the bullet points, as they are left-aligned with the rest of the
definition text.
The confusion comes from a paragraph following a list item (ironically,
the same case that 5a12fd2a8c was solving!). We can solve it by adding
"--" block markers around the nested list. We couldn't have done that
before 5a12fd2a8c because before then our list was nested inside another
set of block markers, something that AsciiDoctor has trouble with. But
now that we are a top-level definition list, it is OK to do so (and in
fact, you can see that commit already made a similar adjustment for the
worktreeConfig entry).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
What happens if you run `git push` without any arguments is actually
extremely complex to explain, as discussed in the previous commit.
But it's very easy to explain what `git push <remote> <branch>` does, so
start the man page by explaining what that does.
The hope is that someone could just stop reading the man page here and
never learn anything else about `git push`, and that would be fine.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: 6 users says they found the "what to push"
paragraphs confusing, for many different reasons, including:
* what does "..." in <refspec>... mean?
* "consult XXX configuration" is hard to parse
* it refers to the `git-config` man page even though the config
information for `git push` is included in this man page under
CONFIGURATION
* the default ("push to a branch with the same name") is what they use
99% of the time, they would have expected it to appear earlier instead
of at the very end
* not understanding what the term "upstream" means in Git
("are branches tracked by some system besides their names?"")
Also, the current explanation of `push.default=simple` ("the
current branch is pushed to the corresponding upstream branch, but
as a safety measure, the push is aborted if the upstream branch
does not have the same name as the local one.") is not accurate:
`push.default=simple` does not always require you to set a corresponding
upstream branch.
Address all of these by
* using a numbered "in order of precedence" list
* giving a more accurate explanation of how `push.default=simple` works
* giving a little bit of context around "upstream branch": it's
something that you may have to set explicitly
* referring to the new UPSTREAM BRANCHES section
The default behaviour is still discussed pretty late but it should be
easier to skim now to get to the relevant information.
In "`git push` may fail if...", I'm intentionally being vague about
what exactly `git push` does, because (as discussed on the mailing list)
the behaviour of `push.default=simple` is very confusing, perhaps broken,
and certainly not worth trying to explain in an introductory context.
`push.default.simple` sometimes requires you to set an upstream and
sometimes doesn't and the exact conditions under which it does/doesn't
are hard to describe.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not obvious that "`branch.*.remote` configuration"` refers to the
upstream, so say "upstream" instead.
The sentence is also quite hard to parse right now, use "defaults to" to
simplify it.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: one user mentioned that they don't know what the
term "upstream branch" means. As far as I can tell, the most complete
description is under the `--track` option in `git branch`. Upstreams
are an important concept in Git and the `git branch` man page is not an
obvious place for that information to live.
There's also a very terse description of "upstream branch" in the
glossary that's missing a lot of key information, like the fact that the
upstream is used by `git status` and `git pull`, as well as a
description in `git-config` in `branch.<name>.remote` which doesn't
explain the relationship to `git status` either.
Since the `git pull`, `git push`, and `git fetch` man pages already
include sections on REMOTES and the syntax for URLs, add a section on
UPSTREAM BRANCHES to `urls-remotes.adoc`.
In the new UPSTREAM BRANCHES section, cover the various ways that
upstreams branches are automatically set in Git, since users may
mistakenly think that their branch does not have an upstream branch if
they didn't explicitly set one.
A terminology note: Git uses two terms for this concept:
- "tracking" as in "the tracking information for the 'foo' branch"
or the `--track` option to `git branch`
- "upstream" or "upstream branch", as in `git push --set-upstream`.
This term is also used in the `git rebase` man page to refer to the
first argument to `git rebase`, as well as in `git pull` to refer to
the branch which is going to be merged into the current branch ("merge
the upstream branch into the current branch")
Use "upstream branch" as a heading for this concept even though the term
"upstream branch" is not always used strictly in the sense of "the
tracking information for the current branch". "Upstream" is used much
more often than "tracking" in the Git docs to refer to this concept and
the goal is to help users understand the docs.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback, 5 users are unsure what "ref" and/or "objects" means
in this context. 3 users said they don't know what "complete the refs"
means.
Many users also commented that receive hooks do not seem like the most
important thing to know about `git push`, and that this information
should not be the second sentence in the man page.
Use more familiar language to make it more accessible to users who do
not know what a "ref" is and move the "hooks" comment to the end.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't accumulate allowed options from any visited hunks, start fresh at
the top of the loop instead and only record the allowed options for the
current hunk.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Options a and d stage and unstage all undecided hunks towards the bottom
of the array of hunks, respectively, and then roll over to the very
first hunk. The first part is similar to y and n if the current hunk is
the last one in the array, but they roll over to the next undecided
hunk if there is any. That's more useful; do it for a and d as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Options j and J roll over at the bottom and go to the first undecided
hunk and hunk 1, respectively. Let options k and K do the same when
they reach the top of the hunk array, so let them go to the last
undecided hunk and the last hunk, respectively, for consistency. Also
use the same direction-neutral error messages.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options y, n, and e mark the current hunk as decided. If there's
another undecided hunk towards the bottom of the hunk array they go
there. If there isn't, but there is another undecided hunk towards the
top then they go to the very first hunk, no matter if it has already
been decided on.
The option j does basically the same move. Technically it is not
allowed if there's no undecided hunk towards the bottom, but the
variable "permitted" is never reset, so this permission is retained
from the very first hunk. That may a bug, but this behavior is at
least consistent with y, n, and e and arguably more useful than
refusing to move.
Improve the roll-over behavior of these four options by moving to the
first undecided hunk instead of hunk 1, consistent with what they do
when not rolling over.
Also adjust the error message for j, as it will only be shown if
there's no other undecided hunk in either direction.
Reported-by: Windl, Ulrich <u.windl@ukr.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable "permitted" is not reset after moving to a different hunk,
so it only accumulates permission and doesn't necessarily reflect those
of the current hunk. This may be a bug, but is actually useful with the
option J, which can be used at the last hunk to roll over to the first
hunk. Make this particular behavior official.
Also adjust the error message, as it will only be shown if there's just
a single hunk.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options j, J, k, and K don't affect the status of the current hunk.
They just go to a different one. This is true whether the current hunk
is undecided or not. Avoid misunderstanding by no longer mentioning
the current hunk explicitly in their help texts.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After fixing the tricky compare warning introduced by calling
"string_list_find_insert_index", there are only two loop iterator type
mismatches. Fix them to enable compare warnings check.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As "string_list_find_insert_index" is a simple wrapper of
"get_entry_index" and the return type of "get_entry_index" is already
"size_t", we could simply change its return type to "size_t".
Update all callers to use size_t variables for storing the return value.
The tricky fix is the loop condition in "mailmap.c" to properly handle
"size_t" underflow by changing from `0 <= --i` to `i--`.
Remove "DISABLE_SIGN_COMPARE_WARNINGS" from "mailmap.c" as it's no
longer needed with the proper unsigned types.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "string_list_find_insert_index()" function is used to determine
the correct insertion index for a new string within the string list.
The function also doubles up to convey if the string is already
existing in the list, this is done by returning a negative index
"-1 -index". Users are expected to decode this information. This
approach has several limitations:
1. It requires the callers to look into the detail of the function to
understand how to decode the negative index encoding.
2. Using int for indices can cause overflow issues when dealing with
large string lists.
To address these limitations, change the function to return size_t for
the index value and use a separate bool parameter to indicate whether
the index refers to an existing entry or an insertion point.
In some cases, the callers of "string_list_find_insert_index" only need
the index position and don't care whether an exact match is found.
However, "get_entry_index" currently requires a non-NULL "exact_match"
parameter, forcing these callers to declare unnecessary variables.
Let's allow callers to pass NULL for the "exact_match" parameter when
they don't need this information, reducing unnecessary variable
declarations in calling code.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "exact_match" parameter in "get_entry_index" is used to indicate
whether a string is found or not, which is fundamentally a true/false
value. As we allow the use of bool, let's use bool instead of int to
make the function more semantically clear.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sentence needs to be whole to be properly translated.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Also add the config section in the manual page and do not refer to the man
page in the description of settings when this description is already in the
man page.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Also add the config section in the manual page and do not refer to the man
page in the description of settings when this description is already in the
man page.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Also do not refer to the man page in the description of settings when this
description is already in the man page.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: set minimum size on configuration dialog
gitk: separate code blocks for configuration dialog
gitk: make configuration dialog resizing useful
gitk: add theme selection to color configuration page
gitk: add proc run_themeloader
gitk: eliminate unused ui color variables
gitk: eliminate Interface color option from gui
gitk: use text labels for next/prev search buttons
gitk: use text labels for commit ID buttons
gitk: do not invoke tk_setPalette
gitk: use config variables to define and load a theme
gitk: make sha1but a ttk::button
gitk: use themed spinboxes
gitk: fix MacOS 10.14 "Mojave" crash on launch
gitk: fix error when remote tracking branch is deleted
* ml/themes:
gitk: set minimum size on configuration dialog
gitk: separate code blocks for configuration dialog
gitk: make configuration dialog resizing useful
gitk: add theme selection to color configuration page
gitk: add proc run_themeloader
gitk: eliminate unused ui color variables
gitk: eliminate Interface color option from gui
gitk: use text labels for next/prev search buttons
gitk: use text labels for commit ID buttons
gitk: do not invoke tk_setPalette
gitk: use config variables to define and load a theme
gitk: make sha1but a ttk::button
gitk: use themed spinboxes
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
gitk sets no size limit on its configuration dialog, allowing the user
to collapse the window so almost nothing is visible. The geometry
manager sets an initial size so all the widgets are visible, though
ignores the potentially very long text in the entry widgets in doing so.
Let's use this initial size as the minimum. The size information is
computed in Tk's idle processing queue, so a wait is required.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk's configuration dialog uses a large number of widgets, and this
code is hard to read as there is no easily recognizable grouping or
breaks. Help this by adding space between items that occupy a single row
in the dialog.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk's configuration dialog can be resized, but this does not expand the
space allocated to any widgets. Some items may have long lines of text
that would be visible if the widgets expanded, but this does not happen.
The top-level container uses a two column grid and allocates any space
change equally to both columns. However, the configuration pages are
contained in one cell so half the additional space is wasted if
expanding. Also, the individual configuration pages do not mark any
column or widgets to expand, so any additional space given is just used
as padding.
Collapse the top-level page to have one column, placing the "OK" and
"Cancel" buttons in a non-resizing frame in column 1 (this keeps the
buttons in constant geometry as the dialog is expanded). This makes all
additional space go to the configuration page.
Mark column 3 of the individual pages to get all additional space, and
mark the text widgets in that column so they will expand to use the
space. While we're at it, eliminate or simplify use of frames to contain
column 2 content, and harmonize the indents of that content.
prefspage_general adds a special "spacer" label in row 2, column 1, that
causes all of the subsequent rows with no column 1 content to indent,
and this carries over to the next notebook tab (prefspage_color) through
some undocumented feature. The fonts page has a different indent, again
for unknown reason. The documented approach would be to use -padx
explicitly on all the rows to set the indents.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
The only values possible for 'changed' is 1 and 0, which exactly maps
to a bool type. It might not look like this because action1 and action2
(which use to be dis1, and dis2) were also of type char and were
assigned numerical values within a few lines of 'changed' (what used to
be rchg).
Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false
for changed[k] makes it clear to future readers that these are
logically separate concepts.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is refactor-only; no behavior is changed. A future commit
will use bool literals for changed[i].
The functions xdl_clean_mmatch() and xdl_cleanup_records() will be
cleaned up more in a future patch series. The changes to
xdl_cleanup_records(), in this patch, are just to make it clear why
`char rchg` is refactored to `bool changed`.
Rename dis* to action* and replace literal numericals with macros.
The old names came from when dis* (which I think was short for discard)
was treated like a boolean, but over time it grew into a ternary state
machine. The result was confusing because dis* and rchg* both used 0/1
values with different meanings.
The new names and macros make the states explicit. nm is short for
number of matches, and mlim is a heuristic limit:
nm == 0 -> action[i] = DISCARD -> changed[i] = true
0 < nm < mlim -> action[i] = KEEP -> changed[i] = false
nm >= mlim -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()
When need_min is true, only DISCARD and KEEP occur because the limit
is effectively infinite.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --merge-base option for merge-tree has a few slightly awkward
constructions or omissions:
* Split the initial long sentence describing the option into two,
making the instructions and the limitations clearer for readers.
* Add context to the final sentence that might be obvious to some
readers but isn't immediately obvious to all.
* The discussion about lack of support for multiple merge bases
simply leave folks wondering why that matters and could help or
hurt. Separate it out and add a brief explanation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit bcf7edee09 ("meson: generate articles", 2024-12-27) added the
generation of the 'howto' and 'technical' documents to the meson build.
At this time those documents had a '*.txt' file extension, but they were
renamed with an '*.adoc' extension by commit 1f010d6bdf ("doc: use .adoc
extension for AsciiDoc files", 2025-01-20), for the most part. For the
meson build, commit 87eccc3a81 ("meson: fix building technical and howto
docs", 2025-03-02) fixed the meson.build files, which had not been
updated when the files were renamed.
However, the 'Documentation/Makefile' has not been updated to include
all of the recently added technical documents. In particular, the
following are built by meson, but not by the Makefile:
commit-graph.adoc
directory-rename-detection.adoc
packfile-uri.adoc
remembering-renames.adoc
repository-version.adoc
rerere.adoc
sparse-checkout.adoc
sparse-index.adoc
In order to ensure that both build systems format the same technical
documents, add the above documents to the TECH_DOCS variable in the
Documentation/Makefile.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same idea as the previous commit except that I don't know when or if
reftable will be turned into a Rust crate.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future patch series the 'xdiff' Rust crate will be added. Delete
the creation of the static library file for xdiff to avoid a name
conflict. This also moves toward the goal of Rust only needing to link
against libgit.a.
Changes to Meson are not required as the xdiff library is already
included in Meson's libgit.a.
xdiff-objs was a historical make target to allow building just the
objects in xdiff. Since it was defined in terms of XDIFF_OBJS (which
no longer exists) this convenience make target no longer makes sense.
Remove it.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "do you still use it?" message given by a command that is
deeply deprecated and allow us to suggest alternatives has been
updated.
* kh/you-still-use-whatchanged-fix:
BreakingChanges: remove claim about whatchanged reports
whatchanged: remove not-even-shorter clause
whatchanged: hint about git-log(1) and aliasing
you-still-use-that??: help the user help themselves
t0014: test shadowing of aliases for a sample of builtins
git: allow alias-shadowing deprecated builtins
git: move seen-alias bookkeeping into handle_alias(...)
git: add `deprecated` category to --list-cmds
Makefile: don’t add whatchanged after it has been removed
The build procedure based on meson learned a target to only build
documentation, similar to "make doc".
* ps/meson-build-docs:
ci: don't compile whole project when testing docs with Meson
meson: print docs backend as part of the summary
meson: introduce a "docs" alias to compile documentation only
The use of "git config get" command to learn how ANSI color
sequence is for a particular type, e.g., "git config get
--type=color --default=reset no.such.thing", isn't very ergonomic.
* ps/config-get-color-fixes:
builtin/config: do not spawn pager when printing color codes
builtin/config: special-case retrieving colors without a key
builtin/config: do not die in `get_color()`
t1300: small style fixups
t1300: write test expectations in the test's body
"git fast-import" learned that "--signed-commits=<how>" option that
corresponds to that of "git fast-export".
* cc/fast-import-strip-signed-commits:
fast-import: add '--signed-commits=<mode>' option
gpg-interface: refactor 'enum sign_mode' parsing
"git refs optimize" is added for not very well explained reason
despite it does the same thing as "git pack-refs"...
* ms/refs-optimize:
t: add test for git refs optimize subcommand
t0601: refactor tests to be shareable
builtin/refs: add optimize subcommand
doc: pack-refs: factor out common options
builtin/pack-refs: factor out core logic into a shared library
builtin/pack-refs: convert to use the generic refs_optimize() API
reftable-backend: implement 'optimize' action
files-backend: implement 'optimize' action
refs: add a generic 'optimize' API
The work to build on the bulk-checkin infrastructure to create many
objects at once in a transaction and to abstract it into the
generic object layer continues.
* jt/odb-transaction:
odb: add transaction interface
object-file: update naming from bulk-checkin
object-file: relocate ODB transaction code
bulk-checkin: drop flush_odb_transaction()
builtin/update-index: end ODB transaction when --verbose is specified
bulk-checkin: remove ODB transaction nesting
In e6c06e87a2 (last-modified: fix bug when some paths remain unhandled,
2025-09-18), we have fixed a bug where under certain circumstances,
git-last-modified(1) would BUG because there's still some unhandled
paths. The fix claims that the root cause here is criss-cross merges,
and it adds a test case that seemingly exercises this.
Curiously, this test case fails on some systems because the actual
output does not match our expectations:
diff --git a/expect b/actual
index 5271607..bdc620e 100644
--- a/expect
--- b/actual
@@ -1,3 +1,3 @@
km3 a
-k2 k
+km2 k
1 file
error: last command exited with $?=1
not ok 15 - last-modified with subdir and criss-cross merge
The output we see is git-name-rev(1) with `--annotate-stdin`. What it
does is to take the output of git-last-modified(1), which contains
object IDs of the blamed commits, and convert those object IDs into the
names of the corresponding tags. Interestingly, we indeed have both "k2"
and "km2" as tags, and even more interestingly both of these tags point
to the same commit. So the output we get isn't _wrong_, as the tags are
ambiguous.
But why do both of these tags point to the same commit? "km2" really is
supposed to be a merge, but due to the way the test is constructed the
merge turns into a fast-forward merge. Which means that the resulting
commit-graph does not even contain a criss-cross merge in the first place!
A quick test though shows that the test indeed triggers the bug, so
the initial analysis that the behaviour is triggered by such merges
must be wrong.
And it is: seemingly, the issue isn't with criss-cross merges, but
rather with a graph where different files in the same directory were
modified on both sides of a merge.
Refactor the test so that we explicitly test for this specific situation
instead of mentioning the "criss-cross merge" red herring. As the test
is very specific to the actual layout of the repository we also adapt it
to use its own standalone repository.
Note that this requires us to drop the `test_when_finished` call in
`check_last_modified` because it's not supported to execute that
function in a subshell.
This refactoring also fixes the original tag ambiguity that caused us to
fail on some platforms.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running tests, Meson knows to output both a test log as well as a
JUnit test report that collates results. We don't currently upload these
results in our GitLab CI at all, which makes it hard to see which tests
ran, but also which of our tests may have failed.
Upload these JUnit reports as artifacts to make this information more
accessible. Note that we also do this for some jobs that don't use Meson
and thus don't generate these reports in the first place. GitLab CI
handles missing reports gracefully though, so there is no reason to
special-case those jobs that don't use Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, we have been running into some issues in the past where the
certificate store for Python is broken on the GitLab CI runners using
Windows. The consequence was that we weren't able to establish any SSL
connections via Python, but we need that feature so that we can download
the Meson wraps. The workaround we employed was to import certificates
from the cURL project into the certificate store via OpenSSL.
This is obviously an ugly workaround. But even more importantly, this
workaround fails every time Chocolatey updates its OpenSSL packages. The
problem here is that the old OpenSSL package installer will be removed
immediately once the newer version was published, But the Chocolatey
community repository may not yet have propagated the new version of this
package to all of its caches. The result is that for a couple hours (or
sometimes even one or two days) we always fail to install OpenSSL until
the new version was propagated.
Luckily though, it turns out that the workaround doesn't seem to be
required anymore. Drop it to work around the intermittent failures and
to clean up some now-unneeded legacy cruft.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have recently introduced a change to disable realtime monitoring for
Windows job in GitLab CI. This change led (and still leads) to a quite
significant speedup.
But there's a catch: seemingly, some of the runners we use already have
realtime monitoring disabled. On such a machine, trying to disable the
feature again leads to an error that causes the whole job to fail.
Safeguard against such failures by explicitly ignoring them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The instruction to disable realtime monitoring are shared across all of
our Windows-based jobs. Deduplicate it so that we can more readily
iterate on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable Rust for our breaking-changes jobs so that we can verify that the
build infrastructure and the converted Rust subsystems work as expected.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "pedantic" CI job is building on Fedora with `DEVOPTS=pedantic`.
This build flag doesn't do anything anymore starting with 6a8cbc41ba
(developer: enable pedantic by default, 2021-09-03), where we have
flipped the default so that developers have to opt-out of pedantic
builds via the "no-pedantic" option. As such, all this job really does
is to do a normal build on Fedora, which isn't all that interesting.
Convert that job into a full build-and-test job that uses Meson with
breaking changes enabled. This plugs two gaps:
- We now test on another distro that we didn't run tests on
beforehand.
- We verify that breaking changes work as expected with Meson.
Furthermore, in a subsequent commit we'll modify both jobs that use
breaking changes to also enable Rust. By converting the Fedora job to
use Meson, we ensure that we test our Rust build infrastructure for both
build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the last couple of years the appetite for bringing Rust into the
codebase has grown significantly across the developer base. Introducing
Rust is a major change though and has ramifications for the whole
ecosystem:
- Some platforms have a Rust toolchain available, but have not yet
integrated it into their build infrastructure.
- Some platforms don't have any support for Rust at all.
- Some platforms may have to figure out how to fit Rust into their
bootstrapping sequence.
Due to this, and given that Git is a critical piece of infrastructure
for the whole industry, we cannot just introduce such a heavyweight
dependency without doing our due diligence.
Instead, preceding commits have introduced a test balloon into our build
infrastructure that convert one tiny subsystem to use Rust. For now,
using Rust to build that subsystem is entirely optional -- if no Rust
support is available, we continue to use the C implementation. This test
balloon has the intention to give distributions time and let them ease
into our adoption of Rust.
Having multiple implementations of the same subsystem is not sustainable
though, and the plan is to eventually be able to use Rust freely all
across our codebase. As such, there is the intent to make Rust become a
mandatory part of our build process.
Add an announcement to our breaking changes that Rust will become
mandatory in Git 3.0. A (very careful and non-binding) estimate might be
that this major release might be released in the second half of next
year, which should give distributors enough time to prepare for the
change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement a trivial test balloon for our Rust build infrastructure by
reimplementing the "varint.c" subsystem in Rust. This subsystem is
chosen because it is trivial to convert and because it doesn't have any
dependencies to other components of Git.
If support for Rust is enabled, we stop compiling "varint.c" and instead
compile and use "src/varint.rs".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The varint subsystem currently uses implicit widths for integers. On the
one hand we use `uintmax_t` for the actual value. On the other hand, we
use `int` for the length of the encoded varint.
Both of these have known maximum values, as we only support at most 16
bytes when encoding varints. Thus, we know that we won't ever exceed
`uint64_t` for the actual value and `uint8_t` for the prefix length.
Refactor the code to use explicit widths. Besides making the logic
platform-independent, it also makes our life a bit easier in the next
commit, where we reimplement "varint.c" in Rust.
Suggested-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to introduce support for Rust into the core of Git, where
some (trivial) subsystems are converted to Rust. These subsystems will
also retain a C implementation though as Rust is not yet mandatory.
Consequently, it now becomes possible for a Git version to have bugs
that are specific to whether or not it is built with Rust support
overall.
Expose information about whether or not Git was built with Rust via our
build info. This means that both `git version --build-options`, but also
`git bugreport` will now expose that bit of information. Hopefully, this
should make it easier for us to discover any Rust-specific issues.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce infrastructure to build the internal Rust library. This
mirrors the infrastructure we have added to Meson in the preceding
commit. Developers can enable the infrastructure by passing the new
`WITH_RUST` build toggle.
Inspired-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an upcoming change we'll make some of the sources compile
conditionally based on whether or not `WITH_RUST` is defined. To let
developers specify that flag in their "config.mak" we'll thus have to
reorder our sources so that they come after the include of that file.
Do so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the infrastructure into Meson to build an internal Rust library.
Building the Rust parts of Git are for now entirely optional, as they
are mostly intended as a test balloon for both Git developers, but also
for distributors of Git. So for now, they may contain:
- New features that are not mission critical to Git and that users can
easily live without.
- Alternative implementations of small subsystems.
If these test balloons are successful, we will eventually make Rust a
mandatory dependency for our build process in Git 3.0.
The availability of a Rust toolchain will be auto-detected by Meson at
setup time. This behaviour can be tweaked via the `-Drust=` feature
toggle.
Next to the linkable Rust library, also wire up tests that can be
executed via `meson test`. This allows us to use the native unit testing
capabilities of Rust.
Note that the Rust edition is currently set to 2018. This edition is
supported by Rust 1.49, which is the target for the upcoming gcc-rs
backend. For now we don't use any features of Rust that would require a
newer version, so settling on this old version makes sense so that
gcc-rs may become an alternative backend for compiling Git. If we _do_
want to introduce features that were added in more recent editions of
Rust though we should reevaluate that choice.
Inspired-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As more and more developer tools use AI, we are facing two main risks
related to AI generated content:
- its situation regarding copyright and license is not clear,
and:
- more and more bad quality content could be submitted for review to
the mailing list.
To mitigate both risks, let's add an "Use of Artificial Intelligence"
section to "Documentation/SubmittingPatches" with the goal of
discouraging its blind use to generate content that is submitted to
the project, while still allowing us to benefit from its help in some
innovative, useful and less risky ways.
Helped-by: Rick Sanders <rick@sfconservancy.org>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation was inaccurate since 9a121b0d226 (credential: handle
`credential.<partial-URL>.<key>` again, 2020-04-24)
Add tests for documented behaviour.
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'cache_ref_iterator_seek()' function is used to seek the
`ref_iterator` to the desired reference in the ref-cache mechanism. We
use the seeking functionality to implement the '--start-after' flag in
'git-for-each-ref(1)'.
When using the files-backend with packed-refs, it is possible that some
of the refs directories are empty. For e.g. just after repacking, the
'refs/heads' directory would be empty. The ref-cache seek mechanism,
doesn't take this into consideration when descending into a
subdirectory, and makes an out of bounds access, causing SEGFAULT as we
try to access entries within the directory. Fix this by breaking out of
the loop when we enter an empty directory.
Since we start with the base directory of 'refs/' which is never empty,
it is okay to perform this check after the first iteration in the
`do..while` clause.
Add tests which simulate this behavior and also provide coverage over
using the feature over packed-refs.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitk allows configuring a particular theme in its configuration file
(default on linux: ~/.config/git/gitk), but offers no ability to modify
this from gitk's configuration editor. Let's add this to the color
configuration page.
Present the offered themes in a list, and allow choosing / modifying a
theme definition file ($themeloader). Update the list of themes if the
theme file is modified, and update the theme if specifically requested
(by default, just change the value for use after gitk is restarted).
Any theme definition file can change the global options database,
affecting potentially any theme. So, the ultimate configuration should
have either
- no theme definition file (themeloader = {}), and a native Tk, theme,
or
- themeloader naming a valid file, and $theme naming a theme defined by
that file.
But, there is no trivial way to enforce the above. Shrug.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
The reflog write recognizes only GIT_COMMITTER_NAME and
GIT_COMMITTER_EMAIL environment variables, but forgot to honor the
user.name and user.email configuration variables, due to lack of
repo_config() call to grab these values from the configuration files.
The test suite sets these variables, so this behavior was unnoticed.
Ensure that the reflog write also uses the values of user.name and
user.email if set in the Git configuration.
Co-authored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Michael Lohmann <git@lohmann.sh>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field rchg (now 'changed') declares if a line in a file is changed
or not. A later commit will change it's type from 'char' to 'bool'
to make its purpose even more clear.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdfile_t currently uses chastore_t which is an arena allocator. I
think that xrecord_t used to be a linked list and recs didn't exist
originally. When recs was added I think they forgot to remove
xdfile_t.next, but was overlooked. This dual data structure setup
makes the code somewhat confusing.
Additionally the C type chastore_t isn't FFI friendly, and provides
little to no performance benefit over using realloc to grow an array.
Performance impact of deleting fields from xdfile_t:
Deleting ha is about 5% slower.
Deleting cha is about 5% faster.
Delete ha, but keep cha
time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s]
Range (min … max): 1.249 s … 1.286 s 10 runs
Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s]
Range (min … max): 1.320 s … 1.358 s 10 runs
Summary
build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Delete cha, but keep ha
time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s]
Range (min … max): 1.288 s … 1.292 s 10 runs
Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s]
Range (min … max): 1.205 s … 1.249 s 10 runs
Summary
build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Delete ha AND chastore
time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s]
Range (min … max): 1.287 s … 1.295 s 10 runs
Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s]
Range (min … max): 1.305 s … 1.308 s 10 runs
Summary
build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fields from xdlclass_t are aliases of xrecord_t:
xdlclass_t.line -> xrecord_t.ptr
xdlclass_t.size -> xrecord_t.size
xdlclass_t.ha -> xrecord_t.ha
xdlclass_t carries a copy of the data in xrecord_t, but instead of
embedding xrecord_t it duplicates the individual fields. A future
commit will change the types used in xrecord_t so embed it in
xdlclass_t first, so we don't have to remember to change the types
here as well.
Best-viewed-with: --color-words
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 0 <= i < xdfile_t.nreff the following is true:
xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]]
This makes the code about 5% slower. The fields rindex and ha are
specific to the classic diff (myers and minimal). I plan on creating a
struct for classic diff, but there's a lot of cleanup that needs to be
done before that can happen and leaving ha in would make those cleanups
harder to follow.
A subsequent commit will delete the chastore cha from xdfile_t. That
later commit will investigate deleting ha and cha independently and
together.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every field in this struct is an alias for a certain field in xdfile_t.
diffdata_t.nrec -> xdfile_t.nreff
diffdata_t.ha -> xdfile_t.ha
diffdata_t.rindex -> xdfile_t.rindex
diffdata_t.rchg -> xdfile_t.rchg
I think this struct existed before xdfile_t, and was kept for backward
compatibility reasons. I think xdiffi should have been refactored to
use the new (xdfile_t) struct, but was easier to alias it instead.
The local variables rchg* and rindex* don't shorten the lines by much,
nor do they really need to be there to make the code more readable.
Delete them.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the type xrecord_t as the local variable for the functions in the
file xdiff/xemit.c. Most places directly reference the fields inside of
this struct, doing that here makes it more consistent with the rest of
the code.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When xrecord_t was a linked list, and recs didn't exist, I assume this
function walked the list until it found the right record. Accessing
a contiguous array is so trivial that this function is now superfluous.
Delete it.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitk currently accepts a single themeloader file via the config file,
and will source this with errors reported to the console. This is fine
for simple configuration, but will not support interactive theme
exploration from the gui. In particular, a themeloader file must be
sourced only once as the themes defined cannot be re-defined. Also,
errors must be handled rather than just aborting while printing to the
console. So, add a proc to handle the above, supporting expansion of
the gui config pages.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk has a number of variables used in setting up colors for the classic
(non-themed) widget set. These variables are unused with ttk, so let's
eliminate them. But, leave the variables in the config file for now -
those can be eliminated after this change is merged.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk offers to change the ui color on the colors prefs page, but the
variable set has no effect because gitk is using themes. Let's eliminate
the "Interface" color selection option from that page.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk allows searching for commits with various criteria, and provides
up/down search buttons to facilitate this search. These buttons are
labelled with bitmaps, and those bitmaps are not always recolored
correctly for the ui scheme as the theme colors are not known. Let's
just use text labels on these, allowing the styles to handle any
coloring needed. Use utf codepoints for the arrows, presuming that these
code points are available in the selected font.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk maintains a stack of commit ids visited, and allows navigating
these using a pair of buttons shown with arrows using bitmaps. An attempt
is made to recolor these bitmaps to handle different color schemes, but
this is unreliable across multiple themes as the required colors are not
universally known. Let's just use text labels for these buttons,
allowing the themes to recolor the text along with everything else. Use
utf code points for the text, presuming that these arrow glyphs are
available in the selected font.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk uses themed widgets with a user selected theme, but also invokes
tk_setPalette to configure colors for the non-themed widgets including
the menubar. However, themes in general are expected to configure
those colors already. The builtin themes (default, alt, clam, classic on
unix/X11) all have compatible colors, and need no such reconfiguration,
and (most, if not all) available themes set the options database for this
purpose as well. Furthermore, gitk in the past avoided invoking
tk_setPalette on Windows to avoid some issues.
So, let's stop calling tk_setPalette everywhere, and just rely upon the
selected theme (possibly user installed) to have set all needed colors.
Note: if a user installs more than one theme using $themeloader, the last
one installed will have defined the colors to be used. Those colors will
probably be incorrect for any other set, including Tk's builtin set.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk uses themed tk, but has no capability to alter the theme defined
by Tk. While there are documented ways to install other themes, and
to make one the default, these methods are obscure at best. Instead,
let's offer two config variables:
- theme this is the name of the theme to use, and must be available.
- themeloader - this is the full pathname of a tcl script that
will load one or more themes into the Tk namespace.
By default, theme is set to the theme active when Tk is started, and
themeloader = {}. These variables must be defined to something else to
have any user visible effect.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
All the final paragraphs on these three options are rendered as
literal blocks. The intent was surely to keep each of them wed to their
respective description list items. But the attempt at maintaining the
indentation level of the block causes each them to be interpreted as a
code block, since code blocks can be represented using indentation.
We need to use list continuation (+) in order to keep them wed to
their blocks.
There is also an unordered list which sandwiches two paragraphs on an
option. We don’t need to do anything about that since it attaches to the
description list item without list continuation (i.e. it is already
correct). But for consistency let’s use list continuation and an open
block on it.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git last-modified" operating in non-recursive mode used to trigger
a BUG(), which has been corrected.
* tc/last-modified-recursive-fix:
last-modified: fix bug when some paths remain unhandled
Deal more gracefully with directory / file conflicts when the files
backend is used for ref storage, by failing only the ones that are
involved in the conflict while allowing others.
* kn/refs-files-case-insensitive:
refs/files: handle D/F conflicts during locking
refs/files: handle F/D conflicts in case-insensitive FS
refs/files: use correct error type when lock exists
refs/files: catch conflicts on case-insensitive file-systems
Some places in the code confused a variable that is *not* a boolean
to enable color but is an enum that records what the user requested
to do about color. A couple of bugs of this sort have been fixed,
while the code has been cleaned up to prevent similar bugs in the
future.
* jk/color-variable-fixes:
config: store want_color() result in a separate bool
add-interactive: retain colorbool values longer
color: return bool from want_color()
color: use git_colorbool enum type to store colorbools
pretty: use format_commit_context.auto_color as colorbool
diff: stop passing ecbdata->use_color as boolean
diff: pass o->use_color directly to fill_metainfo()
diff: don't use diff_options.use_color as a strict bool
diff: simplify color_moved check when flushing
grep: don't treat grep_opt.color as a strict bool
color: return enum from git_config_colorbool()
color: use GIT_COLOR_* instead of numeric constants
The stash.index configuration variable can be set to make "git stash
pop/apply" pretend that it was invoked with "--index".
* dk/stash-apply-index:
stash: honor stash.index in apply, pop modes
stash: refactor private config globals
t3905: remove unneeded blank line
t3903: reduce dependencies on previous tests
There are double frees and leaks around setup_revisions() API used
in "git stash show", which has been fixed, and setup_revisions()
API gained a wrapper to make it more ergonomic when using it with
strvec-manged argc/argv pairs.
* jk/setup-revisions-freefix:
revision: retain argv NULL invariant in setup_revisions()
treewide: pass strvecs around for setup_revisions_from_strvec()
treewide: use setup_revisions_from_strvec() when we have a strvec
revision: add wrapper to setup_revisions() from a strvec
revision: manage memory ownership of argv in setup_revisions()
stash: tell setup_revisions() to free our allocated strings
"git rebase -i" failed to clean-up the commit log message when the
command commits the final one in a chain of "fixup" commands, which
has been corrected.
* pw/rebase-i-cleanup-fix:
sequencer: remove VERBATIM_MSG flag
rebase -i: respect commit.cleanup when picking fixups
Keep giving hint about the default initial branch name for users
who may be surprised after Git 3.0 switch-over.
* jc/3.0-default-initial-branch-to-main-addendum:
initial branch: give hints after switching the default name
Declare that "git init" that is not otherwise configured uses
'main' as the initial branch, not 'master', starting Git 3.0.
* pw/3.0-default-initial-branch-to-main:
t0613: stop setting default initial branch
t9902: switch default branch name to main
t4013: switch default branch name to main
breaking-changes: switch default branch to main
"git send-email --compose --reply-to=<address>" used to add
duplicated Reply-To: header, which made mailservers unhappy. This
has been corrected.
* nb/send-email-no-dup-reply-to:
send-email: don't duplicate Reply-to: in intro message
* ps/packfile-store:
packfile: refactor `get_packed_git_mru()` to work on packfile store
packfile: refactor `get_all_packs()` to work on packfile store
packfile: refactor `get_packed_git()` to work on packfile store
packfile: move `get_multi_pack_index()` into "midx.c"
packfile: introduce function to load and add packfiles
packfile: refactor `install_packed_git()` to work on packfile store
packfile: split up responsibilities of `reprepare_packed_git()`
packfile: refactor `prepare_packed_git()` to work on packfile store
packfile: reorder functions to avoid function declaration
odb: move kept cache into `struct packfile_store`
odb: move MRU list of packfiles into `struct packfile_store`
odb: move packfile map into `struct packfile_store`
odb: move initialization bit into `struct packfile_store`
odb: move list of packfiles into `struct packfile_store`
packfile: introduce a new `struct packfile_store`
These tests prepare the working tree & index state to have something
to be committed, and try a sequence of "test_must_fail git commit".
If an earlier one did not fail by a bug, a later one will fail for
a wrong reason (namely, "nothing to commit").
Give them "--allow-empty" to make sure that they would work even
when there is nothing to commit by accident.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The formatter currently suggests adding a space between a control macro
and parentheses. In the Git project, this is not typically expected. Set
`SpaceBeforeParens` to `ControlStatementsExceptControlMacros`
accordingly.
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Asciidoctor and asciidoc.py have different behaviors when a paragraph
follows a nested list item. Asciidoctor has a bug[1] that makes it keep a
plus sign (+) used to attached paragraphs at the beginning of the paragraph.
This commit uses workarounds to avoid this problem by using second level
definition lists and open blocks.
[1]:https://github.com/asciidoctor/asciidoctor/issues/4704
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized,
but never used for anything by the code. Remove them.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These local variables are essentially a hand-rolled additional
implementation of xdl_free_ctx() inlined into xdl_prepare_ctx(). Modify
the code to use the existing xdl_free_ctx() function so there aren't
two ways to free such variables.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move xdl_prepare_env() later in the file to avoid the need
for static forward declarations.
Best-viewed-with: --color-moved
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the recent update in Git for Windows/ARM64 as of
https://github.com/git-for-windows/git-sdk-arm64/commit/21b288e16358
cURL was updated from v8.15.0 to v8.16.0, and the LLVM-based builds (but
strangely not the GCC-based builds) continuously greet me thusly:
http-push.c:211:2: error: call to '_curl_easy_setopt_err_long' declared
with 'warning' attribute: curl_easy_setopt expects a long argument
[-Werror,-Wattribute-warning]
CC builtin/apply.o
211 | curl_easy_setopt(curl, CURLOPT_INFILESIZE, buffer->buf.len);
| ^
C:/a/git-sdk-arm64/git-sdk-arm64/minimal-sdk/clangarm64/include/curl/typecheck-gcc.h:50:15:
note: expanded from macro 'curl_easy_setopt'
50 | _curl_easy_setopt_err_long(); \
| ^
1 error generated.
make: *** [Makefile:2877: http-push.o] Error 1
The easiest way to shut up that compile error (which is legitimate,
seeing as the `CURLOPT_INFILESIZE` options expects a `long` parameter,
but `buffer->buf.len` refers to the `size_t` attribute of a `strbuf`)
would be to simply cast the parameter to a `long`.
However, there is a much better solution: To use the
`CURLOPT_INFILESIZE_LARGE` option instead, which was added in cURL
v7.11.0 (see https://curl.se/ch/7.11.0.html) and which Git _already_
uses in `curl_append_msgs_to_imap()`.
This fix was the motivation for renaming `xcurl_off_t()` to
`cast_size_t_to_curl_off_t()` and making it available more broadly,
which is the reason why it is used here, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When casting a `size_t` to `curl_off_t`, there is a currently uncommon
chance that the value can be cut off (`curl_off_t` is expected to be a
signed 64-bit data type).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit moves the `xcurl_off_t()` function, which validates that a
given value fits within the `curl_off_t` data type and then casts it, to
a more central place so that it can be used outside of `remote-curl.c`,
too.
At the same time, this function is renamed to conform better with the
naming convention of the helper functions that safely cast from one data
type to another which has been well established in `git-compat-util.h`.
With this move, `gettext.h` must be `#include`d in `http.h` to allow the
error message to remain translatable.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitk's 'Commit ID' button uses a classic widget, not a themed one,
leading to inconsistent style. Commit 51a7e8b654 (d93f1713b0 ("gitk: Use
themed tk widgets", 2009-04-17) that added themed widgets did not touch
this particular widget, but does not say why. Regardless, let's use a
themed button to be consistent with the rest of the interface.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
If one of the two provided paths for git diff --no-index ends in a '/',
a failure similar to the following occurs:
$ git diff --no-index -- /tmp/ /tmp/ ':!'
fatal: `pos + len' is too far after the end of the buffer
This occurs because of an incorrect calculation of the skip lengths in
diff_no_index(). The code wants to calculate the length of the string,
but add one in case the string doesn't end with a slash.
The method it uses is incorrect, as it always checks the trailing NUL
character of the string. This will never be a '/', so we always add one.
In the event that we *do* have a trailing slash, this will create an
off-by-one length error later when using the skip value.
The most straightforward fix would be to correct the skip1 and skip2
lengths by using ends_with().
However, Johannes made a good point that the existing logic is wasting a
lot of computation. We generate the match string by copying the path in
and then skipping almost all of it immediately with a potentially
expensive memmove() from the strbuf_remove() call. We also re-initialize
the match stringbuf each time we call read_directory_contents.
The read_directory_contents really wants a path that is rooted at the
start of the directory scan. We're currently building this by taking the
full path and stripping out the start portion. Instead, replace this
logic by building up the portion of the match as we go.
Start by initializing two strbuf in diff_no_index containing the empty
string. Pass these into queue_diff, which in turn passes the appropriate
left or right side into read_directory_contents.
As before, we build up the matches by appending elements to the match
path and then clearing them using strbuf_setlen.
In the recursive portion of the queue_diff algorithm, we build up new
match paths the same way that we build up new buffer paths, by appending
the elements and then clearing them with strbuf_setlen after each
iteration. This is cheaper as it avoids repeated allocations, and is a
bit simpler to track what is going on.
Add a couple of test cases that pass in paths already ending in '/', to
ensure the tests cover this regression.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Closes: https://lore.kernel.org/git/c75ec5f9-407a-6555-d4fb-bb629d54ec61@gmx.de/
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
[jc: small leakfixes at the end of diff_no_index()]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
(The two next paragraphs are taken from the previous commit.)
git-format-patch(1) supports Git notes by showing them beneath the
patch/commit message, similar to git-log(1). The command also supports
showing those same notes ref names in the range diff output.
Note *the same* ref names; any Git notes options or configuration
variables need to be handed off to the range-diff machinery. This works
correctly in the case when the range diff is on the cover letter. But it
does not work correctly when the output is a single patch with an
embedded range diff.
Concretely, git-format-patch(1) needs to pass `--[no-]notes` options on
to the range-diff subprocess in `range-diff.c`. Range diffs for single-
commit series are handled in `log-tree.c`. But `log-tree.c` had no
access to any `log_arg` variable before we added it to `rev_info` in the
previous commit.
Use that new struct member to fix this inconsistency.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-format-patch(1) supports Git notes by showing them beneath the
patch/commit message, similar to git-log(1). The command also supports
showing those same notes ref names in the range diff output.
Note *the same* ref names; any Git notes options or configuration
variables need to be handed off to the range-diff machinery. This works
correctly in the case when the range diff is on the cover letter. But it
does not work correctly when the output is a single patch with an
embedded range diff.
Concretely, git-format-patch(1) needs to pass `--[no-]notes` options
on to the range-diff subprocess in `range-diff.c`. This is handled in
`builtin/log.c` by the local variable `log_arg` in the case of mul-
tiple commits, but not in the single commit case where there is no
cover letter and the range diff is embedded in the patch output; the
range diff is then made in `log-tree.c`, whither `log_arg` has not
been propagated. This means that the range-diff subprocess reverts
to its default behavior, which is to act like git-log(1) w.r.t. notes.
We need to fix this. But first lay the groundwork by converting
`log_arg` to a struct member; next we can simply use that member
in `log-tree.c` without having to thread it from `builtin/log.c`.
No functional changes.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `other_arg` to `log_arg` in `range_diff_options` and
related places.
“Other argument” comes from bd361918 (range-diff: pass through --notes
to `git log`, 2019-11-20) which introduced Git notes handling to
git-range-diff(1) by passing that option on to git-log(1). And that kind
of name might be fine in a local context. However, it was initially
spread among multiple files, and is now[1] part of the
`range_diff_options` struct. It is, prima facie, difficult to guess what
“other” means, especially when just looking at the struct.
But with a little reading we find out that it is used for `--[no-]notes`
and `--diff-merges`, which are both passed on to git-log(1). We should
just rename it to reflect this role; `log_arg` suggests, along with the
`strvec` type, that it is used to pass extra arguments to git-log(1).
† 1: since f1ce6c19 (range-diff: combine all options in a single data
structure, 2021-02-05)
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If, when the user edits a hunk, they change deletion lines into
context lines or vice versa, then the number of hunks that the edited
hunk can be split into may differ from the unedited hunk. This means
that so we should recalculate `hunk->splittable_into` after the hunk
has been edited. In practice users are unlikely to hit this bug as it
is doubtful that a user who has edited a hunk will split it afterwards.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a hunk is split, each of the new hunks inherits whether it is
selected or not from the original hunk. If a selected hunk is split
all of the new hunks are marked as "selected" and the user is only
prompted with the first of the split hunks. The user is not asked
whether or not they wish to select the rest of the new hunks. This
means that if they wish to deselect any of the new hunks apart from
the first one they have to navigate back to the hunk they want to
deselect before they can deselect it. This is unfortunate as the user
is presumably splitting the original hunk because they only want to
select some sub-set of it.
Instead mark all the new hunks as "undecided" so that the user is
prompted whether they wish to select each one in turn. In the case
where the user only wants to change the selection of the first of
the split hunks they will now have to do more work re-selecting the
remaining split hunks. However, changing the selection of any of the
other newly created hunks is now much simpler as the user no-longer has
to navigate back to them in order to change their selected state.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitk uses classic (non-themed) spinboxes rather than the ttk variants.
Commit d93f1713b0 ("gitk: Use themed tk widgets", 2009-04-17) that added
ttk makes no mention of why ttk:spinboxes were omitted, but this leads
to an inconsistent interface. Let's use the ttk version.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
From user feedback, there was a request for examples, as well as a
comment that one person found "If git push [<repository>] without
any <refspec> argument is set to update some ref at the destination
with <src> with remote.<repository>.push configuration variable..."
impossible to understand.
To make the section easier to navigate, create a list of every possible
refspec form, with examples for each form as well as 2 forms which were
previously missing (patterns and negative refspecs).
Made a few changes to use more familiar language, but ultimately
refspecs are a pretty advanced feature so I've mostly left the
terminology alone.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now the rules for when a `git push` is allowed are buried at the
bottom of the description of `<refspec>`. Put them in their own section
so that we can reference them from `--force` and give some context for
why they exist.
Having the "PUSH RULES" section also lets us be a little bit more
specific with the rule in `--force`: we can just focus on the rule
for pushing for a branch (which is likely the one that's most relevant)
and leave the details about what happens when you push to a tag or a ref
that isn't a branch to the later section.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_packed_git_mru()` function prepares the packfile store and then
returns its packfiles in most-recently-used order. Refactor it to accept
a packfile store instead of a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_all_packs()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_packed_git()` function prepares the packfile store and then
returns its packfiles. Refactor it to accept a packfile store instead of
a repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_multi_pack_index()` function is declared and implemented in the
packfile subsystem, even though it really belongs into the multi-pack
index subsystem. The reason for this is likely that it needs to call
`packfile_store_prepare()`, which is not exposed by the packfile system.
In a subsequent commit we're about to add another caller outside of the
packfile system though, so we'll have to expose the function anyway.
Do so now already and move `get_multi_pack_index()` into the MIDX
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a recurring pattern where we essentially perform an upsert of a
packfile in case it isn't yet known by the packfile store. The logic to
do so is non-trivial as we have to reconstruct the packfile's key, check
the map of packfiles, then create the new packfile and finally add it to
the store.
Introduce a new function that does this dance for us. Refactor callsites
to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `install_packed_git()` functions adds a packfile to a specific
object store. Refactor it to accept a packfile store instead of a
repository to clarify its scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reprepare_packed_git()` we perform a couple of operations:
- We reload alternate object directories.
- We clear the loose object cache.
- We reprepare packfiles.
While the logic is hosted in "packfile.c", it clearly reaches into other
subsystems that aren't related to packfiles.
Split up the responsibility and introduce `odb_reprepare()` which now
becomes responsible for repreparing the whole object database. The
existing `reprepare_packed_git()` function is refactored accordingly and
only cares about reloading the packfile store now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `prepare_packed_git()` function and its friends are responsible for
loading packfiles as well as the multi-pack index for a given object
database. Refactor these functions to accept a packfile store instead of
a repository to clarify their scope.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reorder functions so that we can avoid a forward declaration of
`prepare_packed_git()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object database tracks a cache of "kept" packfiles, which is used by
git-pack-objects(1) to handle cruft objects. With the introduction of
the `struct packfile_store` we have a better place to host this cache
though.
Move the cache accordingly.
This moves the last bit of packfile-related state from the object
database into the packfile store. Adapt the comment for the `packfiles`
pointer in `struct object_database` to reflect this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object database tracks the list of packfiles in most-recently-used
order, which is mostly used to favor reading from packfiles that contain
most of the objects that we're currently accessing. With the
introduction of the `struct packfile_store` we have a better place to
host this list though.
Move the list accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object database tracks a map of packfiles by their respective paths,
which is used to figure out whether a given packfile has already been
loaded. With the introduction of the `struct packfile_store` we have a
better place to host this list though.
Move the map accordingly.
`pack_map_entry_cmp()` isn't used anywhere but in "packfile.c" anymore
after this change, so we convert it to a static function, as well. Note
that we also drop the `inline` hint: the function is used as a callback
function exclusively, and callbacks cannot be inlined.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object database knows to skip re-initializing the list of packfiles
in case it's already been initialized. Whether or not that is the case
is tracked via a separate `initialized` bit that is stored in the object
database. With the introduction of the `struct packfile_store` we have a
better place to host this bit though.
Move it accordingly. While at it, convert the field into a boolean now
that we're allowed to use them in our code base.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The object database tracks the list of packfiles it currently knows
about. With the introduction of the `struct packfile_store` we have a
better place to host this list though.
Move the list accordingly. Extract the logic from `odb_clear()` that
knows to close all such packfiles and move it into the new subsystem, as
well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Information about an object database's packfiles is currently
distributed across two different structures:
- `struct packed_git` contains the `next` pointer as well as the
`mru_head`, both of which serve to store the list of packfiles.
- `struct object_database` contains several fields that relate to the
packfiles.
So we don't really have a central data structure that tracks our
packfiles, and consequently responsibilities aren't always clear cut.
A consequence for the upcoming pluggable object databases is that this
makes it very hard to move management of packfiles from the object
database level down into the object database source.
Introduce a new `struct packfile_store` which is about to become the
single source of truth for managing packfiles. Right now this data
structure doesn't yet contain anything, but in subsequent patches we
will move all data structures that relate to packfiles and that are
currently contained in `struct object_database` into this new home.
Note that this is only a first step: most importantly, we won't (yet)
move the `struct packed_git::next` pointer around. This will happen in a
subsequent patch series though so that `struct packed_git` will really
only host information about the specific packfile it represents.
Further note that the new structure still sits at the wrong level at the
end of this patch series: as mentioned, it should eventually sit at the
level of the object database source, not at the object database level.
But introducing the packfile store now already makes it way easier to
eventually push down the now-selfcontained data structure by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git subtree" (in contrib/) did not work correctly when splitting
squashed subtrees, which has been improved.
* cs/subtree-squash-split-fix:
contrib/subtree: fix split with squashed subtrees
Some among "git add -p" and friends ignored color.diff and/or
color.ui configuration variables, which is an old regression, which
has been corrected.
* jk/add-i-color:
contrib/diff-highlight: mention interactive.diffFilter
add-interactive: manually fall back color config to color.ui
add-interactive: respect color.diff for diff coloring
stash: pass --no-color to diff plumbing child processes
The "promisor-remote" capability mechanism has been updated to
allow the "partialCloneFilter" settings and the "token" value to be
communicated from the server side.
* cc/promisor-remote-capability:
promisor-remote: use string_list_split() in mark_remotes_as_accepted()
promisor-remote: allow a client to check fields
promisor-remote: use string_list_split() in filter_promisor_remote()
promisor-remote: refactor how we parse advertised fields
promisor-remote: use string constants for 'name' and 'url' too
promisor-remote: allow a server to advertise more fields
promisor-remote: refactor to get rid of 'struct strvec'
In an argc/argv pair, the entry for argv[argc] is generally NULL. You
can iterate by counting up to argc, or by looking for the NULL entry in
argv.
When we pass such a pair to setup_revisions(), it shrinks argc to
account for the options we consumed and returns the result to the
caller. But it doesn't touch the entries after the reduced argc. So
argv[argc] will be left pointing at some arbitrary entry rather than
NULL.
This isn't the source of any known bugs, since all callers are aware of
the limitation and act accordingly. But it's a possible gotcha that may
be easy to miss.
Let's set the new argv[argc] to NULL, taking care to free it if the
caller asked us to do so.
It is tempting to do likewise for all of the entries afterwards, too, as
some of them may also need to be freed (e.g., if coming from a strvec).
But doing so isn't entirely trivial, as we munge argc in the function
(e.g., when we find "--" and move all of the entries after it into the
prune_data list). It would be possible with some light refactoring, but
it's probably not worth it. Nobody should ever look at them (they are
beyond the revised argc and past the NULL argv entry) outside of strvec
cleanup, and setup_revisions_from_strvec() already handles this case.
There's one other interesting gotcha: many callers which do not want to
provide arguments just pass 0/NULL for argc/argv. We need to check for
this case before assigning the final NULL.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit converted callers of setup_revisions() with a strvec
to use the safer and easier _from_strvec() variant.
Let's now convert spots that don't directly have a strvec, but receive
an argc/argv pair that eventually comes from one. We'll instead pass the
strvec down to the point where we call setup_revisions().
That makes these functions slightly less flexible if they were to grow
other callers that don't use strvecs, but this rigidity is buying us
some safety. It is only safe to pass the free_removed_argv_elements
option to setup_revisions() if we know the elements of argv/argc are
allocated on the heap. That isn't communicated in the type system when
we are passed the bare elements. But if we get a strvec, we know that
the elements are allocated strings.
And at any rate, each of these modified functions has only a single
caller (that has a strvec), so the loss of flexibility is unlikely to
ever matter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit introduced a wrapper to make using setup_revisions()
with a strvec easier and safer. It converted spots that were already
doing most of what the wrapper did.
Let's now convert spots where we were not setting up the
free_removed_argv_elements flag. As discussed in the previous commit,
this probably isn't fixing any bugs or leaks (since these sites wouldn't
trigger the re-shuffling of argv that causes them). This is mostly
future-proofing us against setup_revisions() becoming more aggressive
about its re-shuffling.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The setup_revisions() function was designed to take the argc/argv pair
from the operating system. But we sometimes construct our own argv using
a strvec and pass that in. There are a few gotchas that callers need to
deal with here:
1. You should always pass the free_removed_argv_elements option via
setup_revision_opt. Otherwise, entries may be leaked if
setup_revisions() re-shuffles options.
2. After setup_revisions() returns, the strvec state is odd. We get a
reduced argc from setup_revisions() telling us how many unknown
options were left in place. Entries after that in argv may be
retained, or may be NULL (depending on how the reshuffling
happened). But the strvec's "nr" field still represents the
original value, and some of the entries it thinks it is still
storing may be NULL. Callers must be careful with how they access
it.
Some callers deal with (1), but not all. In practice they are OK because
they do not pass any options that would cause setup_revisions() to
re-shuffle (namely unknown options which may be relayed from the user,
and the use of the "--" separator). But it's probably a good idea to
consistently pass this option anyway to future-proof ourselves against
the details of setup_revisions() changing.
No callers address (2), though I don't think there any visible bugs.
Most of them simply call strvec_clear() and never otherwise look at the
result. And in fact, if they naively set foo.nr to the argc returned by
setup_revisions(), that would cause leaks! Because setup_revisions()
does not free consumed options[1], we have to leave the "nr" field of
the strvec at its original value to find and free them during
strvec_clear().
So I don't think there are any bugs to fix here, but we can make things
safer and simpler for callers. Let's introduce a helper function that
sets the free_removed_argv_elements automatically and shrinks the strvec
to represent the retained options afterwards (taking care to free the
now-obsolete entries).
We'll start by converting all of the call-sites which use the
free_removed_argv_elements option. There should be no behavior change
for them, except that their "shrunken" entries are cleaned up
immediately, rather than waiting for a strvec_clear() call.
[1] Arguably setup_revisions() should be doing this step for us if we
told it to free removed options, but there are many existing callers
which will be broken if it did. Introducing this helper is a
possible first step towards that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The setup_revisions() function takes an argc/argv pair and consumes
arguments from it, returning a reduced argc count to the caller. But it
may also overwrite entries within the argv array, as it shifts unknown
options to the front of argv (so they can be found in the range of
0..argc-1 after we return).
For a normal argc/argv coming from the operating system, this is OK.
We don't need to worry about memory ownership of the strings in those
entries. But some callers pass in allocated strings from a strvec, and
we do need to care about those.
We faced a similar issue in f92dbdbc6a (revisions API: don't leak memory
on argv elements that need free()-ing, 2022-08-02), which added an
option for callers to tell us that elements need to be freed. But the
implementation within setup_revisions() was incomplete. It only covered
the case of dropping "--", but not the movement of unknown options.
When we shift argv entries around, we should free the elements we are
about to overwrite, so they are not leaked. For example, in:
git stash show -p --invalid
we will pass this to setup_revisions():
argc = 3, argv[] = { "show", "-p", "--invalid", NULL }
which will then return:
argc = 2, argv[] = { "show", "--invalid", "--invalid", NULL }
overwriting the "-p" entry, which is leaked unless we free it at that
moment.
You can see in the output above another potential problem. We now have
two copies of the "--invalid" string. If the caller does not respect the
new argc when free-ing the strings via strvec_clear(), we'll get a
double-free. And git-stash suffers from this, and will crash with the
above command.
So it seems at first glance that the solution is to just assign the
reduced argc to the strvec.nr field in the caller. Then it would stop
after freeing only any copied entries. But that's not always right
either!
Remember that we are reducing "argc" to account for elements we've
consumed. So if there isn't an invalid option, we'd turn:
argc = 2, argv[] = { "show", "-p", NULL }
into:
argc = 1, argv[] = { "show", "-p", NULL }
In that case strvec_clear() must keep looking past the shortened argc we
return to find the original "-p" to free. It needs to use the original
argc to do that.
We can solve this by turning our argv writes into strict moves, not
copies. When we shuffle an unknown option to the front, we'll overwrite
its old position with NULL. That leaves an argv array that may have NULL
"holes" in it.
So in the "--invalid" example above we get:
argc = 2, argv[] = { "show", "--invalid", NULL, NULL }
but something like "git stash -p --invalid -p" would yield:
argc = 3, argv[] = { "show", "--invalid", NULL, "-p", NULL }
because we move "--invalid" to overwrite the first "-p", but the second
one is quietly consumed. But strvec_clear() can handle that fine (it
iterates over the "nr" field, and passing NULL to free() is OK).
To ease the implementation, I've introduced a helper function. It's a
little hacky because it must take a double-pointer to set the old
position to NULL. Which in turn means we cannot pass "&arg", our local
alias for the current entry we're parsing, but instead "&argv[i]", the
pointer in the original array. And to make it even more confusing, we
delegate some of this work to handle_revision_opt(), which is passed a
subset of the argv array, so is always working on "&argv[0]".
Likewise, because handle_revision_opt() only receives the part of argv
left to parse, it receives the array to accumulate unknown options as a
separate unkc/unkv pair. But we're always working on the same argv
array, so our strategy works fine. I suspect this would be a bit more
obvious (and avoid some pointer cleverness) if all functions saw the
full argv array and worked with positions within it (and our new helper
would take two positions, a src and dst). But that would involve
refactoring handle_revision_opt(). I punted on that, as what's here is
not too ugly and is all contained within revision.c itself.
The new test demonstrates that "git stash show -p --invalid" no longer
crashes with a double-free (because we move instead of copy). And it
passes with SANITIZE=leak because we free "-p" before overwriting.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "git stash show", we do a first pass of parsing our command line
options by splitting them into revision args and stash args. These are
stored in strvecs, and we pass the revision args to setup_revisions().
But setup_revisions() may modify the argv we pass it, causing us to leak
some of the entries. In particular, if it sees a "--" string, that will
be dropped from argv. This is the same as other cases addressed by
f92dbdbc6a (revisions API: don't leak memory on argv elements that need
free()-ing, 2022-08-02), and we should fix it the same way: by passing
the free_removed_argv_elements option to setup_revisions().
The added test here is run only with SANITIZE=leak, without checking its
output, because the behavior of stash with "--" is a little odd:
1. Running "git stash show" will show --stat output. But running "git
stash show --" will show --patch.
2. I'd expect a non-option after "--" to be treated as a pathspec, so:
git stash show -p 1 -- foo
would look treat "1" as a stash (a synonym for stash@{1}) and
restrict the resulting diff to "foo". But it doesn't. We split the
revision/stash args without any regard to "--". So in the example
above both "1" and "foo" are stashes. Which is an error, but also:
git stash show -- foo
treats "foo" as a stash, not a pathspec.
These are both oddities that we may want to address (or may not, if we
want to retain historical quirks). But they are well outside the scope
of this patch. So for now we'll just let the tests confirm we aren't
leaking without otherwise expecting any behavior. If we later address
either of those points and end up with another test that covers "stash
show --", we can drop this leak-only test.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update to 10e96bc (Merge pull request #127 from
pks-gitlab/pks-ci-improvements, 2025-09-22). This commit includes a
couple of changes:
- The GitHub CI has been updated to include a 32 bit CI job.
Furthermore, the jobs now compile with "-Werror" and more warnings
enabled.
- An issue was addressed where `uintptr_t` is not available on
NonStop [1].
- The clar selftests have been restructured so that it is now possible
to add small test suites more readily. This was done to add tests
for the above addressed issue, where we now use "%p" to print
pointers in a platform dependent way.
- An issue was addressed where the test output had a trailing
whitespace with certain output formats, which caused whitespace
issues in the test expectation files.
[1]: <01c101dc2842$38903640$a9b0a2c0$@nexbridge.com>
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With `git config get --type=color` the user asks us to parse a specific
configuration key and turn the value into an ANSI color escape sequence.
The printed string can then for example be used as part of shell scripts
to reuse the same colors as Git.
Right now though we set up the auto-pager, which means that the string
may be written to the pager instead of directly to the terminal. This
behaviour is problematic for two reasons:
- Color codes are meant for direct terminal output; writing them into
a pager does not seem like a sensible thing to do without additional
text.
- It is inconsistent with `git config --get-color`, which never uses a
pager, despite the fact that we claim `git config get --type=color`
to be a drop-in replacement in git-config(1).
Fix this by disabling the pager when outputting color sequences.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation for git-config(1) has a section where it explains how
to parse and use colors as Git would configure them. In order to get the
ANSI color escape sequence to reset the colors to normal we recommend
the following command:
$ git config get --type=color --default="reset" ""
This command is not supposed to parse any configuration keys. Instead,
it is expected to parse the "reset" default value and turn it into a
proper ANSI color escape sequence.
It was reported though [1] that this command doesn't work:
$ git config get --type=color --default="reset" ""
error: key does not contain a section:
This error was introduced in 4e51389000 (builtin/config: introduce "get"
subcommand, 2024-05-06), where we introduced the "get" subcommand to
retrieve configuration values. The preimage of that commit used `git
config --get-color "" "reset"` instead, which still works.
This use case is really quite specific to parsing colors, as it wouldn't
make sense to give git-config(1) a default value and an empty config key
only to return that default value unmodified. But with `--type=color` we
don't return the value directly; we instead parse the value into an ANSI
escape sequence.
As such, we can easily special-case this one use case:
- If the provided config key is empty;
- the user is asking for a color code; and
- the user has provided a default value,
then we call `get_color()` directly. Do so to make the documented
command work as expected.
[1]: <aI+oQvQgnNtC6DVw@szeder.dev>
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to parse an invalid color via `get_color()` we die. We're
about to introduce another caller in a subsequent commit though that has
its own error handling, so dying is a bit drastic there. Furthermore,
the only caller that we already have right now already knows to handle
errors in other branches that don't call `get_color()`.
Convert the function to instead return an error code to improve its
flexibility.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of small style violations in t1300:
- An empty newline at the start of the test body.
- The test command is sometimes on the same line as the test name.
- The closing single-quote is sometimes on the same line as the last
command of the test.
Fix these.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a bunch of tests in t1300 where we write the test expectation
handed over to `test_cmp ()` outside of the test body. This does not
match our modern test style, and there isn't really a reason why this
would need to happen outside of the test bodies.
Convert those to instead do so as part of the test itself. While at it,
normalize these tests to use `<<\EOF` for those that don't use variable
expansion and `<<-EOF` for those that aren't sensitive to indentation.
Note that there are two exceptions that we leave as-is for now since
they are reused across tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On MacOS, a "wish" application started from the terminal opens in the
background, thus doesn't match user expectation that a newly-launched
application ought to be placed in the foreground. To address this
shortcoming, both gitk and git-gui use Apple Events to send a message to
"System Events" instructing it to foreground the "wish" application by
PID.
Unfortunately, MacOS 10.14 tightens restrictions on Apple Events,
requiring explicit granting of permission to control applications in
this fashion, and apparently such granting for "Automation" is not
allowed at all[1]. As a consequence gitk crashes outright at launch time
with a "Not authorized to send Apple events to System Events" error,
thus is entirely unusable on "Mojave".
In contrast, git-gui does not crash since it deliberately[2] catches and
ignores Apple Events errors. This does mean that git-gui will not
automatically become the foreground application on "Mojave", which is a
minor inconvenience but far better than crashing outright as gitk does.
Update gitk to catch and ignore Apple Events errors, mirroring git-gui's
behavior, to avoid this crash.
(Finding and implementing an alternate approach to foregrounding the
"wish" application on "Mojave" may be desirable but is outside the scope
of this crash fix.)
[1]: https://lore.kernel.org/git/D295145E-7596-4409-9681-D8ADBB9EBB0C@me.com/
[2]: https://lore.kernel.org/git/CABNJ2G+h3zh+=wLA0KHjUn8TsfhqUK1Kn-1_=6hnXVRJUPhuuA@mail.gmail.com/
Reported-by: Evgeny Cherpak <cherpake@me.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
With stash.index=true, git-stash(1) command now tries to reinstate the
index by default in the "apply" and "pop" modes. Not doing so creates a
common trap [1], [2]: "git stash apply" is not the reverse of "git stash
push" because carefully staged indices are lost and have to be manually
recreated. OTOH, this mode is not always desirable and may create more
conflicts when applying stashes. As usual, "--no-index" will disable
this behavior if you set "stash.index".
[1]: https://lore.kernel.org/git/CAPx1GvcxyDDQmCssMjEnt6JoV6qPc5ZUpgPLX3mpUC_4PNYA1w@mail.gmail.com/
[2]: https://lore.kernel.org/git/c5a811ac-8cd3-c389-ac6d-29020a648c87@gmail.com/
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will access a new config variable in the stash
subcommand implementations, which requires the variables to be declared
before the relevant functions. Prep with a pure refactoring change to
consolidate config-related globals with the rest of the globals.
Best-viewed-with: --color-moved
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is leftover from 787513027a (stash: Add --include-untracked option
to stash and remove all untracked files, 2011-06-24) when it was
converted in bbaa45c3aa (t3905: move all commands into test cases,
2021-02-08).
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Skipping previous tests to work through only failing tests with
arguments like --run=4,122- causes some tests to fail because subdir
doesn't exist yet (it is created by a previous test; typically
"unstashing in a subdirectory"). Create it on demand for tests that need
it, but don't fail (-p) if the directory already exists.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test script, `t/t1463-refs-optimize.sh`, for the new `git refs
optimize` command.
This script acts as a simple driver, leveraging the shared test library
created in the preceding commit. It works by overriding the
`$pack_refs` variable to "refs optimize" and then sourcing the
shared library (`t/pack-refs-tests.sh`).
This approach ensures that `git refs optimize` is tested against the
entire comprehensive test suite of `git pack-refs`, verifying
that it acts as a compatible drop-in replacement.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding tests for the new `git refs optimize` command,
refactor the existing t0601 test suite to make its logic shareable.
Move the core test logic from `t0601-reffiles-pack-refs.sh` into a new
`pack-refs-tests.sh` file. Inside this new script, replace hardcoded
calls to "pack-refs" with the `$pack_refs` variable.
The original `t0601-reffiles-pack-refs.sh` script now becomes a simple
"driver". It is responsible for setting the default value of the
variable and then sourcing the test library.
This new structure follows the established pattern used for sharing
tests between `git-for-each-ref` and `git-refs list` and prepares the
test suite for the `refs optimize` tests to be added in a subsequent
commit.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the ongoing effort to consolidate reference handling,
introduce a new `optimize` subcommand. This command provides the same
functionality and exit-code behavior as `git pack-refs`, serving as its
modern replacement.
Implement `cmd_refs_optimize` by having it call the `pack_refs_core()`
helper function. This helper was factored out of the original
`cmd_pack_refs` in a preceding commit, allowing both commands to share
the same core logic as independent peers.
Add documentation for the new command. The man page leverages the shared
options file, created in a previous commit, by using the AsciiDoc
`include::` macro to ensure consistency with git-pack-refs(1).
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding documentation for `git refs optimize`, factor
out the common options from the `git-pack-refs` man page into a
shareable file `pack-refs-options.adoc` and update `git-pack-refs.adoc`
to use an `include::` macro.
This change is a pure refactoring and results in no change to the final
rendered documentation for `pack-refs`.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation of `git pack-refs` is monolithic within
`cmd_pack_refs()`, making it impossible to share its logic with other
commands. To enable code reuse for the upcoming `git refs optimize`
subcommand, refactor the core logic into a shared helper function.
Split the original `builtin/pack-refs.c` file into two parts:
- A new shared library file, `pack-refs.c`, which contains the
core option parsing and packing logic in a new `pack_refs_core()`
helper function.
- The original `builtin/pack-refs.c`, which is now a thin wrapper
responsible only for defining the `git pack-refs` command and
calling the shared helper.
A new `pack-refs.h` header is also introduced to define the public
interface for this shared logic.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git pack-refs` command behaves generically, triggering a pack for
the 'files' backend and a compaction for the 'reftable' backend.
However, the name of the command and its corresponding API is
conceptually tied to the 'files' backend implementation.
To create a cleaner, more generic interface, refactor `git pack-refs` to
use the new `refs_optimize()` API. "Optimize" is a better semantic term
for this generic action.
This change allows `git pack-refs` to act as a backend-agnostic frontend
for reference optimization, and paves the way for the new `git refs
optimize` command to do the same.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To make the new generic `optimize` API fully functional, provide an
implementation for the 'reftable' reference backend.
For the reftable backend, the 'optimize' action is to compact its
tables. The existing `reftable_be_pack_refs()` function already provides
this logic, so the new `reftable_be_optimize()` function simply calls
it.
Wire up the new function to the `optimize` slot in the reftable
backend's virtual table.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the generic `refs_optimize()` API now in place, provide the first
implementation for the 'files' reference backend. This makes the new API
functional for existing repositories and serves as the foundation for
migrating user-facing commands to the new architecture.
The implementation simply calls the existing `files_pack_refs()`
function, as 'packing' is the method used to optimize the files-based
reference store.
Wire up the new `files_optimize()` function to the `optimize` slot in
the files backend's virtual table.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing `pack-refs` API is conceptually tied to the 'files'
backend, but its behavior is generic (e.g., it triggers compaction for
reftable). This naming is confusing.
Introduce a new generic refs_optimize() API that dispatches to a
backend-specific implementation via a new 'optimize' vtable method.
This lays the architectural groundwork for different reference backends
(like 'files' and 'reftable') to provide their own storage optimization
logic, which will be called from a single, generic entry point.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is likely that those who came to Git after 3.0 switched the
default initial branch name to 'main' would still try to follow
tutorials that were written before 3.0 happened and with the
assumption that the tool would call the initial branch 'master'.
To help these new users after 3.0 boundary, let's retain one part of
the hint we will be giving before the default changes, namely, how
to rename the branch an unconfigured Git has created just once.
We do this without telling them how to permanently configure the
default name of the initial branch, and that design choice is very
much deliberate. The whole point of switching the default name was
because we did not want to force individual users to configure their
default branch name but while the hard wired default was 'master',
they _had_ to configure it away from 'master' in order to conform to
the recent norm, and a hint that tells them how to do so is useful.
But once the default is renamed to 'main', that no longer is true.
A narrower audience who are new users that follow an instruction
that assumes the initial branch name is 'master' would only need to
learn "here is how to change the branch name to match the tutorial
you are following in the repository you created for practice", and
"here is how you keep creating repositories with the first branch
with a name everybody hates" is unnecessary.
It also needs to be noted that the advise token to squelch the
message is the same advice.defaultBranchName as before, which is
also very much deliberate. The users who do have that configured
are those who _have_ been using Git since before 3.0, and they are
not the target audience for the new advice message. Reusing the
same advise token ensures that they do not have to turn the message
off.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git range-diff" learned a way to limit the memory consumed by
O(N*N) cost matrix.
* pc/range-diff-memory-limit:
range-diff: add configurable memory limit for cost matrix
The clear_alloc_state() API function was not fully clearing the
structure for reuse, but since nobody reuses it, replace it with a
variant that frees the structure as well, making the callers simpler.
* ne/alloc-free-and-null:
alloc: fix dangling pointer in alloc_state cleanup
Adjust to the way newer versions of cURL selectivel enables tracing
options, so that our tests can continue to work.
* jk/curl-global-trace-components:
curl: add support for curl_global_trace() components
"git rev-parse --short" and friends failed to disambiguate two
objects with object names that share common prefix longer than 32
characters, which has been fixed.
* jc/longer-disambiguation-fix:
abbrev: allow extending beyond 32 chars to disambiguate
A corner case bug in "git log -L..." has been corrected.
* sg/line-log-boundary-fixes:
line-log: show all line ranges touched by the same diff range
line-log: fix assertion error
"git send-email" learned to drive "git imap-send" to store already
sent e-mails in an IMAP folder.
* ag/send-email-imap-sent:
send-email: enable copying emails to an IMAP folder without actually sending them
send-email: add ability to send a copy of sent emails to an IMAP folder
"core.commentChar=auto" that attempts to dynamically pick a
suitable comment character is non-workable, as it is too much
trouble to support for little benefit, and is marked as deprecated.
* pw/3.0-commentchar-auto-deprecation:
commit: print advice when core.commentString=auto
config: warn on core.commentString=auto
breaking-changes: deprecate support for core.commentString=auto
As the last commit deleted the only user of VERBATIM_MSG remove
it. This reverts remaining parts of commit f7d42ceec52 (rebase -i:
do leave commit message intact in fixup! chains, 2021-01-28) that
were not deleted by the last commit.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user uses a prepare-commit-msg hook to add comments to the
commit message template and sets commit.cleanup to remove them when the
commit is created then the comments will not be removed when rebase
commits the final command in a chain of "fixup" commands[1]. This
happens because f7d42ceec52 (rebase -i: do leave commit message intact
in fixup! chains, 2021-01-28) started passing the VERBATIM_MSG flag
when committing the final command in a chain of "fixup" commands. That
change was added in response to a bug report[2] where the commit
message was being cleaned up when it should not be. The cause of that
bug was that before f7d42ceec52 the sequencer passed CLEANUP_MSG
when committing the final fixup. That commit should have simply
removed the CLEANUP_MSG flag, not changed it to VERBATIM_MSG. Using
VERBATIM_MSG ignores the user's commit.cleanup config when committing
the final fixup which means it behaves differently to an ordinary
"pick" command which respects commit.cleanup.
Fix this by not setting an explicit cleanup flag when committing the
final fixup which matches the way "pick" commands behave. The test
added in f7d42ceec52 is replaced with one that checks that "fixup"
and "pick" commands do not clean up the message when commit.cleanup
is not set and do clean up the message when it is set.
[1] https://lore.kernel.org/git/CA+itcS3DxbgpFy2aPRvHQvTAYE=dU0kfeDdidVwWLU=rBAWR4w@mail.gmail.com
[2] https://lore.kernel.org/git/CANVGpwZGbzYLMeMze64e_OU9p3bjyEgzC5thmNBr6LttBt+YGw@mail.gmail.com
Reported-by: Simon Cheng <cyqsimon@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recently introduced new subcommand git-last-modified(1) runs into an
error in some scenarios. It then would exit with the message:
BUG: paths remaining beyond boundary in last-modified
This seems to happens for example when criss-cross merges are involved.
In that scenario, the function diff_tree_combined() gets called.
The function diff_tree_combined() copies the `struct diff_options` from
the input `struct rev_info` to override some flags. One flag is
`recursive`, which is always set to 1. This has been the case since the
inception of this function in af3feefa1d (diff-tree -c: show a merge
commit a bit more sensibly., 2006-01-24).
This behavior is incompatible with git-last-modified(1), when called
non-recursive (which is the default).
The last-modified machinery uses a hashmap for all the paths it wants to
get the last-modified commit for. Through log_tree_commit() the callback
mark_path() is called. The diff machinery uses diff_tree_combined()
internally, and due to it's recursive behavior the callback receives
entries inside subtrees, but not the subtree entries themselves. So a
directory is never expelled from the hashmap, and the BUG() statement
gets hit.
Because there are many callers calling into diff_tree_combined(), both
directly and indirectly, we cannot simply change it's behavior.
Instead, add a flag `no_recursive_diff_tree_combined` which supresses
the behavior of diff_tree_combined() to override `recursive` and set
this flag in builtin/last-modified.c.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was written in e836757e14b (whatschanged: list it in
BreakingChanges document, 2025-05-12) which was on the same
topic that added the `--i-still-use-this` requirement.[1]
Maybe it was a work-in-progress comment/status.
[1]: jc/you-still-use-whatchanged
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The closest equivalent is `git log --raw --no-merges`.
Also change to “defaults” (implicit plural).
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Give the user a list of suggestions for what to do when they run a
deprecated command.
The first order of action will be to check the breaking changes
document;[1] this short error message says nothing about why this
command is deprecated, and in any case going into any kind of detail
might overwhelm the user.
Then they can find out if this has been discussed on the mailing list.
Then users who e.g. are using git-whatchanged(1) can learn that this is
arguably a plug-in replacement:
git log <opts> --raw --no-merges
Finally they are invited to send an email to the mailing list.
Also drop the “please add” part in favor of just using the “refusing”
die-message; these two would have been right after each other in this
new version.
Also drop “Thanks” since it now would require a new paragraph.
[1]: www.git-scm.com has a disclaimer for these internal documents that
says that “This information is specific to the Git project”. That’s
misleading in this particular case. But users are unlikely to get
discouraged from reading about why they (or their programs) cannot run a
command any more; it clearly concerns them.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit added tests for shadowing deprecated builtins.
Let’s make the test suite more complete by exercising a sample of
the builtins and in turn test the documentation for git-config(1):
To avoid confusion and troubles with script usage, aliases that hide
existing Git commands are ignored except for deprecated commands.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-whatchanged(1) is deprecated and you need to pass
`--i-still-use-this` in order to force it to work as before.
There are two affected users, or usages:
1. people who use the command in scripts; and
2. people who are used to using it interactively.
For (1) the replacement is straightforward.[1] But people in (2) might
like the name or be really used to typing it.[3]
An obvious first thought is to suggest aliasing `whatchanged` to the
git-log(1) equivalent.[1] But this doesn’t work and is awkward since you
cannot shadow builtins via aliases.
Now you are left in an uncomfortable limbo; your alias won’t work until
the command is removed for good.
Let’s lift this limitation by allowing *deprecated* builtins to be
shadowed by aliases.
The only observed demand for aliasing has been for git-whatchanged(1),
not for git-pack-redundant(1). But let’s be consistent and treat all
deprecated commands the same.
[1]:
git log --raw --no-merges
With a minor caveat: you get different outputs if you happen to
have empty commits (no changes)[2]
[2]: https://lore.kernel.org/git/20250825085428.GA367101@coredump.intra.peff.net/
[3]: https://lore.kernel.org/git/BL3P221MB0449288C8B0FA448A227FD48833AA@BL3P221MB0449.NAMP221.PROD.OUTLOOK.COM/
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to complicate the command handling by allowing *deprecated*
builtins to be shadowed by aliases. We need to organize the code in
order to facilitate that.[1]
The code in the `while(1)` speculatively adds commands to the list
before finding out if it’s an alias. Let’s instead move it inside
`handle_alias(...)`—where it conceptually belongs anyway—and in turn
only run this logic when we have found an alias.[2]
[1]: We will do that with an additional call to `handle_alias(1)` inside
the loop. *Not* moving this code leaves a blind spot; we will miss
alias looping crafted via deprecated builtin names
[2]: Also rename the list to a more descriptive name
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With 145 builtin commands (according to `git --list-cmds=builtins`),
users are probably not keeping on top of which ones (if any) are
deprecated.
Let’s expand the experimental `--list-cmds`[1] to allow users and
programs to query for this information. We will also use this in an
upcoming commit to implement `is_deprecated_command`.
[1]: Using something which is experimental to query for deprecations is
perhaps not the most ideal approach, but it is simple to implement
and better than having to scan the documentation
Acked-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
07572f220a8 (whatchanged: remove when built with WITH_BREAKING_CHANGES,
2025-05-12) set up the removal of git-whatchanged(1) when
`WITH_BREAKING_CHANGES` is active. Part of that work was removing it
from `commands` in `git.c`. But the Makefile still lists it as a
builtin. That leaves it in the limbo of being linked but not being
callable; you get the generic error about not being able to call it as
a *builtin*:
$ git whatchanged
fatal: cannot handle whatchanged as a builtin
instead of the expected:
$ git whatchanged
git: 'whatchanged' is not a git command. See 'git --help'.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A '--signed-commits=<mode>' option is already available when using
`git fast-export` to decide what should be done at export time about
commit signatures. At import time though, there is no option, or
other way, in `git fast-import` to decide about commit signatures.
To remediate that, let's add a '--signed-commits=<mode>' option to
`git fast-import` too.
For now the supported <mode>s are the same as those supported by
`git fast-export`.
The code responsible for consuming a signature is refactored into
the import_one_signature() and discard_one_signature() functions,
which makes it easier to follow the logic and add new modes in the
future.
In the 'strip' and 'warn-strip' modes, we deliberately use
discard_one_signature() to discard the signature without parsing it.
This ensures that even malformed signatures, which would cause the
parser to fail, can be successfully stripped from a commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of 'enum sign_mode' as well as its parsing code are in
"builtin/fast-export.c". This was fine because `git fast-export` was the
only command with '--signed-tags=<mode>' or '--signed-commits=<mode>'
options.
In a following commit, we are going to add a similar option to `git
fast-import`, which will be simpler, easier and cleaner if we can reuse
the 'enum sign_mode' defintion and parsing code.
So let's move that definition and parsing code from
"builtin/fast-export.c" to "gpg-interface.{c,h}".
While at it, let's fix a small indentation issue with the arguments of
parse_opt_sign_mode().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit added the necessary validation and checks for F/D
conflicts in the files backend when working on case insensitive systems.
There is still a possibility for D/F conflicts. This is a different from
the F/D since for F/D conflicts, there would not be a conflict during
the lock creation phase:
refs/heads/foo.lock
refs/heads/foo/bar.lock
However there would be a conflict when the locks are committed, since we
cannot have 'refs/heads/foo/bar' and 'refs/heads/foo'. These kinds of
conflicts are checked and resolved in
`refs_verify_refnames_available()`, so the previous commit ensured that
for case-insensitive filesystems, we would lowercase the inputs to that
function.
For D/F conflicts, there is a conflict during the lock creation phase
itself:
refs/heads/foo/bar.lock
refs/heads/foo.lock
As in `lock_raw_ref()` after creating the lock, we also check for D/F
conflicts. This can occur in case-insensitive filesystems when trying to
fetch case-conflicted references like:
refs/heads/Foo/new
refs/heads/foo
D/F conflicts can also occur in case-sensitive filesystems, when the
repository already contains a directory with a lock file
'refs/heads/foo/bar.lock' and trying to fetch 'refs/heads/foo'. This
doesn't concern directories containing garbage files as those are
handled on a higher level.
To fix this, simply categorize the error as a name conflict. Also remove
this reference from the list of valid refnames for availability checks.
By categorizing the error and removing it from the list of valid
references, batched updates now knows to reject such reference updates
and apply the other reference updates.
Fix a small typo in `ref_transaction_maybe_set_rejected()` while here.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the files-backend on case-insensitive filesystems, there is
possibility of hitting F/D conflicts when creating references within a
single transaction, such as:
- 'refs/heads/foo'
- 'refs/heads/Foo/bar'
Ideally such conflicts are caught in `refs_verify_refnames_available()`
which is responsible for checking F/D conflicts within a given
transaction. This utility function is shared across the reference
backends. As such, it doesn't consider the issues of using a
case-insensitive file system, which only affects the files-backend.
While one solution would be to make the function aware of such issues,
this feels like leaking implementation details of file-backend specific
issues into the utility function. So opt for the more simpler option, of
lowercasing all references sent to this function when on a
case-insensitive filesystem and operating on the files-backend.
To do this, simply use a `struct strbuf` to convert the refname to
lowercase and append it to the list of refnames to be checked. Since we
use a `struct strbuf` and the memory is cleared right after, make sure
that the string list duplicates all provided string.
Without this change, the user would simply be left with a repository
with '.lock' files which were created in the 'prepare' phase of the
transaction, as the 'commit' phase would simply abort and not do the
necessary cleanup.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching references into a repository, if a lock for a particular
reference exists, then `lock_raw_ref()` throws:
- REF_TRANSACTION_ERROR_CASE_CONFLICT: when there is a conflict
because the transaction contains conflicting references while being
on a case-insensitive filesystem.
- REF_TRANSACTION_ERROR_GENERIC: for all other errors.
The latter causes the entire set of batched updates to fail, even in
case sensitive filessystems.
Instead, return a 'REF_TRANSACTION_ERROR_CREATE_EXISTS' error. This
allows batched updates to reject the individual update which conflicts
with the existing file, while updating the rest of the references.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During the 'prepare' phase of a reference transaction in the files
backend, we create the lock files for references to be created. When
using batched updates on case-insensitive filesystems, the entire
batched updates would be aborted if there are conflicting names such as:
refs/heads/Foo
refs/heads/foo
This affects all commands which were migrated to use batched updates in
Git 2.51, including 'git-fetch(1)' and 'git-receive-pack(1)'. Before
that, reference updates would be applied serially with one transaction
used per update. When users fetched multiple references on
case-insensitive systems, subsequent references would simply overwrite
any earlier references. So when fetching:
refs/heads/foo: 5f34ec0bfeac225b1c854340257a65b106f70ea6
refs/heads/Foo: ec3053b0977e83d9b67fc32c4527a117953994f3
refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56
The user would simply end up with:
refs/heads/foo: ec3053b0977e83d9b67fc32c4527a117953994f3
refs/heads/sample: 2eefd1150e06d8fca1ddfa684dec016f36bf4e56
This is buggy behavior since the user is never informed about the
overrides performed and missing references. Nevertheless, the user is
left with a working repository with a subset of the references. Since
Git 2.51, in such situations fetches would simply fail without updating
any references. Which is also buggy behavior and worse off since the
user is left without any references.
The error is triggered in `lock_raw_ref()` where the files backend
attempts to create a lock file. When a lock file already exists the
function returns a 'REF_TRANSACTION_ERROR_GENERIC'. When this happens,
the entire batched updates, not individual operation, is aborted as if
it were in a transaction.
Change this to return 'REF_TRANSACTION_ERROR_CASE_CONFLICT' instead to
aid the batched update mechanism to simply reject such errors. The
change only affects batched updates since batched updates will reject
individual updates with non-generic errors. So specifically this would
only affect:
1. git fetch
2. git receive-pack
3. git update-ref --batch-updates
This bubbles the error type up to `files_transaction_prepare()` which
tries to lock each reference update. So if the locking fails, we check
if the rejection type can be ignored, which is done by calling
`ref_transaction_maybe_set_rejected()`.
As the error type is now 'REF_TRANSACTION_ERROR_CASE_CONFLICT',
the specific reference update would simply be rejected, while other
updates in the transaction would continue to be applied. This allows
partial application of references in case-insensitive filesystems when
fetching colliding references.
While the earlier implementation allowed the last reference to be
applied overriding the initial references, this change would allow the
first reference to be applied while rejecting consequent collisions.
This should be an okay compromise since with the files backend, there is
no scenario possible where we would retain all colliding references.
Let's also be more proactive and notify users on case-insensitive
filesystems about such problems by providing a brief about the issue
while also recommending using the reftable backend, which doesn't have
the same issue.
Reported-by: Joe Drew <joe.drew@indexexchange.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If I run
git send-email --compose --reply-to 'ME <my@address.net>' .....
and edit the intro message, then it will get two copies of the Reply-To
field. gmail.com rejects such messages.
This happens because send-email reads the edited message examining the
headers. For recognised headers the content is extracted to use in
constructing the final message and for possible inclusion in the patch
emails. Unrecognised headers are gathered (in @xh) to be passed through
uninterpreted.
Unfortunately "Reply-To" is not recognised in this process so it is
added to @xh as an uninterpreted header, but also generated from the
$reply_to variable in gen_header(), resulting in two copies
Add parsing to the loop in pre_process_file() to recognise a Reply-to
header and to store the result in $reply_to. This means that the
intro message will not get a second header and also means that
any changes made to the Reply-To header during editing will be
incorporated in the $reply_to variable and so included in all the
generated email messages.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git config --get-colorbool foo.bar" command not only digs in the
config to find the value of foo.bar, it evaluates the result using
want_color() to check the tty-ness of stdout.
But it stores the bool result of want_color() in the same git_colorbool
that we found in the config. This works in practice because the
git_colorbool enum is a superset of the bool values. But it is an oddity
from a type system perspective.
Let's instead store the result in a separate bool and use that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the diff code stores the decision about whether to show color as
a git_colorbool, and evaluates it at point-of-use with want_color().
This timing is important for reasons explained in daa0c3d971 (color:
delay auto-color decision until point of use, 2011-08-17).
The add-interactive code instead converts immediately to strict boolean
values using want_color(), and then evaluates those. This isn't wrong.
Even though we pass the bool values to diff_use_color(), which expects a
colorbool, the values are compatible. But it is unlike the rest of the
color code, and is questionable from a type-system perspective (but C's
typing between enums, ints, and bools is weak enough that the compiler
does not complain).
Let's switch it to the more usual way of calling want_color() at the
point of use.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of want_color() is to take in a git_colorbool enum value and
collapse it down to a single true/false boolean, letting UNKNOWN fall
back to the color.ui default and checking isatty() for AUTO.
Let's make that more clear in the type system by returning a bool rather
than an integer.
This sadly still does not help us much with compiler warnings for using
the two types interchangeably. But it helps make the intent more clear
to a human reader.
We still retain the idempotency of want_color(), because in C a bool
true/false converts to 1/0 when converted to an integer, which
corresponds to GIT_COLOR_ALWAYS and GIT_COLOR_NEVER. So you can store
the bool in a git_colorbool and get the right result (something a few
pieces of code still do, but which we'll clean up in further patches).
Note that we rely on this same bool/int conversion for
check_auto_color(). We cache its results in a tristate int with "-1" as
"not yet set", but we can assign to it (and return it) with implicit
conversions to/from bool.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We traditionally used "int" to store and pass around the values defined
by "enum git_colorbool" (which were originally just #define macros).
Using an int doesn't produce incorrect results, but using the actual
enum makes the intent of the code more clear.
It would be nice if the compiler could catch cases where we used the
enum and an int interchangeably, since it's very easy to accidentally
check the boolean true/false of a colorbool like:
if (branch_use_color)
This is wrong because GIT_COLOR_UNKNOWN and GIT_COLOR_AUTO evaluate to
true in C, even though we may ultimately decide not to use color. But C
is pretty happy to convert between ints and enums (even with various
-Wenum-* warnings). So this sadly doesn't protect us from such mistakes,
but it hopefully does make the code easier to read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merges contributions made from three different addresses:
- win@wincent.com (old address, initial contributions in 2007–2009)
- greg@hurrell.net (personal address matching full name, so this one is
the "forever" address; contributions made starting in 2018)
- greg.hurrell@datadoghq.com (current work address, used for recent
contributions)
Signed-off-by: Greg Hurrell <greg.hurrell@datadoghq.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we see "%C(auto)" as a format placeholder, we evaluate the "color"
field of our pretty_print_context to decide whether we want color. The
auto_color field of format_commit_context then stores the boolean result
of want_color(), telling us the yes/no of whether we want color.
But the resulting field is passed to various functions which expect a
git_colorbool, like diff_get_color(), that will then pass it to
want_color() again. It's not wrong to do so, since want_color() is
idempotent. But it makes it harder to reason about the types, since we
sometimes confuse colorbools and strict booleans.
Let's instead store auto_color as the original colorbool itself. We'll
have to make sure it is passed through want_color() when it is
evaluated, but there is only one such spot (right next to where we
assign it!). Every other caller just ends up passing it to get
diff_get_color() either directly or through another helper.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In emit_hunk_header(), we evaluate ecbdata->color_diff both as a
git_colorbool, passing it to diff_get_color():
const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
and as a strict boolean:
const char *reverse = ecbdata->color_diff ? GIT_COLOR_REVERSE : "";
At first glance this seems wrong. Usually we store the color decision as
a git_colorbool, so the second line would get confused by GIT_COLOR_AUTO
(which is boolean true, but may still mean we do not produce color).
However, the second line is correct because our caller sets color_diff
using want_color(), which collapses the colorbool to a strict true/false
boolean. The first line is _also_ correct because of the idempotence of
want_color(). Even though diff_get_color() will pass our true/false
value through want_color() again, the result will be left untouched.
But let's pass through the colorbool itself, which makes it more
consistent with the rest of the diff code. We'll need to then call
want_color() whenever we treat it as a boolean, but there is only such
spot (the one quoted above).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pass the use_color parameter of fill_metainfo() as a strict boolean,
using:
want_color(o->use_color) && !pgm
to derive its value. But then inside the function, we pass it to
diff_get_color(), which expects one of the git_colorbool enum values,
and so feeds it to want_color() again.
Even though want_color() produces a strict 0/1 boolean, this doesn't
produce wrong results because want_color() is idempotent. Since
GIT_COLOR_ALWAYS and NEVER are defined as 1 and 0, and because
want_color() passes through those values, evaluating "want_color(foo)"
and "want_color(want_color(foo))" will return the same result.
But as part of a longer strategy to align the types we use for storing
these values, let's pass through the colorbool directly. To handle the
"&&" case here, we'll convert the presence of "pgm" into "NEVER", which
arguably makes the intent of the code more clear anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We disable --color-moved if color is not in use at all. This happens in
diff_setup_done(), where we set options->color_moved to 0 if
options->use_color is not true. But a strict boolean check here is not
correct; use_color could be GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO, both of
which evaluate to true, even though we may later decide not to show
colors.
We should be using want_color() to convert that git_colorbool into a
true boolean. As it turns out, this does not produce wrong output. Even
though we go to the trouble to detect the moved lines, ultimately we get
the color values from diff_get_color(), which does check want_color().
And so it returns the empty string for each color, and we "color" the
result with nothing.
So the output is correct, but there is a small but measurable
performance cost to doing the line detection. E.g., in git.git before
and after this patch (there are no colors shown because hyperfine
redirects output to /dev/null):
Benchmark 1: ./git.old log --no-merges -p --color-moved -1000
Time (mean ± σ): 1.019 s ± 0.013 s [User: 0.955 s, System: 0.064 s]
Range (min … max): 1.005 s … 1.045 s 10 runs
Benchmark 2: ./git.new log --no-merges -p --color-moved -1000
Time (mean ± σ): 982.9 ms ± 14.5 ms [User: 925.8 ms, System: 57.1 ms]
Range (min … max): 965.1 ms … 1003.2 ms 10 runs
Summary
./git.new log --no-merges -p --color-moved -1000 ran
1.04 ± 0.02 times faster than ./git.old log --no-merges -p --color-moved -1000
Note that the fix is not quite as simple as just calling want_color()
from diff_setup_done(). There's a subtle timing issue that goes back to
daa0c3d971 (color: delay auto-color decision until point of use,
2011-08-17), the commit that adds want_color() in the first place. As
discussed there, we must delay evaluating the colorbool value until all
pager setup is complete.
So instead, we'll leave the "color_moved" field intact in diff_setup_done(),
and modify the point where it is evaluated. Fortunately there is only
one such spot that controls whether we run any of the color-moved code
at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In diff_flush_patch_all_file_pairs(), we set o->emitted_symbols if and
only if o->color_moved is true. That causes the lower-level routines to
fill up o->emitted_symbols, which we then analyze in order to do the
actual colorizing.
But in that final step, we do:
if (o->emitted_symbols) {
if (o->color_moved) {
...actual coloring...
}
...clean up of emitted_symbols...
}
The inner "if" will always trigger, since we set emitted_symbols only
when doing color_moved (it is a little confusing that it is set inside
the diff_options struct, but that is for convenience of passing it to
the lower-level routines; we always clear it at the end of flushing,
since 48edf3a02a (diff: clear emitted_symbols flag after use,
2019-01-24)).
Let's simplify the code a bit by just dropping the inner "if" and
running its block unconditionally.
In theory the current code might be useful if another feature besides
color_moved setup and used emitted_symbols, but it would be easy to
refactor later to handle that. And in the meantime, this makes further
work in this area easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In show_line(), we check to see if colors are desired with just:
if (opt->color)
...we want colors...
But this is incorrect. The color field here is really a git_colorbool,
so it may be "true" for GIT_COLOR_UNKNOWN or GIT_COLOR_AUTO. Either of
those _might_ end up true eventually (once we apply default fallbacks
and check stdout's tty), but they may not. E.g.:
git grep foo | cat
will enter the conditional even though we're not going to show colors.
We should collapse it into a true boolean by calling want_color().
It turns out that this does not produce a user-visible bug. We do some
extra processing to isolate the matched portion of the line in order to
colorize it, but ultimately we pass it to our output_color() helper,
which does correctly check want_color(). So we end up with no colors.
But dropping the extra processing saves a measurable amount of time. For
example, running under hyperfine (which redirects to /dev/null, and thus
does not colorize):
Benchmark 1: ./git.old grep a
Time (mean ± σ): 58.7 ms ± 3.5 ms [User: 580.6 ms, System: 74.3 ms]
Range (min … max): 53.5 ms … 67.1 ms 48 runs
Benchmark 2: ./git.new grep a
Time (mean ± σ): 35.5 ms ± 0.9 ms [User: 276.8 ms, System: 73.8 ms]
Range (min … max): 34.3 ms … 39.3 ms 79 runs
Summary
./git.new grep a ran
1.65 ± 0.11 times faster than ./git.old grep a
That's a fairly extreme benchmark, just because it will come up with a
ton of small matches, but it shows that this really does matter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_config_colorbool() function returns an integer which is always
one of the GIT_COLOR_* constants UNKNOWN, NEVER, ALWAYS, or AUTO. We
define these constants with macros, but let's switch to using an enum.
Even though the compiler does not strictly enforce enum/int conversions,
this should make the intent clearer to human readers. And as a bonus,
enum names are typically available to debuggers, making it more pleasant
to step through the code there.
This patch updates the return type of git_config_colorbool(), but holds
off on updating all of the callers. There's some trickiness to some of
them, and in the meantime it's perfectly fine to assign an enum into an
int.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Long ago Git's decision to show color for a subsytem was stored in a
tri-state variable: it could be true (1), false (0), or unknown (-1).
But since daa0c3d971 (color: delay auto-color decision until point of
use, 2011-08-17) we want to carry around a new state, "auto", which
bases the decision on the tty-ness of stdout (rather than collapsing
that "auto" state to a true/false immediately).
That commit introduced a set of GIT_COLOR_* defines to represent each
state: UNKNOWN, ALWAYS, NEVER, and AUTO. But it only used the AUTO
value, and left alone code using bare 0/1/-1 values. And of course since
then we've grown many new spots that use those bare values.
Let's switch all of these to use the named constants. That should make
the code a bit easier to read, as it is more obvious that we're
representing a color decision.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/add-i-color:
contrib/diff-highlight: mention interactive.diffFilter
add-interactive: manually fall back color config to color.ui
add-interactive: respect color.diff for diff coloring
stash: pass --no-color to diff plumbing child processes
Transactions are managed via the {begin,end}_odb_transaction() function
in the object-file subsystem and its implementation is specific to the
files object source. Introduce odb_transaction_{begin,commit}() in the
odb subsystem to provide an eventual object source agnostic means to
manage transactions.
Update call sites to instead manage transactions through the odb
subsystem. Also rename {begin,end}_odb_transaction() functions to
object_file_transaction_{begin,commit}() to clarify the object source it
supports.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the names of several functions and types relocated from the
bulk-checkin subsystem for better clarity. Also drop
finish_tmp_packfile() as a standalone function in favor of embedding it
in flush_packfile_transaction() directly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bulk-checkin subsystem provides various functions to manage ODB
transactions. Apart from {begin,end}_odb_transaction(), these functions
are only used by the object-file subsystem to manage aspects of a
transaction implementation specific to the files object source.
Relocate all the transaction code in bulk-checkin to object-file. This
simplifies the exposed transaction interface by reducing it to only
{begin,end}_odb_transaction(). Function and type names are adjusted in
the subsequent commit to better fit the new location.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object database transactions can be explicitly flushed via
flush_odb_transaction() without actually completing the transaction.
This makes the provided transactional interface a bit awkward. Now that
there are no longer any flush_odb_transaction() call sites, drop the
function to simplify the interface and further ensure that a transaction
is only finalized when end_odb_transaction() is invoked.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With 23a3a303 (update-index: use the bulk-checkin infrastructure,
2022-04-04), object database transactions were added to
git-update-index(1) to facilitate writing objects in bulk. With
transactions, newly added objects are instead written to a temporary
object directory and migrated to the primary object database upon
transaction commit.
When the --verbose option is specified, the subsequent set of objects
written are explicitly flushed via flush_odb_transaction() prior to
reporting the update. Flushing the object database transaction migrates
pending objects to the primary object database without marking the
transaction as complete. This is done so objects are immediately visible
to git-update-index(1) callers using the --verbose option and that rely
on parsing verbose output to know when objects are written.
Due to how git-update-index(1) parses arguments, options that come after
a filename are not considered during the object update. Therefore, it
may not be known ahead of time whether the --verbose option is present
and thus object writes are considered transactional by default until a
--verbose option is parsed.
Flushing a transaction after individual object writes negates the
benefit of writing objects to a transaction in the first place.
Furthermore, the mechanism to flush a transaction without actually
committing is rather awkward. Drop the call to flush_odb_transaction()
in favor of ending the transaction altogether when the --verbose flag is
encountered. Subsequent object writes occur outside of a transaction and
are therefore immediately visible which matches the current behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ODB transactions support being nested. Only the outermost
{begin,end}_odb_transaction() start and finish a transaction. This
allows internal object write codepaths to be optimized with ODB
transactions without worrying about whether a transaction is already
active. When {begin,end}_odb_transaction() is invoked during an active
transaction, these operations are essentially treated as no-ops. This
can make the interface a bit awkward to use, as calling
end_odb_transaction() does not guarantee that a transaction is actually
ended. Thus, in situations where a transaction needs to be explicitly
flushed, flush_odb_transaction() must be used.
To remove the need for an explicit transaction flush operation via
flush_odb_transaction() and better clarify transaction semantics, drop
the transaction nesting mechanism in favor of begin_odb_transaction()
returning a NULL transaction value to signal it was a no-op, and
end_odb_transaction() behaving as a no-op when a NULL transaction value
is passed. This is safe for existing callers as the transaction value
wired to end_odb_transaction() already comes from
begin_odb_transaction() and thus continues the same no-op behavior when
a transaction is already pending. With this model, passing a pending
transaction to end_odb_transaction() ensures it is committed at that
point in time.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the current implementation of 'git sparse-checkout clean', we
notice that a file that was in a conflicted state does not get cleaned
up because of some internal details around the SKIP_WORKTREE bit.
This test is documenting the current behavior before we update it in the
following change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In my experience, the most-common reason that the sparse index must
expand to a full one is because there is some leftover file in a tracked
directory that is now outside of the sparse-checkout. The new 'git
sparse-checkout clean' command will find and delete these directories,
so point users to it when they hit the sparse index expansion advice.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git sparse-checkout clean' subcommand is focused on directories,
deleting any tracked sparse directories to clean up the worktree and
make the sparse index feature work optimally.
However, this directory-focused approach can leave users wondering why
those directories exist at all. In my experience, these files are left
over due to ignore or exclude patterns, Windows file handles, or
possibly merge conflict resolutions.
Add a new '--verbose' option for users to see all the files that are
being deleted (with '--force') or would be deleted (with '--dry-run').
Based on usage, users may request further context on this list of files for
states such as tracked/untracked, unstaged/staged/conflicted, etc.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A broken or malicious "git fetch" can say that it has the same
object for many many times, and the upload-pack serving it can
exhaust memory storing them redundantly, which has been corrected.
* ps/upload-pack-oom-protection:
upload-pack: don't ACK non-commits repeatedly in protocol v2
t5530: modernize tests
Fixes multiple crashes around midx write-out codepaths.
* ds/midx-write-fixes:
midx-write: simplify error cases
midx-write: reenable signed comparison errors
midx-write: use uint32_t for preferred_pack_idx
midx-write: use cleanup when incremental midx fails
midx-write: put failing response value back
midx-write: only load initialized packs
"repo info" learns a short-hand option "-z" that is the same as
"--format=nul", and learns to report the objects format used in the
repository.
* lo/repo-info-step-2:
repo: add the field objects.format
repo: add the flag -z as an alias for --format=nul
The bulk-checkin code used to depend on a file-scope static
singleton variable, which has been updated to pass an instance
throughout the callchain.
* jt/de-global-bulk-checkin:
bulk-checkin: use repository variable from transaction
bulk-checkin: require transaction for index_blob_bulk_checkin()
bulk-checkin: remove global transaction state
bulk-checkin: introduce object database transaction structure
When a remote tracking branch is deleted (e.g., via 'git push --delete
origin branch'), the headids array entry for that branch is removed, but
upstreamofref may still reference it. This causes gitk to show an error
and prevents the Tags and Heads view from opening.
Fix by checking that headids($upstreamofref($n)) exists before accessing
it in the refill_reflist function.
Signed-off-by: Michael Rappazzo <rappazzo@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Instead of scanning for the remaining items to see if there are
still commits to be explored in the queue, use khash to remember
which items are still on the queue (an unacceptable alternative is
to reserve one object flag bits).
* rs/describe-with-lazy-queue-and-oidset:
describe: use oidset in finish_depth_computation()
Windows "real-time monitoring" interferes with the execution of
tests and affects negatively in both correctness and performance,
which has been disabled in Gitlab CI.
* ps/gitlab-ci-disable-windows-monitoring:
gitlab-ci: disable realtime monitoring to unbreak Windows jobs
"git refs exists" that works like "git show-ref --exists" has been
added.
* ms/refs-exists:
t: add test for git refs exists subcommand
t1422: refactor tests to be shareable
t1403: split 'show-ref --exists' tests into a separate file
builtin/refs: add 'exists' subcommand
Further code clean-up for multi-pack-index code paths.
* ps/object-store-midx-dedup-info:
midx: compute paths via their source
midx: stop duplicating info redundant with its owning source
midx: write multi-pack indices via their source
midx: load multi-pack indices via their source
midx: drop redundant `struct repository` parameter
odb: simplify calling `link_alt_odb_entry()`
odb: return newly created in-memory sources
odb: consistently use "dir" to refer to alternate's directory
odb: allow `odb_find_source()` to fail
odb: store locality in object database sources
Documentation for "git add" has been updated.
* je/doc-add:
doc: rephrase the purpose of the staging area
doc: git-add: simplify discussion of ignored files
doc: git-add: clarify intro & add an example
There is sometimes a need to visit every file within a directory,
recursively. The main example is remove_dir_recursively(), though it has
some extra flags that make it want to iterate over paths in a custom
way. There is also the fill_directory() approach but that involves an
index and a pathspec.
This change adds a new for_each_file_in_dir() method that will be
helpful in the next change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git sparse-checkout clean' subcommand is somewhat similar to 'git
clean' in that it will delete files that should not be in the worktree.
The big difference is that it focuses on the directories that should not
be in the worktree due to cone-mode sparse-checkout. It also does not
discriminate in the kinds of files and focuses on deleting entire
directories.
However, there are some restrictions that would be good to bring over
from 'git clean', specifically how it refuses to do anything without the
'-f'/'--force' or '-n'/'--dry-run' arguments. The 'clean.requireForce'
config can be set to 'false' to imply '--force'.
Add this behavior to avoid accidental deletion of files that cannot be
recovered from Git.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When users change their sparse-checkout definitions to add new
directories and remove old ones, there may be a few reasons why
directories no longer in scope remain (ignored or excluded files still
exist, Windows handles are still open, etc.). When these files still
exist, the sparse index feature notices that a tracked, but sparse,
directory still exists on disk and thus the index expands. This causes a
performance hit _and_ the advice printed isn't very helpful. Using 'git
clean' isn't enough (generally '-dfx' may be needed) but also this may
not be sufficient.
Add a new subcommand to 'git sparse-checkout' that removes these
tracked-but-sparse directories.
The implementation details provide a clear definition of what is happening,
but it is difficult to describe this without including the internal
implementation details. The core operation converts the index to a sparse
index (in memory if not already on disk) and then deletes any directories in
the worktree that correspond with a sparse directory entry in that sparse
index.
In the most common case, this means that a file will be removed if it is
contained within a directory that is both tracked and outside of the
sparse-checkout definition. However, there can be exceptions depending on
the current state of the index:
* If the worktree has a modification to a tracked, sparse file, then that
file's parent directories will be expanded instead of represented as
sparse directories. Siblings of those parent directories may be
considered sparse.
* If the user staged a sparse file with "git add --sparse", then that file
loses the SKIP_WORKTREE bit until the sparse-checkout is reapplied. Until
then, that file's parent directories are not represented as sparse
directory entries and thus will not be removed. Siblings of those parent
directories may be considered sparse. (There may be other reasons why
the SKIP_WORKTREE bit was removed for a file and this impact on the
sparse directories will apply to those as well.)
* If the user has a merge conflict outside of the sparse-checkout
definition, then those conflict entries prevent the parent directories
from being represented as sparse directory entries and thus are not
removed.
* The cases above present reasons why certain _file conditions_ will impact
which _directories_ are considered sparse. The list of tracked
directories that are outside of the sparse-checkout definition but not
represented as a sparse directory further reduces the list of files that
will be removed.
For these complicated reasons, the documentation details a potential list of
files that will be "considered for removal" instead of defining the list
concretely. The special cases can be handled by resolving conflicts,
committing staged changes, and running 'git sparse-checkout reapply' to
update the SKIP_WORKTREE bits as expected by the sparse-checkout definition.
It is important to make clear that this operation will remove ignored and
excluded files which would normally be ignored even by 'git clean -f' unless
the '-x' or '-X' option is provided. This is the most extreme method for
doing this, but it works when the sparse-checkout is in cone mode and is
expected to rescope based on directories, not files.
The current implementation always deletes these sparse directories
without warning. This is unacceptable for a released version, but those
features will be added in changes coming immediately after this one.
Note that this will not remove an untracked directory (or any of its
contents) if its parent is a tracked directory within the sparse-checkout
definition. This is required to prevent removing data created by tools that
perform caching operations for editors or build tools.
Thus, 'git sparse-checkout clean' is both more aggressive and more careful
than 'git clean -fx':
* It is more aggressive because it will remove _tracked_ files within the
sparse directories.
* It is less aggressive because it will leave _untracked_ files that are
not contained in sparse directories.
These special cases will be handled more explicitly in a future change that
expands tests for the 'git sparse-checkout clean' command. We handle some of
the modified, staged, and committed states including some impact on 'git
status' after cleaning.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic for the 'git sparse-checkout' builtin uses the_repository all
over the place, despite some use of a repository struct in different
method parameters. Complete this removal of the_repository by using
'repo' when possible.
In one place, there was already a local variable 'r' that was set to
the_repository, so move that to a method parameter.
We cannot remove the USE_THE_REPOSITORY_VARIABLE declaration as we are
still using global constants for the state of the sparse-checkout.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "documentation" CI jobs, unsurprisingly, performs a couple of tests
on our documentation. The job knows to not only test the documentation
generated by our Makefile, but also by Meson.
In the latter case with Meson we end up building the whole project,
including all of the binaries. This is of course quite excessive and a
waste of compute cycles, as we don't care about these binaries at all.
Fix this by using the new "docs" target that we introduced in the
preceding commit.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation can be built with either Asciidoc or Asciidoctor as
backend. When Meson is configured to build documentation, then it will
automatically detect which of these tools is available and use them.
It's not obvious to the user though which of these backends is used
unless the user explicitly asks for one backend via `-Ddocs_backend=`.
Improve the status quo by printing the docs backend as part of the
"backends" summary.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson does not currently provide a target to compile documentation,
only. Instead, users needs to compile the whole project, which may be
way more than they really intend to do.
Introduce a new "docs" alias to plug this gap. This alias can be invoked
e.g. with `meson compile docs`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the Git 2.51 release cycle we've refactored the object database layer
to access objects via `struct object_database` directly. To make the
transition a bit easier we have retained some of the old-style functions
in case those were widely used.
Now that Git 2.51 has been released it's time to clean up though and
drop these old wrappers. Do so and adapt the small number of newly added
users to use the new functions instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update clar to fcbed04 (Merge pull request #123 from
pks-gitlab/pks-sandbox-ubsan, 2025-09-10). The most significant changes
since the last version include:
- Fixed platform support for HP-UX.
- Fixes for how clar handles the `-q` flag.
- A couple of leak fixes for reported clar errors.
- A new `cl_invoke()` function that retains line information.
- New infrastructure to create temporary directories.
- Improved printing of error messages so that all lines are now
properly indented.
- Proper selftests for the clar.
Most of these changes are somewhat irrelevant to us, but neither do we
have to adjust to any of these changes, either. What _is_ interesting to
us though is especially the fixed support for HP-UX, and eventually we
may also want to use `cl_invoke()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
98ba49ccc2 (subtree: fix split processing with multiple subtrees
present, 2023-12-01) increases the performance of
git subtree split --prefix=subA
by ignoring subtree merges which are outside of `subA/`. It also
introduces a regression. Subtree merges that should be retained
are incorrectly ignored if they:
1. are nested under `subA/`; and
2. are merged with `--squash`.
For example, a subtree merged like:
git subtree merge --squash --prefix=subA/subB "$rev"
# ^^^^^^^^ ^^^^
is erroneously ignored during a split of `subA`. This causes
missing tree files and different commit hashes starting in
git v2.44.0-rc0.
The method:
should_ignore_subtree_split_commit REV
should test only a single commit REV, but the combination of
git log -1 --grep=...
actually searches all *parent* commits until a `--grep` match is
discovered.
Rewrite this method to test only one REV at a time. Extract commit
information with a single `git` call as opposed to three. The
`test` conditions for rejecting a commit remain unchanged.
Unit tests now cover nested subtrees.
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
68061e34702 (fast-import: disallow "feature export-marks" by default,
2019-08-29) added the documentation for this option. The second
paragraph is a literal block but it looks like it should just be
a regular paragraph.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback on this section: 3 users don't know what "tree-ish"
means and 3 users don't know what "pathspec" means. One user also says
that the section is very confusing and that they don't understand what
the "index" is.
From conversations on Mastodon, several users said that their impression
is that "the index" means the same thing as "HEAD". It would be good to
give those users (and other users who do not know what "index" means) a
hint as to its meaning.
Make this section more accessible to users who don't know what the terms
"pathspec", "tree-ish", and "index" mean by using more familiar language,
adding examples, and using simpler sentence structures.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: one user mentioned that "When the <tree-ish> (most
often a commit) is not given" is confusing since it starts with a
negative.
Restructuring so that `git checkout main file.txt` and
`git checkout file.txt` are separate items will help us simplify the
sentence structure a lot.
As a bonus, it appears that `-f` actually only applies to one of those
forms, so we can include fewer options, and now the structure of the
DESCRIPTION matches the SYNOPSIS.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: several users say they don't understand the use case
for `--detach`. It's probably not realistic to explain the use case for
detached HEAD state here, but we can improve the situation.
Explain how `git checkout --detach` is different from
`git checkout <branch>` instead of copying over the description from
`git checkout <branch>`, since `git checkout <branch>` will be a
familiar command to many readers.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: several users reported having trouble understanding
the difference between `-b` and `-B` ("I think it's because my brain
expects it to contrast with `-b`, but instead it starts off explaining
how they're the same").
Also, in `-B`, 2 users can't tell what the branch is reset *to*.
Simplify the sentence structure in the explanations of `-b` and `-B` and
add a little extra information (what `<start-point>` is, what the branch
is reset to).
Splitting up `-b` and `-B` into separate items helps simplify the
sentence structure since there's less "In this case...".
Replace the long "the branch is not reset/created unless "git checkout"
is successful..." with just "will fail", since we should generally
assume that Git will fail operations in a clean way and not leave
operations half-finished, and that cases where it does not fail cleanly
are the exceptions that the documentation should flag.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: several users commented that "Local modifications
to the files in the working tree are kept, so that they can be committed
to the <branch>." didn't seem accurate to them, since
`git checkout <branch>` will often fail.
One user also thought that "... and by pointing HEAD at the branch"
was something that _they_ had to do somehow ("How do I point HEAD at
a branch?") rather than a description of what the `git checkout`
operation is doing for them.
Explain when `git checkout <branch>` will fail and clarify that
"pointing HEAD at the branch" is part of what the command does.
6 users commented that the "You could omit <branch>..." section is
extremely confusing. Explain this in a much more direct way.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to use the terms "pathspec" or "tree-ish" in the
ARGUMENT DISAMBIGUATION section, which are terms that (from user
feedback on this page) many users do not understand.
"tree-ish" is actually not accurate here: `git checkout` in this case
takes a commit-ish, not a tree-ish. So we can say "branch or commit"
instead of "tree-ish" which is both more accurate and uses more familiar
terms.
And now that the intro to the man pages mentions that `git checkout` has
"two main modes", it makes sense to refer to this disambiguation section
to understand how Git decides which one to use when there's an overlap
in syntax.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From user feedback: in the first paragraph, 5 users reported not
understanding the terms "pathspec" and 1 user reported not understanding
the term "HEAD". Of the users who said they didn't know what "pathspec"
means, 3 said they couldn't understand what the paragraph was trying to
communicate as a result.
One user also commented that "If no pathspec was given..." makes
`git checkout <branch>` sounds like a special edge case, instead of
being one of the most common ways to use this core Git command.
It looks like the goal of this paragraph is to communicate that `git
checkout` has two different modes: one where you switch branches and one
where you just update your working directory files/index. So say that
directly, and use more familiar language (including examples) to say it.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_oid_with_context() allows specifying flags and reports object
details via a passed-in struct object_context. Some callers just want
to specify flags, but don't need any details back. Convert them to
repo_get_oid_with_flags(), which provides just that and frees them from
dealing with the context structure.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/git-gui:
git-gui: sync Makefiles with git.git
git-gui: fix error handling of Revert Changes command
git-gui--askyesno (mingw): use Git for Windows' icon, if available
git-gui--askyesno: allow overriding the window title
git gui: set GIT_ASKPASS=git-gui--askpass if not set yet
git-gui: provide question helper for retry fallback on Windows
git-gui: simplify using nice(1)
git-gui: simplify PATH de-duplication
* 'master' of https://github.com/j6t/gitk:
gitk: add README with usage, build, and contribution details
gitk: fix trackpad scrolling for Tcl/Tk 8.7+
gitk: use <Button-3> for ctx menus on macOS with Tcl 8.7+
As the tests are all run in separate repositories, set the branch
name to "master" when creating the repository for the tests where
the result depends on the branch name. In order to make it easier to
change the branch name in the future a helper function is used. This
reduces the number of tests that depend on the default branch name
being "master" and removes the last instance of a test file using
"GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master".
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the penultimate use of "GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=
master" in our test suite. We have slowly been removing these ever
since we started to switch the default branch name used in tests to
"main".
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove one of the last remaining uses of
"TEST_GIT_DEFAULT_INITIAL_BRANCH= main" in the test suite. We have
been steadily be converting tests from using "master" as the default
branch name since the introduction of TEST_GIT_DEFAULT_INITIAL_BRANCH
in 704fed9ea22 (tests: start moving to a different default main branch
name, 2020-10-23) The changes here are purely mechanical replacing
"master" with "main"
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1296cbe4b46 (init: document `init.defaultBranch` better,
2020-12-11) "git-init.adoc" has advertised that the default name
of the initial branch may change in the future. The name "main"
is chosen to match the default used by the big Git forge web sites.
The advice printed when init.defaultBranch is not set is updated
to say that the default will change to "main" in Git 3.0. Building
with WITH_BREAKING_CHANGES enabled removes the advice and changes
the default branch name to "main". The code in guess_remote_head()
that looks for "refs/heads/master" is left unchanged as that is only
called when the remote server does not support the symref capability
in the v0 protocol or the symref extension to the ls-refs list in the
v2 protocol. Such an old server is more likely to be using "master"
as the default branch name.
With the exception of the "git-init.adoc" the documentation is left
unchanged. I had hoped to parameterize the name of the default branch
by using an asciidoc attribute. Unfortunately attribute expansion
is inhibited by backticks and we use backticks to mark up ref names
so that idea does not work. As the changes to git-init.adoc show
inserting ifdef's around each instance of the branch name "master"
is cumbersome and makes the documentation sources harder to read.
Apart from "git-init.adoc" there are some other files where "master" is
used as the name of the initial branch rather than as an example of a
branch name such as "user-manual.adoc" and "gitcore-tutorial.adoc". The
name appears a lot in those so updating it with ifdef's is not really
practical. We can update that document in the 3.0 release cycle. The
other documentation where master is used as an example branch name
can be gradually converted over time.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jt/de-global-bulk-checkin:
bulk-checkin: use repository variable from transaction
bulk-checkin: require transaction for index_blob_bulk_checkin()
bulk-checkin: remove global transaction state
bulk-checkin: introduce object database transaction structure
A new command "git last-modified" has been added to show the closest
ancestor commit that touched each path.
* tc/last-modified:
last-modified: use Bloom filters when available
t/perf: add last-modified perf script
last-modified: new subcommand to show when files were last modified
"git ls-files <pathspec>..." should not necessarily have to expand
the index fully if a sparsified directory is excluded by the
pathspec; the code is taught to expand the index on demand to avoid
this.
* ds/ls-files-lazy-unsparse:
ls-files: conditionally leave index sparse
"git repack --path-walk" lost objects in some corner cases, which
has been corrected.
* ds/path-walk-repack-fix:
path-walk: create initializer for path lists
path-walk: fix setup of pending objects
Inspired by Ezekiel's recent effort to showcase Rust interface, the
hash function implementation used to hash lines have been updated
to the one used for ELF symbol lookup by Glibc.
* am/xdiff-hash-tweak:
xdiff: optimize xdl_hash_record_verbatim
xdiff: refactor xdl_hash_record()
Makefile tried to run multiple "cargo build" which would not work
very well; serialize their execution to work it around.
* da/cargo-serialize:
Makefile: build libgit-rs and libgit-sys serially
When the README for diff-highlight was written, there was no way to
trigger it for the `add -p` interactive patch mode. We've since grown a
feature to support that, but it was documented only on the Git side.
Let's also let people coming the other direction, from diff-highlight,
know that it's an option.
Suggested-by: Isaac Oscar Gariano <IsaacOscar@live.com.au>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Color options like color.interactive and color.diff should fall back to
the value of color.ui if they aren't set. In add-interactive, we check
the specific options (e.g., color.diff) via repo_config_get_value(),
which does not depend on the main command having loaded any color config
via the git_config() callback mechanism.
But then we call want_color() on the result; if our specific config is
unset then that function uses the value of git_use_color_default. That
variable is typically set from color.ui by the git_color_config()
callback, which is called by the main command in its own git_config()
callback function.
This works fine for "add -p", whose add_config() callback calls into
git_color_config(). But it doesn't work for other commands like
"checkout -p", which is otherwise unaware of color at all. People tend
not to notice because the default is "auto", and that's what they'd set
color.ui to as well. But something like:
git -c color.ui=false checkout -p
should disable color, and it doesn't.
This regression goes back to 0527ccb1b5 (add -i: default to the built-in
implementation, 2021-11-30). In the perl version we got the color config
from "git config --get-colorbool", which did the full lookup for us.
The obvious fix is for git-checkout to add a call to git_color_config()
to its own config callback. But we'd have to do so for every command
with this problem, which is error-prone. Let's see if we can fix it more
centrally.
It is tempting to teach want_color() to look up the value of
repo_config_get_value("color.ui") itself. But I think that would have
disastrous consequences. Plumbing commands, especially older ones, avoid
porcelain config like "color.*" by simply not parsing it in their config
callbacks. Looking up the value of color.ui under the hood would
undermine that.
Instead, let's do that lookup in the add-interactive setup code. We're
already demand-loading other color config there, which is probably fine
(even in a plumbing command like "git reset", the interactive mode is
inherently porcelain-ish). That catches all commands that use the
interactive code, whether they were calling git_color_config()
themselves or not.
Reported-by: Isaac Oscar Gariano <isaacoscar@live.com.au>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old perl git-add--interactive.perl script used the color.diff config
option to decide whether to color diffs (and if not set, it fell back to
the value of color.ui via git-config's --get-colorbool option). When we
switched to the builtin version, this was lost: we respect only
color.ui. So for example:
git -c color.diff=false add -p
would color the diff, even when it should not.
The culprit is this line in add-interactive.c's parse_diff():
if (want_color_fd(1, -1))
That "-1" means "no config has been set", which causes it to fall back
to the color.ui setting. We should instead be passing the value of
color.diff. But the problem is that we never even parse that config
option!
Instead the builtin interactive code parses only the value of
color.interactive, which is used for prompts and other messages. One
could perhaps argue that this should cover interactive diff coloring,
too, but historically it did not. The perl script treated
color.interactive and color.diff separately. So we should grab the
values for both, keeping separate fields in our add_i_state variable,
rather than a single use_color field.
We also load individual color slots (e.g., color.interactive.prompt),
leaving them as the empty string when color is disabled. This happens
via the init_color() helper in add-interactive, which checks that
use_color field. Now that there are two such fields, we need to pass the
appropriate one for each color.
The colors are mostly easy to divide up; color.interactive.* follows
color.interactive, and color.diff.* follows color.diff. But the "reset"
color is tricky. It is used for both types of coloring, but the two can
be configured independently. So we introduce two separate reset colors,
and use each in the appropriate spot.
There are two new tests. The first enables interactive prompt colors but
disables color.diff. We should see a colored prompt but not a colored
diff, showing that we are now respecting color.diff (and not
color.interactive or color.ui).
The second does the opposite. We disable color.interactive but turn on
color.diff with a custom fragment color. When we split a hunk, the
interactive code has to re-color the hunk header, which lets us check
that we correctly loaded the color.diff.frag config based on color.diff,
not color.interactive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After a partial stash, we may clear out the working tree by capturing
the output of diff-tree and piping it into git-apply (and likewise we
may use diff-index to restore the index). So we most definitely do not
want color diff output from that diff-tree process. And it normally
would not produce any, since its stdout is not going to a tty, and the
default value of color.ui is "auto".
However, if GIT_PAGER_IN_USE is set in the environment, that overrides
the tty check, and we'll produce a colorized diff that chokes git-apply:
$ echo y | GIT_PAGER_IN_USE=1 git stash -p
[...]
Saved working directory and index state WIP on main: 4f2e2bb foo
error: No valid patches in input (allow with "--allow-empty")
Cannot remove worktree changes
Setting this variable is a relatively silly thing to do, and not
something most users would run into. But we sometimes do it in our tests
to stimulate color. And it is a user-visible bug, so let's fix it rather
than work around it in the tests.
The root issue here is that diff-tree (and other diff plumbing) should
probably not ever produce color by default. It does so not by parsing
color.ui, but because of the baked-in "auto" default from 4c7f1819b3
(make color.ui default to 'auto', 2013-06-10). But changing that is
risky; we've had discussions back and forth on the topic over the years.
E.g.:
https://lore.kernel.org/git/86D0A377-8AFD-460D-A90E-6327C6934DFC@gmail.com/.
So let's accept that as the status quo for now and protect ourselves by
passing --no-color to the child processes. This is the same thing we did
for add-interactive itself in 1c6ffb546b (add--interactive.perl: specify
--no-color explicitly, 2020-09-07).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous commits replaced some strbuf_split*() calls with calls to
string_list_split*() in "promisor-remote.c".
For consistency, let's also replace the strbuf_split_str() call in
mark_remotes_as_accepted() with a call to string_list_split(), as we
don't need the splitted strings to be managed by a `struct strbuf`.
Using the lighter-weight `string_list` API is enough for our needs.
While at it let's remove a useless call to `strbuf_strip_suffix()`.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit allowed a server to pass additional fields through
the "promisor-remote" protocol capability after the "name" and "url"
fields, specifically the "partialCloneFilter" and "token" fields.
Let's make it possible for a client to check if these fields match
what it expects before accepting a promisor remote.
We allow this by introducing a new "promisor.checkFields"
configuration variable. It should contain a comma or space separated
list of fields that will be checked.
By limiting the protocol to specific well-defined fields, we ensure
both server and client have a shared understanding of field
semantics and usage.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit introduced a new parse_one_advertised_remote()
function that takes a `const char *` argument. This function is called
from filter_promisor_remote() and parses all the fields for one remote.
This means that in filter_promisor_remote() we no longer need to split
the remote information that will be passed to
parse_one_advertised_remote() into an array of relatively heavy and
complex `struct strbuf`.
To use something lighter, let's then replace strbuf_split_str() with
string_list_split() in filter_promisor_remote() to parse the remote
information that is passed to parse_one_advertised_remote().
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a follow up commit we are going to parse more fields, like a filter
and a token, coming from the server when it advertises promisor remotes
using the "promisor-remote" capability.
To prepare for this, let's refactor the code that parses the advertised
fields coming from the server into a new parse_one_advertised_remote()
function that will populate a `struct promisor_info` with the content
of the fields it parsed.
While at it, let's also pass this `struct promisor_info` to the
should_accept_remote() function, instead of passing it the parsed name
and url.
These changes will make it simpler to both parse more fields and access
the content of these parsed fields in follow up commits.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit started to define `promisor_field_filter` and
`promisor_field_token`, and used them instead of the
"partialCloneFilter" and "token" string literals.
Let's do the same for "name" and "url" to avoid repeating them
several times and for consistency with the other fields.
For skipping "name=" or "url=" in advertisements, let's introduce
a skip_field_name_prefix() helper function to keep parsing clean
and easy to understand.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now the "promisor-remote" protocol capability can only pass "name"
and "url" information from a server to a client in the form
"name=<remote_name>,url=<remote_url>".
To allow clients to make more informed decisions about which promisor
remotes they accept, let's make it possible to pass more information
by introducing a new "promisor.sendFields" configuration variable.
On the server side, information about a remote `foo` is stored in
configuration variables named `remote.foo.<variable-name>`. To make
it clearer and simpler, we use `field` and `field name` like this:
* `field name` refers to the <variable-name> part of such a
configuration variable, and
* `field` refers to both the `field name` and the value of such a
configuration variable.
The "promisor.sendFields" configuration variable should contain a
comma or space separated list of field names that will be looked up
in the configuration of the remote on the server to find the values
that will be passed to the client.
Only a set of predefined field names are allowed. The only field
names in this set are "partialCloneFilter" and "token". The
"partialCloneFilter" field name specifies the filter definition used
by the promisor remote, and the "token" field name can provide an
authentication credential for accessing it.
For example, if "promisor.sendFields" is set to "partialCloneFilter",
and the server has the "remote.foo.partialCloneFilter" config
variable set to a value, then that value will be passed in the
"partialCloneFilter" field in the form "partialCloneFilter=<value>"
after the "name" and "url" fields.
A following commit will allow the client to use the information to
decide if it accepts the remote or not. For now the client doesn't do
anything with the additional information it receives.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, we will use the new 'promisor-remote' protocol
capability introduced by d460267613 (Add 'promisor-remote' capability
to protocol v2, 2025-02-18) to pass and process more information
about promisor remotes than just their name and url.
For that purpose, we will need to store information about other
fields, especially information that might or might not be available
for different promisor remotes. Unfortunately using 'struct strvec',
as we currently do, to store information about the promisor remotes
with one 'struct strvec' for each field like "name" or "url" does not
scale easily in that case. We would need one 'struct strvec' for each
new field, and then we would have to pass all these 'struct strvec'
around.
Let's refactor this and introduce a new 'struct promisor_info'.
It will only store promisor remote information in its members. For now
it has only a 'name' member for the promisor remote name and an 'url'
member for its URL. We will use a 'struct string_list' to store the
instances of 'struct promisor_info'. For each 'item' in the
string_list, 'item->string' will point to the promisor remote name and
'item->util' will point to the corresponding 'struct promisor_info'
instance.
Explicit members are used within 'struct promisor_info' for type
safety and clarity regarding the specific information being handled,
rather than a generic key-value store. We want to specify and document
each field and its content, so adding new members to the struct as
more fields are supported is fine.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git.git, commit 5309c1e9fb39 (Makefile: set default goals in
makefiles, 2025-02-15) touched two Makefiles in the git-git/ directory.
Import these changes, so that the trees can converge again with the
next merge of this repository into git.git.
Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* js/ask-yesno:
git-gui--askyesno (mingw): use Git for Windows' icon, if available
git-gui--askyesno: allow overriding the window title
git gui: set GIT_ASKPASS=git-gui--askpass if not set yet
git-gui: provide question helper for retry fallback on Windows
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
When a client performs a fetch or clone they can optionally send "have"
lines to tell the server which objects they already have available
locally. These object IDs are stored by the server in an object array so
that it can remember any objects it doesn't have to include in the pack
sent to the client.
While there isn't any reason to do so, clients are free to send the same
"have" line repeatedly. git-upload-pack(1) already knows to handle this
well: every commit it has seen via a "have" line gets marked with the
`THEY_HAVE` flag, and if such a commit is seen repeatedly we know to not
process it another time. This also has the effect that we only store the
object ID once, only, in the `have_obj` array.
There is an edge case though: if the client sends an object ID that does
not refer to a commit we neither store nor check the `THEY_HAVE` flag.
This means that we repeatedly store the same object ID in our `have_obj`
array, with two consequences:
- In protocol v2 we deduplicate ACKs for commits, but not for any
other objects as we send ACKs for every object ID in the `have_obj`
array.
- The `have_obj` array can grow in size indefinitely with both
protocols.
The potentially-more-serious issue is the second one, as we basically
have a way for an adversary to allocate arbitrarily large buffers now.
Ultimately, this doesn't seem to be all that serious though: on my
machine, the growth of that array is at around 4MB/s, and after roughly
five minutes I was only at 1GB RSS. So this is concerning, but only
mildly so.
Fix this bug by storing the `THEY_HAVE` flag independent of the object
type so that we don't store duplicate object IDs in `have_obj` anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor tests to follow modern best practices:
- Merge together tests that set up and verify a single use case.
- Drop empty newlines at the beginning and end of test bodies.
- Don't change directories in the main test body.
- Remove an unused `D` variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_midx_internal() method uses gotos to jump to a cleanup section to
clear memory before returning 'result'. Since these jumps are more common
for error conditions, initialize 'result' to -1 and then only set it to 0
before returning with success. There are a couple places where we return
with success via a jump.
This has the added benefit that the method now returns -1 on error instead
of an inconsistent 1 or -1.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the remaining signed comparison warnings in midx-write.c so that
they can be enforced as errors in the future. After the previous change,
the remaining errors are due to iterator variables named 'i'.
The strategy here involves defining the variable within the for loop
syntax to make sure we use the appropriate bitness for the loop
sentinel. This matters in at least one method where the variable was
compared to uint32_t in some loops and size_t in others.
While adjusting these loops, there were some where the loop boundary was
checking against a uint32_t value _plus one_. These were replaced with
non-strict comparisons, but also the value is checked to not be
UINT32_MAX. Since the value is the number of incremental multi-pack-
indexes, this is not a meaningful restriction. The new die() is about
defensive programming more than it being realistically possible.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
midx-write.c has the DISABLE_SIGN_COMPARE_WARNINGS macro defined for a
few reasons, but the biggest one is the use of a signed
preferred_pack_idx member inside the write_midx_context struct. The code
currently uses -1 to indicate an unset preferred pack but pack int ids
are normally handled as uint32_t. There are also a few loops that search
for the preferred pack by name and those iterators will need updates to
uint32_t in the next change.
For now, replace the use of -1 with a 'NO_PREFERRED_PACK' macro and an
equality check. The macro stores the max value of a uint32_t, so we
cannot store a preferred pack that appears last in a list of 2^32 total
packs, but that's expected to be unreasonable already. Furthermore, with
this change we end up extending the range from 2^31 possible packs to
2^32-1.
There are some careful things to worry about with initializing the
preferred pack in the struct and using that value when searching for a
preferred pack that was already incorrect but accidentally working when
the index was initialized to zero.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The incremental mode of writing a multi-pack-index has a few extra
conditions that could lead to failure, but these are currently
short-ciruiting with 'return -1' instead of setting the method's
'result' variable and going to the cleanup tag.
Replace these returns with gotos to avoid memory issues when exiting
early due to error conditions.
Unfortunately, these error conditions are difficult to reproduce with
test cases, which is perhaps one reason why the memory loss was not
caught by existing test cases in memory tracking modes.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This instance of setting the result to 1 before going to cleanup was
accidentally removed in fcb2205b77 (midx: implement support for writing
incremental MIDX chains, 2024-08-06). Build upon a test that already deletes
a packfile to verify that this error propagates to full command failure.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fill_packs_from_midx() method was refactored in fcb2205b77 (midx:
implement support for writing incremental MIDX chains, 2024-08-06) to
allow for preferred packfiles and incremental multi-pack-indexes.
However, this led to some conditions that can cause improperly
initialized memory in the context's list of packfiles.
The conditions caring about the preferred pack name or the incremental
flag are currently necessary to load a packfile. But the context is
still being populated with pack_info structs based on the packfile array
for the existing multi-pack-index even if prepare_midx_pack() isn't
called.
Add a new test that breaks under --stress when compiled with
SANITIZE=address. The chosen number of 100 packfiles was selected to get
the --stress output to fail about 50% of the time, while 50 packfiles
could not get a failure in most --stress runs.
The test case is marked as EXPENSIVE not only because of the number of
packfiles it creates, but because some CI environments were reporting
errors during the test that I could not reproduce, specifically around
being unable to open the packfiles or their pack-indexes.
When it fails under SANITIZE=address, it provides the following error:
AddressSanitizer:DEADLYSIGNAL
=================================================================
==3263517==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000027
==3263517==The signal is caused by a READ memory access.
==3263517==Hint: address points to the zero page.
#0 0x562d5d82d1fb in close_pack_windows packfile.c:299
#1 0x562d5d82d3ab in close_pack packfile.c:354
#2 0x562d5d7bfdb4 in write_midx_internal midx-write.c:1490
#3 0x562d5d7c7aec in midx_repack midx-write.c:1795
#4 0x562d5d46fff6 in cmd_multi_pack_index builtin/multi-pack-index.c:305
...
This failure stack trace is disconnected from the real fix because the bad
pointers are accessed later when closing the packfiles from the context.
There are a few different aspects to this fix that are worth noting:
1. We return to the previous behavior of fill_packs_from_midx to not
rely on the incremental flag or existence of a preferred pack.
2. The behavior to scan all layers of an incremental midx is kept, so
this is not a full revert of the change.
3. We skip allocating more room in the pack_info array if the pack
fails prepare_midx_pack().
4. The method has always returned 0 for success and 1 for failure, but
the condition checking for error added a check for a negative result
for failure, so that is now updated.
5. The call to open_pack_index() is removed, but this is needed later
in the case of a preferred pack. That call is moved to immediately
before its result is needed (checking for the object count).
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When determining whether or not we want to merge a commit graph chain we
retrieve the graph that is to be merged via the context's repository.
With an upcoming change though it will become a bit more complex to
figure out the commit graph, which would lead to code duplication.
Prepare for this change by passing the graph that is to be merged as a
parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `repo_find_commit_pos_in_graph()` takes a commit as input
and tries to figure out whether the given repository has a commit graph
that contains that specific commit. If so, it returns the corresponding
position of that commit inside the graph.
Right now though we only return the position, but not the actual graph
that the commit has been found in. This is sensible as repositories
always have the graph in `struct repository::objects::commit_graph`.
Consequently, the caller always knows where to find it.
But in a subsequent change we're going to move the graph into the object
sources. This would require callers of the function to loop through all
sources to find the relevant commit graph.
Refactor the code so that we instead return the commit-graph that the
commit has been found with.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When making use of commit graphs, one needs to first prepare them by
calling `prepare_commit_graph()`. Once that function was called and the
commit graph was prepared successfully, the caller is now expected to
access the graph directly via `struct object_database::commit_graph`.
In a subsequent change, we're going to move the commit graph pointer
from `struct object_database` into `struct odb_source`. With this
change, semantics will change so that we use the commit graph of the
first source that has one. Consequently, all callers that currently
deference the `commit_graph` pointer would now have to loop around the
list of sources to find the commit graph.
This would become quite unwieldy. So instead of shifting the burden onto
such callers, adapt `prepare_commit_graph()` to return the prepared
commit graph, if any. Like this, callers are expected to call that
function and then use the returned commit graph.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When filtering down revisions by paths we know to use bloom filters from
the commit graph, if we have any. The entry point for this is in
`check_maybe_different_in_bloom_filter()`, where we first verify that:
- We do have a commit graph.
- That the commit is contained therein by checking that we have a
proper generation number.
- And that the graph contains a bloom filter.
The first check is somewhat redundant though: if we don't have a commit
graph, then the second check would already tell us that we don't have a
generation number for the specific commit.
In theory this could be seen as a performance optimization to
short-circuit for scenarios where there is no commit graph. But in
practice this shouldn't matter: if there is no commit graph, then the
commit graph data slab would also be unpopulated and thus a lookup of
the commit should happen in constant time.
Drop the unnecessary check.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our blaming subsystem knows to use bloom filters from commit graphs to
speed up the whole computation. The setup of this happens in
`setup_blame_bloom_data()`, where we first verify that we even have a
commit graph in the first place. This check is redundant though, as we
call `get_bloom_filter_settings()` immediately afterwards which, which
already knows to return a `NULL` pointer in case we don't have a commit
graph.
Drop the redundant check.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers of clear_alloc_state() immediately free what they
cleared, so currently it does not hurt anybody that the
alloc_state is left in an unreusable state, but it is an
error-prone API. Replace it with a new function that clears but
in addition frees the structure, as well as NULLing the pointer
that points at it and adjust existing callers.
As it is a moral equivalent of FREE_AND_NULL(), except that what it
frees has internal structure that needs to be cleaned, allow the
helper to be called twice in a row, by making a call with a pointer
to a pointer variable that already is NULLed.
While at it, rename allocate_alloc_state() and name the new
function alloc_state_free_and_null(), to follow more closely the
function naming convention specified in the CodingGuidelines
(namely, functions about S are named with S_ prefix and then
verb).
Signed-off-by: ノウラ | Flare <nouraellm@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the expected type of the second parameter of extend_abbrev_len()
instead of casting a void pointer internally. Just a single caller
passes in a void pointer, the rest pass the correct type. Let the
compiler help keeping it that way.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The flag `--show-object-format` from git-rev-parse is used for
retrieving the object storage format. This way, it is used for
querying repository metadata, fitting in the purpose of git-repo-info.
Add a new field `objects.format` to the git-repo-info subcommand
containing that information.
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other Git commands that have nul-terminated output (e.g. git-config,
git-status, git-ls-files) have a flag `-z` for using the null character
as the record separator.
Add the `-z` flag to git-repo-info as an alias for `--format=nul`,
making it consistent with the behavior of the other commands.
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before we were silently skipping all builtins that don't have a matching
.adoc file. This is overly loose and might skip documentation files
when it shouldn't, for example when there was a typo in the filename.
To ensure no new builtins are added without documentation, add an
allowlist: t0450/adoc-missing. In this file only builtin commands that
do *not* have a corresponding .adoc file shall be listed. If there is a
mismatch, fail the test. This should force future contributions to
either add an .adoc, or add the builtin name to the allowlist file.
Signed-off-by: Toon Claes <toon@iotcl.com>
[jc: squashed Patrick's "missing file fix" in]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation incorrectly referred to the extension without an 's'.
This fixes the typo for clarity.
Signed-off-by: Mikhail Malinouski <m.l.malinouski@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Depth computation can end early if all remaining commits are flagged.
The current code determines if that's the case by checking all queue
items each time it dequeues a flagged commit. This can cause
quadratic complexity.
We could simply count the flagged items in the queue and then update
that number as we add and remove items. That would provide a general
speedup, but leave one case where we have to scan the whole queue: When
we flag a previously seen, but unflagged commit. It could be on the
queue and then we'd have to decrease our count.
We could dedicate an object flag to track queue membership, but that
would leave less for candidate tags, affecting the results. So use a
hash table, specifically an oidset of commit hashes, to track that.
This avoids quadratic behaviour in all cases and provides a nice
performance boost over the previous commit, 08bb69d70f (describe: use
prio_queue_replace(), 2025-08-03):
Benchmark 1: ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 855.3 ms ± 1.3 ms [User: 790.8 ms, System: 49.9 ms]
Range (min … max): 853.7 ms … 857.8 ms 10 runs
Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 610.8 ms ± 1.7 ms [User: 546.9 ms, System: 49.3 ms]
Range (min … max): 608.9 ms … 613.3 ms 10 runs
Summary
./git describe $(git rev-list v2.41.0..v2.47.0) ran
1.40 ± 0.00 times faster than ./git_08bb69d70f describe $(git rev-list v2.41.0..v2.47.0)
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test script, `t/t1462-refs-exists.sh`, for the `git refs exists`
command.
This script acts as a simple driver, leveraging the shared test library
created in the preceding commit. It works by overriding the
`$git_show_ref_exists` variable to "git refs exists" and then sourcing the
shared library (`t/show-ref-exists-tests.sh`).
This approach ensures that `git refs exists` is tested against the
entire comprehensive test suite of `git show-ref --exists`, verifying
that it acts as a compatible drop-in replacement.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding tests for the `git refs exists` command,
refactor the existing t1422 test suite to make its logic shareable.
Move the core test logic from `t1422-show-ref-exists.sh` to
`show-ref-exists-tests.sh` file. Inside this script, replace hardcoded
calls to "git show-ref --exists" with the `$git_show_ref_exists`
variable.
The original `t1422-show-ref-exists.sh` script now becomes a simple
"driver". It is responsible for setting the default value of the
variable and then sourcing the test library.
This structure follows an established pattern for sharing tests and
prepares the test suite for the `refs exists` tests to be added in a
subsequent commit.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test file for git-show-ref(1), `t1403-show-ref.sh`, contains a group
of tests for the '--exists' flag. To improve organization and to prepare
for refactoring these tests to be shareable, move the '--exists' tests
and their corresponding setup logic into a self-contained test suite,
`t1422-show-ref-exists.sh`.
This is a pure code-movement refactoring with no change in test coverage
or behavior.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the ongoing effort to consolidate reference handling,
introduce a new `exists` subcommand. This command provides the same
functionality and exit-code behavior as `git show-ref --exists`, serving
as its modern replacement.
The logic for `show-ref --exists` is minimal. Rather than creating a
shared helper function which would be overkill for ~20 lines of code,
its implementation is intentionally duplicated here. This contrasts with
`git refs list`, where sharing the larger implementation of
`for-each-ref` was necessary.
Documentation for the new subcommand is also added to the `git-refs(1)`
man page.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-store-midx-dedup-info:
midx: compute paths via their source
midx: stop duplicating info redundant with its owning source
midx: write multi-pack indices via their source
midx: load multi-pack indices via their source
midx: drop redundant `struct repository` parameter
odb: simplify calling `link_alt_odb_entry()`
odb: return newly created in-memory sources
odb: consistently use "dir" to refer to alternate's directory
odb: allow `odb_find_source()` to fail
odb: store locality in object database sources
The GitLab CI runners using Windows machines have realtime monitoring
via Windows Defender enabled by default. This has just now started to
cause issues in our CI jobs using Microsoft Visual Studio:
Program 'meson.exe' failed to run: Operation did not complete successfully because the file contains a virus or
potentially unwanted softwareAt line:356 char:1
+ meson setup build --vsenv -Dperl=disabled -Dbackend_max_links=1 -Dcre ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~.
At line:356 char:1
+ meson setup build --vsenv -Dperl=disabled -Dbackend_max_links=1 -Dcre ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ResourceUnavailable: (:) [], ApplicationFailedException
+ FullyQualifiedErrorId : NativeCommandFailed
The detected issue is more likely than not completely bogus, but it
breaks the jobs.
Fix the issue by disabling realtime monitoring. Besides unbreaking CI,
it also improves our build times a bit:
- Building Git goes from 26 to 22 minutes.
- Executing tests goes from ~1h for one slice of tests to ~30 minutes.
This is still painfully slow, but the issue here is that the Windows
runners on GitLab CI are quite underwhelming overall.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a missed backtick to the end of a code segment so that it will be
rendered like preceding examples.
I deeply appreciate the thoroughness of this documentation. I noticed
the formatting discrepancy reading https://git-scm.com/docs/git-config.
Signed-off-by: Kyle E. Mitchell <kyle@kemitchell.com>
Acked-by: Jean-Noël AVILA <avila.jn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the instruction to use of GGG in the MyFirstContribution
document to say that a GitHub PR could be made against `git/git`
instead of `gitgitgadget/git`.
* ds/doc-ggg-pr-fork-clarify:
doc: clarify which remotes can be used with GitGitGadget
Fix missing single-quote pairs in a documentation page.
* kh/doc-interpret-trailers-markup-fix:
doc: interpret-trailers: close all pairs of single quotes
The command Revert Changes has two different erroneous behaviors
depending on the Tcl version used.
The command uses a "chord" facility where different "notes" are
evaluated asynchronously and any error is reported after all of them
have finished. The intent is that a private namespace is used where
the notes can store the error state. Tcl 9 changed namespace handling
in a subtle way, as https://www.tcl-lang.org/software/tcltk/9.0.html
summarizes under "Notable incompatibilities":
Unqualified varnames resolved in current namespace, not global.
Note that in almost all cases where this causes a change, the
change is actually the removal of a latent bug.
And that's exactly what happens here.
- Under Tcl 9:
- When the command operates without any errors, the variable `err`
is never set. When the error handler wants to inspect `err` (in
the correct private namespace), it does not find it and a Tcl
error about an unset variable occurs. Incidentally, this is also
the case when the user cancels the operation with the option
"Do Nothing"!
On the other hand, when an error occurs during the operation, `err`
is set and found as intended.
Check for the existence of the variable `err` before the attempt to
read it.
- Under Tcl 8.6:
The error handler looks up `err` in the global namespace, which is
bogus and unintended. The variable is set due to the many
`catch ... err` that occur during startup in the global namespace.
- When the command operates without any errors, the error handler
finds the global `err`, which happens to be the empty string at
this point, and no error is reported.
On the other hand, when an error occurs during the operation, the
global `err` is set and found, so that an error is reported as
desired.
However, the value of `err` persists in the global namespace. When
the command is repeated, an error is reported again, even if there
was actually no error, and even "Do Nothing" was used to cancel
the operation.
Clear the global `err` before the operation begins.
The lingering error message is not a problem under Tcl 9, because a
prestine namespace is established every time the command is used.
This fixes https://github.com/j6t/git-gui/issues/21.
Helped-by: Igor Stepushchik
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Git does not really "store the contents of the next commit"
anywhere; rather, you the user use the index to prepare it.
Signed-off-by: Julia Evans <julia@jvns.ca>
[jc; made the change relative to what is already in 'next']
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When comparing large commit ranges (e.g., 250,000+ commits), range-diff
attempts to allocate an n×n cost matrix that can exhaust available
memory. For example, with 256,784 commits (n = 513,568), the matrix
would require approximately 256GB of memory (513,568² × 4 bytes),
causing either immediate segmentation faults due to integer overflow or
system hangs.
Add a memory limit check in get_correspondences() before allocating the
cost matrix. This check uses the total size in bytes (n² × sizeof(int))
and compares it against a configurable maximum, preventing both
excessive memory usage and integer overflow issues.
The limit is configurable via a new --max-memory option that accepts
human-readable sizes (e.g., "1G", "500M"). The default is 4GB for 64 bit
systems and 2GB for 32 bit systems. This allows comparing ranges of
approximately 32,000 (16,000) commits - generous for real-world use cases
while preventing impractical operations.
When the limit is exceeded, range-diff now displays a clear error
message showing both the requested memory size and the maximum allowed,
formatted in human-readable units for better user experience.
Example usage:
git range-diff --max-memory=1G branch1...branch2
git range-diff --max-memory=500M base..topic1 base..topic2
This approach was chosen over alternatives:
- Pre-counting commits: Would require spawning additional git processes
and reading all commits twice
- Limiting by commit count: Less precise than actual memory usage
- Streaming approach: Would require significant refactoring of the
current algorithm
This issue was previously discussed in:
https://lore.kernel.org/git/RFC-cover-v2-0.5-00000000000-20211210T122901Z-avarab@gmail.com/
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git describe <blob>" misbehaves and/or crashes in some corner
cases, which has been taught to exit with failure gracefully.
* jk/describe-blob:
describe: pass commit to describe_commit()
describe: handle blob traversal with no commits
describe: catch unborn branch in describe_blob()
describe: error if blob not found
describe: pass oid struct by const pointer
"git fetch" can clobber a symref that is dangling when the
remote-tracking HEAD is set to auto update, which has been
corrected.
* jk/no-clobber-dangling-symref-with-fetch:
refs: do not clobber dangling symrefs
t5510: prefer "git -C" to subshell for followRemoteHEAD tests
t5510: stop changing top-level working directory
t5510: make confusing config cleanup more explicit
Discord has been added to the first contribution documentation as
another way to ask for help.
* ds/doc-community-discord:
doc: add discord to ways of getting help
Code clean-ups.
* ps/reftable-libgit2-cleanup:
refs/reftable: always reload stacks when creating lock
reftable: don't second-guess errors from flock interface
reftable/stack: handle outdated stacks when compacting
reftable/stack: allow passing flags to `reftable_stack_add()`
reftable/stack: fix compiler warning due to missing braces
reftable/stack: reorder code to avoid forward declarations
reftable/writer: drop Git-specific `QSORT()` macro
reftable/writer: fix type used for number of records
Our 'git last-modified' performs a revision walk, and computes a diff at
each point in the walk to figure out whether a given revision changed
any of the paths it considers interesting.
When changed-path Bloom filters are available, we can avoid computing
many such diffs. Before computing a diff, we first check if any of the
remaining paths of interest were possibly changed at a given commit by
consulting its Bloom filter. If any of them are, we are resigned to
compute the diff.
If none of those queries returned "maybe", we know that the given commit
doesn't contain any changed paths which are interesting to us. So, we
can avoid computing it in this case.
Comparing the perf test results on git.git:
Test HEAD~ HEAD
------------------------------------------------------------------------------------
8020.1: top-level last-modified 4.49(4.34+0.11) 2.22(2.05+0.09) -50.6%
8020.2: top-level recursive last-modified 5.64(5.45+0.11) 5.62(5.30+0.11) -0.4%
8020.3: subdir last-modified 0.11(0.06+0.04) 0.07(0.03+0.04) -36.4%
Based-on-patch-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This just runs some simple last-modified commands. We already test
correctness in the regular suite, so this is just about finding
performance regressions from one version to another.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to git-blame(1), introduce a new subcommand
git-last-modified(1). This command shows the most recent modification to
paths in a tree. It does so by expanding the tree at a given commit,
taking note of the current state of each path, and then walking
backwards through history looking for commits where each path changed
into its final commit ID.
Based-on-patch-by: Jeff King <peff@peff.net>
Improved-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This provides a unified look-and-feel in Git for Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
"Question?" is maybe not the most informative thing to ask. In the
absence of better information, it is the best we can do, of course.
However, Git for Windows' auto updater just learned the trick to use
git-gui--askyesno to ask the user whether to update now or not. And in
this scripted scenario, we can easily pass a command-line option to
change the window title.
So let's support that with the new `--title <title>` option.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Make use of the new environment variable GIT_ASK_YESNO to support the
recently implemented fallback in case unlink, rename or rmdir fail for
files in use on Windows. The added dialog will present a yes/no question
to the the user which will currently be used by the windows compat layer
to let the user retry a failed file operation.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
The compatObjectFormat extension is used to hide an incomplete
feature that is not yet usable for any purpose other than
developing the feature further. Document it as such to discourage
its use by mere mortals.
* bc/doc-compat-object-format-not-working:
docs: note that extensions.compatobjectformat is incomplete
Under a race against another process that is repacking the
repository, especially a partially cloned one, "git fetch" may
mistakenly think some objects we do have are missing, which has
been corrected.
* jk/fetch-check-graph-objects-fix:
fetch-pack: re-scan when double-checking graph objects
"git log -L..." compared trees of multiple parents with the tree of the
merge result in an unnecessarily inefficient way.
* sg/line-log-merge-optim:
line-log: simplify condition checking for merge commits
line-log: initialize diff queue in process_ranges_ordinary_commit()
line-log: get rid of the parents array in process_ranges_merge_commit()
line-log: avoid unnecessary tree diffs when processing merge commits
The start_delayed_progress() function in the progress eye-candy API
did not clear its internal state, making an initial delay value
larger than 1 second ineffective, which has been corrected.
* js/progress-delay-fix:
progress: pay attention to (customized) delay time
Documentation for "git rebase" has been updated.
* je/doc-rebase:
doc: git-rebase: update discussion of internals
doc: git-rebase: move --onto explanation down
doc: git rebase: clarify arguments syntax
doc: git rebase: dedup merge conflict discussion
doc: git-rebase: start with an example
When running 'git ls-files' with a pathspec, the index entries get
filtered according to that pathspec before iterating over them in
show_files(). In 78087097b8 (ls-files: add --sparse option,
2021-12-22), this iteration was prefixed with a check for the '--sparse'
option which allows the command to output directory entries; this
created a pre-loop call to ensure_full_index().
However, when a user runs 'git ls-files' where the pathspec matches
directories that are recursively matched in the sparse-checkout, there
are not any sparse directories that match the pathspec so they would not
be written to the output. The expansion in this case is just a
performance drop for no behavior difference.
Replace this global check to expand the index with a check inside the
loop for a matched sparse directory. If we see one, then expand the
index and continue from the current location. This is safe since the
previous entries in the index did not have any sparse directories and
thus would remain stable in this expansion.
A test in t1092 confirms that this changes the behavior.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to the regular trace information produced by
CURLOPT_VERBOSE, recent curl versions can enable or disable tracing of
specific subsystems using a call to curl_global_trace().
This level of detail may or may not be useful for us in Git as mere
users of libcurl, but there's one case where we need it for a test. In
t5564, we set up a socks proxy, access it with GIT_TRACE_CURL set, and
expect to find socks-related messages in the output. This test is broken
in the release candidates for libcurl 8.16, as those socks messages are
no longer produced in the trace.
The problem bisects to curl's commit ab5e0bfddc (pytest: add SOCKS tests
and scoring, 2025-07-21). There the socks messages were moved from
generic infof() messages to the component-specific CURL_TRC_CF() system.
And so we do not see them by default, but only if "socks" is enabled as
a logging component.
Teach Git's http code to accept a component list from the
environment and pass it into curl_global_trace(). We can then use
that in the test to enable the correct component.
It should be safe to do so unconditionally. In older versions of curl
which don't support this call, setting the environment variable is a
noop. Likewise, any versions of curl which don't recognize the "socks"
component should silently ignore it. The manpage for curl_global_trace()
says this:
The config string is a list of comma-separated component names. Names
are case-insensitive and unknown names are ignored. The special name
"all" applies to all components. Names may be prefixed with '+' or '-'
to enable or disable detailed logging for a component.
The list of component names is not part of curl's public API. Names may
be added or disappear in future versions of libcurl. Since unknown
names are silently ignored, outdated log configurations does not cause
errors when upgrading libcurl. Given that, some names can be expected
to be fairly stable and are listed below for easy reference.
So this should let us make the test work on all versions without
worrying about confusing older (or newer) versions. For the same reason,
I've opted not to document this interface. This is deep internal voodoo
for which we can make no promises to users. In fact, I was tempted to
simply hard-code "socks" to let our test pass and not expose anything.
But I suspect a little run-time flexibility may come in handy in the
future when debugging or dealing with similar logging issues.
I also considered just putting "all" into such a hard-coded default. But
if you try it, you will see that many of the components are quite
verbose and likely not interesting. They would clutter up our trace
output if we enabled them by default.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
TIP 684 [1] introduced TouchpadScroll events in Tcl/Tk 8.7, separating
trackpad gestures from traditional MouseWheel events. This broke
trackpad scrolling in gitk where trackpads generate TouchpadScroll
events instead of MouseWheel events.
Fix that by adding TouchpadScroll event bindings for all scrollable
widgets following the TIP 684 specification. Implement a new
precisescrollval proc to handle the smaller delta values from
TouchpadScroll events, using appropriate scaling factors that seem
sensible on my MacBook.
Fixes https://github.com/j6t/gitk/issues/31.
[1]: https://core.tcl-lang.org/tips/doc/main/tip/684.md
Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com>
"make -JN" with INCLUDE_LIBGIT_RS enabled causes cargo lock warnings
and can trigger ld errors during the build.
The build errors are caused by two inner "make" invocations getting
triggered concurrently: once inside of libgit-sys and another inside of
libgit-rs.
Make libgit-rs depend on libgit-sys so that "make" prevents them
from running concurrently. Apply the same logic to the test invocations.
Use cargo's "--manifest-path" option instead of "cd" in the recipes.
Signed-off-by: David Aguilar <davvid@gmail.com>
Acked-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our codebase uses a lot of bit field variables, generally to mark
boolean type variables. While there is a formatting rule in the
'.clang-format', there is no guideline specified in the
'CodingGuidelines'.
Since the '.clang-format' is not yet enforced, let's also add a
guideline with the same rule as mentioned in the '.clang-format', which
is to not use any spaces around the colon, like so:
unsigned my_field:1;
unsigned other_field:1;
unsigned field_with_longer_name:1;
This would allow us not to modify the clang-format file, and more
importantly, discourage people from doing ugly alignment with spaces,
i.e.
unsigned my_field : 1;
unsigned other_field : 1;
unsigned field_with_longer_name : 1;
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier recommendation by IETF with RFC 2595 was to deprecate
implicit TLS in preference for upgrade an initially unencrypted
connection with STARTTLS command. These days, however, IETF
recommends that connections be made using "Implicit TLS", in
preference to STARTTLS and the like, completely reversing their
earlier position, in RFC8314.
Update the GMail example to use the implicit TLS to match the
current recommendation at port 465.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some advice on how to change the config settings when
"core.commentString=auto" or "core.commentChar=auto". The advice
includes instructions for clearing the config setting or setting a
fixed comment string. To try and be as specific as possible, the advice
is customized based on the user's config. If "core.commentString=auto"
is set in the system config and the user does not have write
access then the advice omits the instructions to clear the config
and recommends changing the global config instead. An alternative
approach would be to advise the user to run "git config --show-origin"
and leave them to figure out how to fix it themselves but that seems
rather unfriendly. As we're forcing them to update their config we
should try and make that as easy as possible.
In order to generate this advice we need to record each file where
either of the config keys is set and whether a key occurs more that
once in a given file. This lets us generate the list of commands to
remove all the keys and also tells us which key the "auto" setting
comes from.
As we want the user to update their config we do not provide a way
for this advice to be disabled other than changing the value of
"core.commentChar" or "core.commentString".
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As support for this setting was deprecated in the last commit print a
warning (or die when WITH_BREAKING_CHANGES is enabled) if it is set.
Avoid bombarding the user with warnings by only printing it (a) when
running commands that call "git commit" and (b) only once per command.
Some scaffolding is added to repo_read_config() to allow it to
detect deprecated config settings and warn about them. As both
"core.commentChar" and "core.commentString" set the comment
character we record which one of them is used and tailor the
warning message appropriately.
Note the odd combination of die_message() followed by die(NULL)
is to allow the next commit to insert a call to advise() in the middle.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "core.commentString" is set to "auto" then "git commit" will
automatically select the comment character ensuring that it is not the
first character on any of the lines in the commit message. This was
introduced by commit 84c9dc2c5a2 (commit: allow core.commentChar=auto
for character auto selection, 2014-05-17). The motivation seems to be
to avoid commenting out lines from the existing message when amending
a commit that was created with a message from a file.
Unfortunately this feature does not work with:
* commit message templates that contain comments.
* prepare-commit-msg hooks that introduce comments.
* "git commit --cleanup=strip --edit -F <file>" which means that it
is incompatible with
- the "fixup" and "squash" commands of "git rebase -i" as the
comments added by those commands are then treated as part of
the commit message.
- the conflict comments added to the commit message by "git
cherry-pick", "git rebase" etc. as these comments are then
treated as part of the commit message.
It is also ignored by "git notes" when amending a note.
The issues with comments coming from a template, hook or file are a
consequence of the design of this feature and are therefore hard to
fix.
As the costs of this feature outweigh the benefits, deprecate it and
remove it in Git 3.0. If someone comes up with some patches that fix
all the issues in a maintainable way then I'd be happy to see this
change reverted.
The next commits will add a warning and some advice for users on how
they can update their config settings.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The compatibility object format is only implemented for loose objects,
not packed objects, so anyone attempting to push or fetch data into a
repository with this option will likely not see it work as expected. In
addition, the underlying storage of loose object mapping is likely to
change because the current format is inefficient and does not handle
important mapping information such as that of submodules.
It would have been preferable to initially document that this was not
yet ready for prime time, but we did not do so. We hinted at the fact
that this functionality is incomplete in the description, but did not
say so explicitly. Let's do so now: indicate that this feature is
incomplete and subject to change and that the option is not designed to
be used by end users.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using one of the start_delayed_*() functions, clients of the progress
API can request that a progress meter is only shown after some time.
To do that, the implementation intends to count down the number of
seconds stored in struct progress by observing flag progress_update,
which the timer interrupt handler sets when a second has elapsed. This
works during the first second of the delay. But the code forgets to
reset the flag to zero, so that subsequent calls of display_progress()
think that another second has elapsed and decrease the count again
until zero is reached. Due to the frequency of the calls, this happens
without an observable delay in practice, so that the effective delay is
always just one second.
This bug has been with us since the inception of the feature. Despite
having been touched on various occasions, such as 8aade107dd84
(progress: simplify "delayed" progress API), 9c5951cacf5c (progress:
drop delay-threshold code), and 44a4693bfcec (progress: create
GIT_PROGRESS_DELAY), the short delay went unnoticed.
Copy the flag state into a local variable and reset the global flag
right away so that we can detect the next clock tick correctly.
Since we have not had any complaints that the delay of one second is
too short nor that GIT_PROGRESS_DELAY is ignored, people seem to be
comfortable with the status quo. Therefore, set the default to 1 to
keep the current behavior.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new subcommand "git repo" gives users a way to grab various
repository characteristics.
* lo/repo-info:
repo: add the --format flag
repo: add the field layout.shallow
repo: add the field layout.bare
repo: add the field references.format
repo: declare the repo command
Remove dependency on the_repository and other globals from the
commit-graph code, and other changes unrelated to de-globaling.
* ps/commit-graph-wo-globals:
commit-graph: stop passing in redundant repository
commit-graph: stop using `the_repository`
commit-graph: stop using `the_hash_algo`
commit-graph: refactor `parse_commit_graph()` to take a repository
commit-graph: store the hash algorithm instead of its length
commit-graph: stop using `the_hash_algo` via macros
Test clean-up.
* dk/t7005-editor-updates:
t7005: sanitize test environment for subsequent tests
t7005: stop abusing --exec-path
t7005: use modern test style
Doc lint updates to encourage the newer and easier-to-use
`synopsis` format, with fixes to a handful of existing uses.
* ja/doc-lint-sections-and-synopsis:
doc lint: check that synopsis manpages have synopsis inlines
doc:git-for-each-ref: fix styling and typos
doc: check for absence of the form --[no-]parameter
doc: check for absence of multiple terms in each entry of desc list
doc: check well-formedness of delimited sections
doc: test linkgit macros for well-formedness
"git cmd --help-all" now works outside repositories.
* dk/help-all:
builtin: also setup gently for --help-all
parse-options: refactor flags for usage_with_options_internal
Revert back to “Git's” which was used before d30c5cc4592 (doc: convert
git-mergetool options to new synopsis style, 2025-05-25) accidentally
changed it.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fetch code tries to avoid asking the remote side for an object we
already have. It does this by traversing recent commits reachable from
our refs looking for matches. Commit 5d4cc78f72 (fetch-pack: die if in
commit graph but not obj db, 2024-11-05) introduced an extra check
there: if we think we have an object because it's in the commit graph,
we double-check that we actually have it in our object database with a
call to odb_has_object().
But that call does not pass any flags, and so the function won't call
reprepared_packed_git() if it does not find the object. That opens us up
to the usual race against some other process repacking the odb:
1. We scan the list of packs in objects/pack but haven't yet opened them.
2. Somebody else packs the object into a new pack (which we don't know
about), and deletes the old pack it was in.
3. Our odb_has_object() calls tries to open that old pack, but finds it
is gone. We declare that we don't have the object.
And this causes us to erroneously complain and abort the fetch, thinking
our commit-graph and object database are out of sync. Instead, we should
pass HAS_OBJECT_RECHECK_PACKED, which will add a new step:
4. We re-scan the pack directory again, find the new pack, and locate
the object.
Often the fetch code tries to avoid these kinds of re-scans if it's
likely that we won't have the object. If the other side has told us
about object X and we want to know if we have it, we'll skip the re-scan
(to avoid spending a lot of effort when there are many such objects). We
can accept the racy false negative in that case because the worst case
is that we ask the other side to send us the object.
But this is not one of those cases. These are objects which are
accessible from _our_ refs, and which we already found in the commit
graph file. We should have them, and if we don't, we'll die()
immediately. So the performance impact is negligible, and getting the
right answer is important.
There's no test here because it's inherently racy. In fact, I had
trouble even developing a minimal test. The problem seen in the wild can
be produced like this:
# Any git.git mirror which supports partial clones; I think this
# should work with any repo that contains submodules, but note that
# $obj below is specific to this repo
url=https://github.com/git/git.git
# This is a commit that is not at the tip of any branches (so after
# we have it, we'll still have some commits to fetch).
obj=cf6f63ea6bf35173e02e18bdc6a4ba41288acff9
git init
git fetch --filter=tree:0 $url $obj:refs/heads/foo
git checkout foo
git commit-graph write --reachable
git fetch $url
What happens here is that the initial fetch grabs that older commit (and
its ancestors) but no trees or blobs, and the subsequent checkout grabs
the necessary trees and blobs just for that commit. The final fetch
spawns a long sequence of child fetches due to fetch_submodules(), which
wants to check whether there have been any gitlink modifications which
should trigger a fetch of the related submodule (we'll leave aside the
irony that we did not even check out any submodules yet).
That series of fetches causes us to accumulate packs, which eventually
triggers background maintenance to run. That repacks all-into-one, and
the pack containing $obj goes away in favor of a new pack. And then the
fetch eventually fails with:
fatal: You are attempting to fetch cf6f63ea6bf35173e02e18bdc6a4ba41288acff9, which is in the commit graph file but
not in the object database.
In the scenario above, the race becomes likely because of the long
series of quick fetches. But I _think_ the bug is independent of partial
clones entirely, and you could run into the same thing with a single
fetch, some other process running "git repack" simultaneously, and a bit
of bad luck. I haven't been able to reproduce, though. I'm not sure if
that's because there's some mis-analysis above, or if the race window is
just small enough that it's hard to trigger.
At any rate, re-scanning here seems like an obviously correct thing to
do with no downside, and it does fix the partial-clone case shown above.
Reported-by: Дилян Палаузов <dilyan.palauzov@aegee.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three tips how to compose a non-line-wrapped patch with
Thunderbird. The first one suggests use of an add-on. The one
referenced has long been superseded by a different one. Update the
link to the new one. Mention that additional configuration is
required to make the add-on work.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bulk-checkin subsystem depends on `the_repository`. Adapt functions
and call sites to access the repository through `struct odb_transaction`
instead. The `USE_THE_REPOSITORY_VARIBALE` is still required as the
`pack_compression_level` and `pack_size_limit_cfg` globals are still
used.
Also adapt functions using packfile state to instead access it through
the transaction. This makes some function parameters redundant and go
away.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bulk-checkin subsystem provides a mechanism to write blobs directly
to a packfile via `index_blob_bulk_checkin()`. If there is an ongoing
transaction when invoked, objects written via this function are stored
in the same packfile. The packfile is not flushed until the transaction
itself is flushed. If there is no transaction, the single object is
written to a packfile and immediately flushed. This complicates
`index_blob_bulk_checkin()` as it cannot reliably use the provided
transaction to get the associated repository.
Update `index_blob_bulk_checkin()` to assume that a valid transaction is
always provided. Callers are now expected to ensure a transaction is set
up beforehand. With this simplification, `deflate_blob_bulk_checkin()`
is no longer needed as a standalone internal function and is combined
with `index_blob_bulk_checkin()`. The single call site in
`object-file.c:index_fd()` is updated accordingly. Due to how
`{begin,end}_odb_transaction()` handles nested transactions, a new
transaction is only created and committed if there is not already an
ongoing transaction.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object database transactions in the bulk-checkin subsystem rely on
global state to track transaction status. Stop relying on global state
and instead store the transaction in the `struct object_database`.
Functions that operate on transactions are updated to now wire
transaction state.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object database transaction state is stored across several global
variables in the bulk-checkin subsystem. Consolidate this state into a
single `struct odb_transaction` global. In a subsequent commit, the
transactional interfaces will be updated to wire this structure instead
of relying on a global variable.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The docs mostly point to using git/git as one's remote, however, when it
comes to Sending a PR to GitGitGadget section, the reader is told to use
gitgitgadget/git, with no mention of git/git, potentially leading to
some confusion.
Clarify that both gitgitgadget/git and git/git can be used, albeit with
some differences.
Signed-off-by: Daniele Sassoli <danielesassoli@gmail.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change fixed a bug in 'git repack -adf --path-walk' that
was due to an update to how path lists are initialized and missing some
important cases when processing the pending objects.
This change takes the three critical places where path lists are
initialized and combines them into a static method. This simplifies the
callers somewhat while also helping to avoid a missed update in the
future.
The other places where a path list (struct type_and_oid_list) is
initialized is for the following "fixed" lists:
* Tag objects.
* Commit objects.
* Root trees.
* Tagged trees.
* Tagged blobs.
These lists are created and consumed in different ways, with only the
root trees being passed into the logic that cares about the
"maybe_interesting" bit. It is appropriate to keep these uses separate.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users reported an issue where objects were missing from their local
repositories after a full repack using 'git repack -adf --path-walk'.
This was alarming and took a while to create a reproducer. Here, we fix
the bug and include a test case that would fail without this fix.
The root cause is that certain objects existed in the index and had no
second versions. These objects are usually blobs, though trees can be
included if a cache-tree exists. The issue is that the revision walk
adds these objects to the "pending" list and the path-walk API forgets
to mark the lists it creates at this point as "maybe_interesting". If
these paths only ever have a single version in the history of the repo
(including the current staged version) then the parent directory never
tries to add a new object to the list and mark the list as
"maybe_interesting". Thus, when walking the list later, the group is
skipped as it is expected that no objects are interesting. This happens
even when there are actually no UNINTERESTING objects at all! This is
based on the optimization enabled by the pack.useSparse=true config
option, which is the default.
Thus, we create a test case that demonstrates the many cases of this
issue for reproducibility:
1. File a/b/c has only one committed version.
2. Files a/i and x/y only exist as staged changes.
3. Tree x/ only exists in the cache-tree.
After performing a non-path-walk repack to force all loose objects into
packfiles, run a --path-walk repack followed by 'git fsck'. This fsck is
what fails with the following errors:
error: invalid object 100644 f2e41136... for 'a/b/c'
This is the dropped instance of the single-versioned a/b/c file.
broken link from tree cfda31d8...
to tree 3f725fcd...
This is the missing tree for the single-versioned a/b/ directory.
missing blob 0ddf2bae... (a/i)
missing blob 975fbec8... (x/y)
missing blob a60d869d... (file)
missing blob f2e41136... (a/b/c)
missing tree 3f725fcd... (a/b/)
dangling tree 5896d7e... (staged root tree)
Note that since the staged root tree is missing, the fsck output cannot
even report that the staged x/ tree is missing as well.
The core problem here is that the "maybe_interesting" member of 'struct
type_and_oid_list' is not initialized to '1'. This member was added in
6333e7ae0b (path-walk: mark trees and blobs as UNINTERESTING,
2024-12-20) in a way to help when creating packfiles for a small commit
range using the sparse path algorithm (enabled by pack.useSparse=true).
The idea here is that the list is marked as "maybe_interesting" if an
object is added that does not have the UNINTERESTING flag on it. Later,
this is checked again in case all objects in the list were marked
UNINTERESTING after that point in time. In this case, the algorithm
skips the list as there is no reason to visit it.
This leads to the problem where the "maybe_interesting" member was not
appropriately initialized when the list is created from pending objects.
Initializing this in the correct places fixes the bug.
To reduce risk of similar bugs around initializing this structure, a
follow-up change will make initializing lists use a shared method.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In process_ranges_arbitrary_commit() the condition deciding whether
the given commit is not a merge, i.e. that it doesn't have more than
one parent, is head-scratchingly backwards, flip it.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
process_ranges_ordinary_commit() uses a local diff queue variable,
which it leaves uninitialized before passing its address to
queue_diffs(). This is not an issue, because at the end of that
function the contents of an other diff queue is moved into it by
simply overwriting whatever is in there, i.e. without reading any
uninitialized memory.
Still, seeing the uninitialized diff queue being passed around scared
me more than once, so out of caution let's make sure that it's
initialized.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can easily iterate through the parents of a merge commit without
turning the list of parents into a dynamically allocated array of
parents, so let's do so. This way we can avoid a memory allocation
for each processed merge commit, though its effect on runtime seems to
be unmeasurable.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In process_ranges_merge_commit(), the line-level log first creates an
array of diff queues by iterating over all parents of a merge commit
and computing a tree diff for each. Then in a second loop it iterates
over those diff queues, and if it finds that none of the interesting
paths were modified in one of them, then it will return early. This
means that when none of the interesting paths were modified between a
merge and its first parent, then the tree diff between the merge and
its second (Nth...) parent was computed in vain.
Unify these two loops, so when it iterates over all parents of a merge
commit, then it first computes the tree diff between the merge and
that particular parent and then processes the resulting diff queue
right away. This way we can spare some tree diff computing, thereby
speeding up line-level log in repositories with mergy history:
# git.git, 25.8% of commits are merges:
Benchmark 1: ./git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
Time (mean ± σ): 1.001 s ± 0.009 s [User: 0.906 s, System: 0.095 s]
Range (min … max): 0.991 s … 1.023 s 10 runs
Benchmark 2: ./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0
Time (mean ± σ): 445.5 ms ± 3.4 ms [User: 358.8 ms, System: 84.3 ms]
Range (min … max): 440.1 ms … 450.3 ms 10 runs
Summary
'./git -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0' ran
2.25 ± 0.03 times faster than './git_v2.51.0 -C ~/src/git log -L:'lookup_commit(':commit.c v2.51.0'
# linux.git, 7.5% of commits are merges:
Benchmark 1: ./git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
Time (mean ± σ): 3.246 s ± 0.007 s [User: 2.835 s, System: 0.409 s]
Range (min … max): 3.232 s … 3.255 s 10 runs
Benchmark 2: ./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16
Time (mean ± σ): 2.467 s ± 0.014 s [User: 2.113 s, System: 0.353 s]
Range (min … max): 2.455 s … 2.505 s 10 runs
Summary
'./git -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16' ran
1.32 ± 0.01 times faster than './git_v2.51.0 -C ~/src/linux.git log -L:build_restore_work_registers:arch/mips/mm/tlbex.c v6.16'
And since now each iteration computes a tree diff and processes its
result, there is no reason to store the diff queues for each merge
parent anymore, so replace that diff queue array with a loop-local
diff queue variable. With this change the static free_diffqueues()
helper function in 'line-log.c' has no more callers left, remove it.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d277e89f87fda01daa1e1a35fc1f7118678faa1f added special handling
on macOS (OS X) that makes button 2 the right mouse button. As per TIP
474 [1], Tcl 8.7 has swapped buttons 2 and 3 such that button 3 is made
the right mouse button as in other platforms. Therefore, the logic
should be updated to use button 3 on macOS with Tcl 8.7+.
[1]: https://core.tcl-lang.org/tips/doc/main/tip/474.md
Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com>
- make it clearer that we're talking about a multistep process
- give a more technically accurate description how rebase works with the
merge backend.
- condense the explanation of how git rebase skips commits with the same
textual changes into a single bullet point and remove the explanatory
diagram. Lots of things which are more complicated are already being
explained without a diagram.
- remove the explanation of how exactly `--fork-point` and `--root`
work since that information is in the OPTIONS section
- put all discussion of `ORIG_HEAD` inside the note
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a very clear explanation with examples of using --onto which is
currently buried in the very long DESCRIPTION section. This moves it to
its own section, so that we can reference the explanation from the
`--onto` option by name.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove duplicate explanation of `git rebase <upstream> <branch>` which
is already explained above.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously there were two explanations, this combines them both into a
single explanation.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Start with an example that mirrors the example in the `git-merge` man
page, to make it easier for folks to understand the difference between
a rebase and a merge.
- Mention that rebase can combine or reorder commits
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various options to "git diff" that makes comparison ignore certain
aspects of the differences (like "space changes are ignored",
"differences in lines that match these regular expressions are
ignored") did not work well with "--name-only" and friends.
* ly/diff-name-only-with-diff-from-content:
diff: ensure consistent diff behavior with ignore options
Code clean-up.
* ac/deglobal-fmt-merge-log-config:
builtin/fmt-merge-msg: stop depending on 'the_repository'
environment: remove the global variable 'merge_log_config'
"git diff --no-index" run inside a subdirectory under control of a
Git repository operated at the top of the working tree and stripped
the prefix from the output, and oddballs like "-" (stdin) did not
work correctly because of it. Correct the set-up by undoing what
the set-up sequence did to cwd and prefix.
* jc/diff-no-index-in-subdir:
diff: --no-index should ignore the worktree
"git jump" (in contrib/) fails to parse the diff header correctly
when a file has a space in its name, which has been corrected.
* gh/git-jump-pathname-with-sp:
git-jump: make `diff` work with filenames containing spaces
The "list" subcommand of "git refs" acts as a front-end for
"git for-each-ref".
* ms/refs-list:
t: add test for git refs list subcommand
t6300: refactor tests to be shareable
builtin/refs: add list subcommand
builtin/for-each-ref: factor out core logic into a helper
builtin/for-each-ref: align usage string with the man page
doc: factor out common option
Revision traversal limited with pathspec, like "git log dir/*",
used to ignore changed-paths Bloom filter when the pathspec
contained wildcards; now they take advantage of the filter when
they can.
* ly/changed-path-traversal-with-magic-pathspec:
bloom: enable bloom filter with wildcard pathspec in revision traversal
Various bugs about rename handling in "ort" merge strategy have
been fixed.
* en/ort-rename-fixes:
merge-ort: fix directory rename on top of source of other rename/delete
merge-ort: fix incorrect file handling
merge-ort: clarify the interning of strings in opt->priv->path
t6423: fix missed staging of file in testcases 12i,12j,12k
t6423: document two bugs with rename-to-self testcases
merge-ort: drop unnecessary temporary in check_for_directory_rename()
merge-ort: update comments to modern testfile location
Test shuffling.
* ua/t1517-short-help-tests:
t5304: move `prune -h` test from t1517
t5200: move `update-server-info -h` test from t1517
t/t1517: automate `git subcmd -h` tests outside a repository
"git push" had a code path that led to BUG() but it should have
been a die(), as it is a response to a usual but invalid end-user
action to attempt pushing an object that does not exist.
* dl/push-missing-object-error:
remote.c: convert if-else ladder to switch
remote.c: remove BUG in show_push_unqualified_ref_name_error()
t5516: remove surrounding empty lines in test bodies
Arrays of strbuf is often a wrong data structure to use, and
strbuf_split*() family of functions that create them often have
better alternatives.
Update several code paths and replace strbuf_split*().
* jc/strbuf-split:
trace2: do not use strbuf_split*()
trace2: trim_trailing_newline followed by trim is a no-op
sub-process: do not use strbuf_split*()
environment: do not use strbuf_split*()
config: do not use strbuf_split()
notes: do not use strbuf_split*()
merge-tree: do not use strbuf_split*()
clean: do not use strbuf_split*() [part 2]
clean: do not pass the whole structure when it is not necessary
clean: do not use strbuf_split*() [part 1]
clean: do not pass strbuf by value
wt-status: avoid strbuf_split*()
string_list_split*() family of functions have been extended to
simplify common use cases.
* jc/string-list-split:
string-list: split-then-remove-empty can be done while splitting
string-list: optionally omit empty string pieces in string_list_split*()
diff: simplify parsing of diff.colormovedws
string-list: optionally trim string pieces split by string_list_split*()
string-list: unify string_list_split* functions
string-list: align string_list_split() with its _in_place() counterpart
string-list: report programming error with BUG
"git describe" has been optimized by using better data structure.
* rs/describe-with-prio-queue:
describe: use prio_queue_replace()
describe: use prio_queue
"git remote rename origin upstream" failed to move origin/HEAD to
upstream/HEAD when origin/HEAD is unborn and performed other
renames extremely inefficiently, which has been corrected.
* ps/remote-rename-fix:
builtin/remote: only iterate through refs that are to be renamed
builtin/remote: rework how remote refs get renamed
builtin/remote: determine whether refs need renaming early on
builtin/remote: fix sign comparison warnings
refs: simplify logic when migrating reflog entries
refs: pass refname when invoking reflog entry callback
"git refs migrate" to migrate the reflog entries from a refs
backend to another had a handful of bugs squashed.
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
During interactive rebase, using 'drop' on a merge commit lead to
an error, which was incorrect.
* js/rebase-i-allow-drop-on-a-merge:
rebase -i: permit 'drop' of a merge commit
git-gui invokes some long running commands using "nice git $cmd" if nice
is found and works, otherwise just "git $cmd". The current code is more
complex than needed; let's simplify it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
git-gui since 8fe7861c51 ("git-gui: assure PATH has only absolute
elements.", 2025-04-11) uses a list to maintain order and a dict to
detect duplicated elements without quadratic complexity. But, Tcl's
dict explicitly maintains keys in the order first added, thus the list
is not needed. Simplify the code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
When 399b1984 (config: include file if remote URL matches a glob,
2022-01-18) added the 'hasconfig:remote.*.url:<URL>' condition to be
used in the "includeIf.<condition>.path" configuration, the keyword
was added with an extra colon in the documentation.
The section that documents these condition begins with this preamble:
The condition starts with a keyword followed by a colon and some data
whose format and meaning depends on the keyword. Supported keywords
are:
which makes it clear that the colon that comes between the condition
keyword (e.g. "gitdir") and the parameter (aka "some data") is not
a part of the keyword.
Lose the extra colon. Also rewrite description of all keywords to
clarify that "some data" does not directly follow "keyword", and the
colon is not a part of keyword.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lo/repo-info:
repo: add the --format flag
repo: add the field layout.shallow
repo: add the field layout.bare
repo: add the field references.format
repo: declare the repo command
Asciidoc.py and Asciidoctor do not process the '+' verbatim the same way. A
span is detected when the format sign (here '+')is preceded by a non-word
character. It seems that '{nbsp}' is considered a non-word sign by
Asciidoc.py, but not by Asciidoctor.
Using a double format-sign opens 'unconstrained' span, independent on the
preceding character in both engines.
The '+' sign is used instead of the backtick '`' because it is not processed
as synopsis in asciidoc.py. Unfortunately, the post-processing of verbatim
synopsis in asciidoctor cannot be bypassed and formatting of the parentheses
is forced in syntax sign instead of keywords, unless a proper grammar
analyzer is used.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When line-level log is invoked with more than one disjoint line range
in the same file, and one of the commits happens to change that file
such that one diff range modifies more than one line range, then
changes to all modified line ranges should be shown, but only the
changes in the first modified line range are:
$ git log --oneline -p
80ca903 (HEAD -> master) Initial
diff --git a/file b/file
new file mode 100644
index 0000000..00935f1
--- /dev/null
+++ b/file
@@ -0,0 +1,10 @@
+Line 1
+Line 2
+Line 3
+Line 4
+Line 5
+Line 6
+Line 7
+Line 8
+Line 9
+Line 10
$ git log --oneline -L1,2:file -L4,5:file -L7,8:file
80ca903 (HEAD -> master) Initial
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +1,2 @@
+Line 1
+Line 2
The line-log-specific diff printer is already clever enough to handle
the case when one line range covers multiple diff ranges, but the
possibility of one diff range touching multiple disjoint line ranges
was apparently overlooked.
Add the necessary condition to dump_diff_hacky_one() to handle this case
as well, and show all modified line ranges:
$ git log --oneline -L1,2:file -L4,5:file -L7,8:file
0f9a5b4 (HEAD -> master) Initial
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +1,2 @@
+Line 1
+Line 2
@@ -0,0 +4,2 @@
+Line 4
+Line 5
@@ -0,0 +7,2 @@
+Line 7
+Line 8
This bug was already present in the initial line-log implementation
added in 2da1d1f6f (Implement line-history search (git log -L),
2013-03-28). Interestingly, that commit already contained a canned
test case covering a similar scenario:
"-L '/long f/',/^}/:a.c -L /main/,/^}/:a.c simple"
This test case looks for two line ranges in the same file, and both
trace back disjointly to the test repository's inital commit,
therefore changes to both line ranges should have been shown for the
initial commit, but only changes for the first line range are shown.
So this test case should have failed from the very beginning, but it
never did, because, unfortunately, the canned expected result is
incorrect, as it doesn't include changes for the second line range.
A similar test with a similarly incorrect canned expected result was
added later in 209618860c (log -L: fix overlapping input ranges,
2013-04-05).
Correct these two canned expected results to contain the changes for
the second line range for the initial commit as well.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When line-level log is invoked with more than one disjoint line range
in the same file, and one of the commits happens to change that file
such that:
- the last line of a line range R(n) immediately preceeds the first line
modified or added by a hunk H, and
- subtracting the number of lines added by hunk H from the start and
end of the subsequent line range R(n+1) would result in a range
overlapping with line range R(n),
then git aborts with an assertion error, because those overlapping
line ranges violate the invariants:
$ git log --oneline -p
73e4e2f (HEAD -> master) Add lines 6 7 8 9 10
diff --git a/file b/file
index 572d5d9..00935f1 100644
--- a/file
+++ b/file
@@ -3,3 +3,8 @@ Line 2
Line 3
Line 4
Line 5
+Line 6
+Line 7
+Line 8
+Line 9
+Line 10
66e3561 Add lines 1 2 3 4 5
diff --git a/file b/file
new file mode 100644
index 0000000..572d5d9
--- /dev/null
+++ b/file
@@ -0,0 +1,5 @@
+Line 1
+Line 2
+Line 3
+Line 4
+Line 5
$ git log --oneline -L3,5:file -L7,8:file
git: line-log.c:73: range_set_append: Assertion `rs->nr == 0 || rs->ranges[rs->nr-1].end <= a' failed.
Aborted (core dumped)
The line-log machinery encodes line and diff ranges internally as
[start, end) pairs, i.e. include 'start' but exclude 'end', and line
numbering starts at 0 (as opposed to the -LX,Y option, where it starts
at 1, IOW the parameter -L3,5 is represented internally as { start =
2, end = 5 }).
The reason for this assertion error and some related issues is that
there are a couple of places where 'end' is mistakenly considered to
be part of the range:
- When a commit modifies an interesting path, the line-log machinery
first checks which diff range (i.e. hunk) modify any line ranges.
This is done in diff_ranges_filter_touched(), where the outer loop
iterates over the diff ranges, and in each iteration the inner
loop advances the line ranges supposedly until the current line
range ends at or after the current diff range starts, and then the
current diff and line ranges are checked for overlap.
For HEAD in the above example the first line range [2, 5) ends
just before the diff range [5, 10) starts, so the inner loop
should advance, and then the second line range [6, 8) and the diff
range should be checked for overlap.
Unfortunately, the condition of the inner loop mistakenly
considers 'end' as part of the line range, and, seeing the diff
range starting at 5 and the line range ending at 5, it doesn't
skip the first range. Consequently, the diff range and the first
line range are checked for overlap, and after that the outer loop
runs out of diff ranges, and then the processing goes on in the
false belief that this commit didn't touch any of the interesting
line ranges.
The line-log machinery later shifts the line ranges to account for
any added/removed lines in the diff ranges preceeding each line
range. This leaves the first line range intact, but attempts to
shift the second line range [6, 8) by 5 lines towards the
beginning of the file, resulting in [1, 3), triggering the
assertion error, because the two overlapping line ranges violate
the invariants.
Fix that loop condition in diff_ranges_filter_touched() to not
treat 'end' as part of the line range.
- With the above fix the assertion error is gone... but, alas, we
now get stuck in an endless loop!
This happens in range_set_difference(), where a couple of nested
loops iterate over the line and diff ranges, and a condition is
supposed to break the middle loop when the current line range ends
before the current diff range, so processing could continue with
the next line range.
For HEAD in the above example the first line range [2, 5) ends
just before the diff range [5, 10) starts, so this condition
should trigger and break the middle loop.
Unfortunately, just like in the case of the assertion error, this
conditions mistakenly considers 'end' as part of the line range,
and, seeing the line range ending at 5 and the diff range starting
at 5, it doesn't break the loop, which will then go on and on.
Fix this condition in range_set_difference() to not treat 'end' as
part of the line range.
- With the above fix the endless loop is gone... but, alas, the
output is now wrong, as it shows both line ranges for HEAD, even
though the first line range is not modified by that commit:
$ git log --oneline -L3,5:file -L7,8:file
73e4e2f (HEAD -> master) Add lines 6 7 8 9 10
diff --git a/file b/file
--- a/file
+++ b/file
@@ -3,3 +3,3 @@
Line 3
Line 4
Line 5
@@ -6,0 +7,2 @@
+Line 7
+Line 8
66e3561 Add lines 1 2 3 4 5
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +3,3 @@
+Line 3
+Line 4
+Line 5
In dump_diff_hacky_one() a couple of nested loops are responsible
for finding and printing the modified line ranges: the big outer
loop iterates over all line ranges, and the first inner loop skips
over the diff ranges that end before the start of the current line
range. This is followed by a condition checking whether the
current diff range starts after the end of the current line range,
which, when fulfilled, continues and advances the outer loop to
the next line range.
For HEAD in the above example the first line range [2, 5) ends
just before the diff range [5, 10), so this condition should
trigger, and the outer loop should advance to the second line
range.
Unfortunately, just like in the previous cases, this condition
mistakenly considers 'end' as part of the line range, and, seeing
the first line range ending at 5 and the diff range starting at 5,
it doesn't continue to advance the outher loop, but goes on to
show the (unmodified) first line range.
Fix this condition to not treat 'end' as part of the line range,
just like in the previous cases.
After all this the command in the above example finally finishes and
produces the right output:
$ git log --oneline -L3,5:file -L7,8:file
73e4e2f (HEAD -> master) Add lines 6 7 8 9 10
diff --git a/file b/file
--- a/file
+++ b/file
@@ -6,0 +7,2 @@
+Line 7
+Line 8
66e3561 Add lines 1 2 3 4 5
diff --git a/file b/file
--- /dev/null
+++ b/file
@@ -0,0 +3,3 @@
+Line 3
+Line 4
+Line 5
Add a canned test similar to the above example, with the line ranges
adjusted to the test repository's history.
Reported-by: Evgeni Chasnovski <evgeni.chasnovski@gmail.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Discord is a great way of receiving help for members of the community
that are not on the mailing list or not familiar with Libera.
Adding it to the official documentation will aid discoverability of it.
The link is the same as the one at https://git-scm.com/community.
Signed-off-by: Daniele Sassoli <danielesassoli@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a call in describe_commit() to lookup_commit_reference(), but we
don't check the return value. If it returns NULL, we'll segfault as we
immediately dereference the result.
In practice this can never happen, since all callers pass an oid which
came from a "struct commit" already. So we can make this more obvious
by just taking that commit struct in the first place.
Reported-by: Cheng <prophecheng@stu.pku.edu.cn>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When describing a blob, we traverse from HEAD, remembering each commit
we saw, and then checking each blob to report the containing commit.
But if we haven't seen any commits at all, we'll segfault (we store the
"current" commit as an oid initialized to the null oid, causing
lookup_commit_reference() to return NULL).
This shouldn't be able to happen normally. We always start our traversal
at HEAD, which must be a commit (a property which is enforced by the
refs code). But you can trigger the segfault like this:
blob=$(echo foo | git hash-object -w --stdin)
echo $blob >.git/HEAD
git describe $blob
We can instead catch this case and return an empty result, which hits
the usual "we didn't find $blob while traversing HEAD" error.
This is a minor lie in that we did "find" the blob. And this even hints
at a bigger problem in this code: what if the traversal pointed to the
blob as _not_ part of a commit at all, but we had previously filled in
the recorded "current commit"? One could imagine this happening due to a
tag pointing directly to the blob in question.
But that can't happen, because we only traverse from HEAD, never from
any other refs. And the intent of the blob-describing code is to find
blobs within commits.
So I think this matches the original intent as closely as we can (and
again, this segfault cannot be triggered without corrupting your
repository!).
The test here does not use the formula above, which works only for the
files backend (and not reftables). Instead we use another loophole to
create the bogus state using only Git commands. See the comment in the
test for details.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gitk is now maintained by Johannes Sixt and the repository can be
cloned from a new URL. b59358100c20 (Update the official repo of
gitk, 2024-12-24) could have updated this instance in the manual,
too, but the opportunity was missed. Update it now. Do give credit
to Paul Mackerras as the inventor of the program.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When given an expected "before" state, the ref-writing code will avoid
overwriting any ref that does not match that expected state. We use the
null oid as a sentinel value for "nothing should exist", and likewise
that is the sentinel value we get when trying to read a ref that does
not exist.
But there's one corner case where this is ambiguous: dangling symrefs.
Trying to read them will yield the null oid, but there is potentially
something of value there: the dangling symref itself.
For a normal recursive write, this is OK. Imagine we have a symref
"FOO_HEAD" that points to a ref "refs/heads/bar" that does not exist,
and we try to write to it with a create operation like:
oid=$(git rev-parse HEAD) ;# or whatever
git symbolic-ref FOO_HEAD refs/heads/bar
echo "create FOO_HEAD $oid" | git update-ref --stdin
The attempt to resolve FOO_HEAD will actually resolve "bar", yielding
the null oid. That matches our expectation, and the write proceeds. This
is correct, because we are not writing FOO_HEAD at all, but writing its
destination "bar", which in fact does not exist.
But what if the operation asked not to dereference symrefs? Like this:
echo "create FOO_HEAD $oid" | git update-ref --no-deref --stdin
Resolving FOO_HEAD would still result in a null oid, and the write will
proceed. But it will overwrite FOO_HEAD itself, removing the fact that
it ever pointed to "bar".
This case is a little esoteric; we are clobbering a symref with a
no-deref write of a regular ref value. But the same problem occurs when
writing symrefs. For example:
echo "symref-create FOO_HEAD refs/heads/other" |
git update-ref --no-deref --stdin
The "create" operation asked us to create FOO_HEAD only if it did not
exist. But we silently overwrite the existing value.
You can trigger this without using update-ref via the fetch
followRemoteHEAD code. In "create" mode, it should not overwrite an
existing value. But if you manually create a symref pointing to a value
that does not yet exist (either via symbolic-ref or with "remote add
-m"), create mode will happily overwrite it.
Instead, we should detect this case and refuse to write. The correct
specification to overwrite FOO_HEAD in this case is to provide an
expected target ref value, like:
echo "symref-update FOO_HEAD refs/heads/other ref refs/heads/bar" |
git update-ref --no-deref --stdin
Note that the non-symref "update" directive does not allow you to do
this (you can only specify an oid). This is a weakness in the update-ref
interface, and you'd have to overwrite unconditionally, like:
echo "update FOO_HEAD $oid" | git update-ref --no-deref --stdin
Likewise other symref operations like symref-delete do not accept the
"ref" keyword. You should be able to do:
echo "symref-delete FOO_HEAD ref refs/heads/bar"
but cannot (and can only delete unconditionally). This patch doesn't
address those gaps. We may want to do so in a future patch for
completeness, but it's not clear if anybody actually wants to perform
those operations. The symref update case (specifically, via
followRemoteHEAD) is what I ran into in the wild.
The code for the fix is relatively straight-forward given the discussion
above. But note that we have to implement it independently for the files
and reftable backends. The "old oid" checks happen as part of the
locking process, which is implemented separately for each system. We may
want to factor this out somehow, but it's beyond the scope of this
patch. (Another curiosity is that the messages in the reftable code are
marked for translation, but the ones in the files backend are not. I
followed local convention in each case, but we may want to harmonize
this at some point).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests set config within a sub-repo using (cd two && git config),
and then a separate test_when_finished outside the subshell to clean it
up. We can't use test_config to do this, because the cleanup command it
registers inside the subshell would be lost. Nor can we do it before
entering the subshell, because the config has to be set after some other
commands are run.
Let's switch these tests to use "git -C" for each command instead of a
subshell. That lets us use test_config (with -C also) at the appropriate
part of the test. And we no longer need the manual cleanup command.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several tests in t5510 do a bare "cd subrepo", not in a subshell. This
changes the working directory for subsequent tests. As a result, almost
every test has to start with "cd $D" to go back to the top-level.
Our usual style is to do per-test environment changes like this in a
subshell, so that tests can assume they are starting at the top-level
$TRASH_DIRECTORY.
Let's switch to that style, which lets us drop all of that extra
path-handling.
Most cases can switch to using a subshell, but in a few spots we can
simplify by doing "git init foo && git -C foo ...". We do have to make
sure that we weren't intentionally touching the environment in any code
which was moved into a subshell (e.g., with a test_when_finished), but
that isn't the case for any of these tests.
All of the references to the $D variable can go away, replaced generally
with $PWD or $TRASH_DIRECTORY (if we use it inside a chdir'd subshell).
Note in one test, "fetch --prune prints the remotes url", we make sure
to use $(pwd) to get the Windows-style path on that platform (for the
other tests, the exact form doesn't matter).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several tests set a config variable in a sub-repo we chdir into via a
subshell, like this:
(
cd "$D" &&
cd two &&
git config foo.bar baz
)
But they also clean up the variable with a when_finished directive
outside of the subshell, like this:
test_when_finished "git config unset foo.bar"
At first glance, this shouldn't work! The cleanup clause cannot be run
from the subshell (since environment changes there are lost by the time
the test snippet finishes). But since the cleanup command runs outside
the subshell, our working directory will not have been switched into
"two".
But it does work. Why?
The answer is that an earlier test does a "cd two" that moves the whole
test's working directory out of $TRASH_DIRECTORY and into "two". So the
subshell is a bit of a red herring; we are already in the right
directory! That's why we need the "cd $D" at the top of the shell, to
put us back to a known spot.
Let's make this cleanup code more explicitly specify where we expect the
config command to run. That makes the script more robust against running
a subset of the tests, and ultimately will make it easier to refactor
the script to avoid these top-level chdirs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Mention the --force option earlier
- Remove the explanation of shell globbing vs git's internal glob
system, since users are confused by it and there's a clearer
discussion in the EXAMPLES section.
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Add a basic example of how "git add" is normally used
- It's not technically true that you *must* use the `add` command to
add changes before running `git commit`, because `git commit -a`
exists. Instead say that you *can* use the `add` command.
- Mention early on that "index" is another word for "staging area",
since Git very rarely uses the word "index" in its output
(`git status`) uses the term "staged", and many Git users are
unfamiliar with the term "index"
- Remove "It typically adds" (it's not clear what "typically" means),
and instead mention that `git add -p` can be used to add
partial contents
- Currently the introduction is somewhat repetitive ("to prepare the
content staged for the next commit" ... "this snapshot that is taken
as the contents of the next commit."), replace with a single sentence
("The "index" [...] is where Git stores the contents of the next
commit.")
Signed-off-by: Julia Evans <julia@jvns.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changes added by 39fc408562 (t/t1517: automate `git subcmd -h` tests
outside a repository, 2025-08-08) to automatically loop over all "main"
Git commands will, when run against an installed build using
GIT_TEST_INSTALLED rather than the build in the build directory, include
some extra git-gui commands that are installed by `make install`, or
credential helpers that might be installed manually from the contrib
directories. These fail the test, so record them as such.
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When describing a blob, we search for it by traversing from HEAD. We do
this by feeding the name HEAD to setup_revisions(). But if we are on an
unborn branch, this will fail with a confusing message:
$ git describe $blob
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
It is OK for this to be an error (we cannot find $blob in an empty
traversal, so we'd eventually complain about that). But the error
message could be more helpful.
Let's resolve HEAD ourselves and pass the resolved object id to
setup_revisions(). If resolving fails, then we can print a more useful
message.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If describe_blob() does not find the blob in question, it returns an
empty strbuf, and we print an empty line. This differs from
describe_commit(), which always either returns an answer or calls die()
itself. As the blob function was bolted onto the command afterwards, I
think its behavior is not intentional, and it is just a bug that it does
not report an error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We pass a "struct object_id" to describe_blob() by value. This isn't
wrong, as an oid is composed only of copy-able values. But it's unusual;
typically we pass structs by const pointer, including object_ids. Let's
do so.
It similarly makes sense for us to hold that pointer in the callback
data (rather than yet another copy of the oid).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdl_hash_record_verbatim uses modified djb2 hash with XOR instead of ADD
for combining. The ADD-based variant is used as the basis of the modern
("GNU") symbol lookup scheme in ELF. Glibc dynamic loader received an
optimized version of this hash function thanks to Noah Goldstein [1].
Switch xdl_hash_record_verbatim to additive hashing and implement
an optimized loop following the scheme suggested by Noah.
Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
version | cycles, bn | instructions, bn
---------------------------------------
A 6.38 11.3
B 6.21 10.89
C 5.80 9.95
D 5.83 8.74
---------------------------------------
A: baseline (git master at e4ef0485fd78)
B: plus 'xdiff: refactor xdl_hash_record()'
C: and plus this patch
D: with 'xdiff: use xxhash' by Phillip Wood
The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
[1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the --format flag to git-repo-info. By using this flag, the users
can choose the format for obtaining the data they requested.
Given that this command can be used for generating input for other
applications and for being read by end users, it requires at least two
formats: one for being read by humans and other for being read by
machines. Some other Git commands also have two output formats, notably
git-config which was the inspiration for the two formats that were
chosen here:
- keyvalue, where the retrieved data is printed one per line, using =
for delimiting the key and the value. This is the default format,
targeted for end users.
- nul, where the retrieved data is separated by NUL characters, using
the newline character for delimiting the key and the value. This
format is targeted for being read by machines.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is part of the series that introduces the new subcommand
git-repo-info.
The flag `--is-shallow-repository` from git-rev-parse is used for
retrieving whether the repository is shallow. This way, it is used for
querying repository metadata, fitting in the purpose of git-repo-info.
Then, add a new field `layout.shallow` to the git-repo-info subcommand
containing that information.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is part of the series that introduces the new subcommand
git-repo-info.
The flag --is-bare-repository from git-rev-parse is used for retrieving
whether the current repository is bare. This way, it is used for
querying repository metadata, fitting in the purpose of git-repo-info.
Then, add a new field layout.bare to the git-repo-info subcommand
containing that information.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is part of the series that introduces the new subcommand
git-repo-info.
The flag `--show-ref-format` from git-rev-parse is used for retrieving
the reference format (i.e. `files` or `reftable`). This way, it is
used for querying repository metadata, fitting in the purpose of
git-repo-info.
Add a new field `references.format` to the repo-info subcommand
containing that information.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, `git rev-parse` covers a wide range of functionality not
directly related to parsing revisions, as its name suggests. Over time,
many features like parsing datestrings, options, paths, and others
were added to it because there wasn't a more appropriate command
to place them.
Create a new Git command called `repo`. `git repo` will be the main
command for obtaining the information about a repository (such as
metadata and metrics).
Also declare a subcommand for `repo` called `info`. `git repo info`
will bring the functionality of retrieving repository-related
information currently returned by `rev-parse`.
Add the required documentation and build changes to enable usage of
this subcommand.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Justin Tobler <jltobler@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of 9bbc981c6f2 (t/unit-tests: finalize migration of
reftable-related tests, 2025-07-24), the explicit list of
`UNIT_TEST_PROGRAMS` was turned into a wildcard pattern-derived list.
Let's do the same in the CMake definition.
This fixes build errors with symptoms like this:
CMake Error at CMakeLists.txt:132 (string):
string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
CMakeLists.txt:1037 (parse_makefile_for_scripts)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Edit: We are continuing to follow the existing PO file convention, which
includes filenames but strips out line numbers from the file-location
comments. This standard was set by our former lead, Jordi Mas, and we
are maintaining it for project-wide consistency.
Signed-off-by: Mikel Forcada <mikel.forcada@gmail.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Many of the commit-graph related functions take in both a repository and
the object database source (directly or via `struct commit_graph`) for
which we are supposed to load such a commit-graph. In the best case this
information is simply redundant as the source already contains a
reference to its owning object database, which in turn has a reference
to its repository. In the worst case this information could even
mismatch when passing in a source that doesn't belong to the same
repository.
Refactor the code so that we only pass in the object database source in
those cases.
There is one exception though, namely `load_commit_graph_chain_fd_st()`,
which is responsible for loading a commit-graph chain. It is expected
that parts of the commit-graph chain aren't located in the same object
source as the chain file itself, but in a different one. Consequently,
this function doesn't work on the source level but on the database level
instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's still a bunch of uses of `the_repository` in "commit-graph.c",
which we want to stop using due to it being a global variable. Refactor
the code to stop using `the_repository` in favor of the repository
provided via the calling context.
This allows us to drop the `USE_THE_REPOSITORY_VARIABLE` macro.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_hash_algo` as it implicitly relies on `the_repository`.
Instead, we either use the hash algo provided via the context or, if
there is no such hash algo, we use `the_repository` explicitly. Such
uses will be removed in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `parse_commit_graph()` so that it takes a repository instead of
taking repository settings. On the one hand this allows us to get rid of
instances where we access `the_hash_algo` by using the repository's hash
algorithm instead. On the other hand it also allows us to move the call
of `prepare_repo_settings()` into the function itself.
Note that there's one small catch, as the commit-graph fuzzer calls this
function directly without having a fully functional repository at hand.
And while the fuzzer already initializes `the_repository` with relevant
info, the call to `prepare_repo_settings()` would fail because we don't
have a fully-initialized repository.
Work around the issue by also settings `settings.initialized` to pretend
that we've already read the settings.
While at it, remove the redundant `parse_commit_graph()` declaration in
the fuzzer. It was added together with aa658574bf (commit-graph, fuzz:
add fuzzer for commit-graph, 2019-01-15), but as we also declared the
same function in "commit-graph.h" it wasn't ever needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph stores the length of the hash algorithm it uses. In
subsequent commits we'll need to pass the whole hash algorithm around
though, which we currently don't have access to.
Refactor the code so that we store the hash algorithm instead of only
its size.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two macros `GRAPH_DATA_WIDTH` and `GRAPH_MIN_SIZE` that compute
hash-dependent sizes. They do so by using the global `the_hash_algo`
variable though, which we want to get rid of over time.
Convert these macros into functions that accept the hash algorithm as
input parameter. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
0bdaa12169b (git-count-objects.txt: describe each line in -v output,
2013-02-08) forgot to include `packs`.
Signed-off-by: Daniele Sassoli <danielesassoli@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you have two or more objects with object names that share more
than 32 letters in an SHA-1 repository, find_unique_abbrev() fails
to show disambiguation.
To see how many leading letters of a given full object name is
sufficiently unambiguous, the algorithm starts from a initial
length, guessed based on the estimated number of objects in the
repository, and see if another object that shares the prefix, and
keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
which is counted as the number of bytes, since 5b20ace6 (sha1_name:
unroll len loop in find_unique_abbrev_r(), 2017-10-08); before that
change, it extended up to GIT_SHA1_HEXSZ, which meant to stop at the
end of hexadecimal SHA-1 object name.
Because the hexadecimal object name passed to the function is
NUL-terminated, and this fact is used to correctly terminate the
loop that scans for the first difference earlier in the function,
use it to make sure we do not increment the .cur_len member beyond
the end of the string.
Noticed-by: Jon Forrest <nobozo@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the editor tests manipulate the environment or config in ways
that affect future tests, but those modifications are visible to future
tests and create a footgun for them.
Use test_config, subshells, single-command environment overrides, and
test helpers to automatically undo environment and config modifications
once finished.
Best-viewed-with: --ignore-all-space
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want the editors in this test on PATH, so put them there.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests in t7005 mask Git error codes and do not use our nice test
helpers. Improve that, move some code into the setup test, and drop a
few old-style blank lines while at it.
Best-viewed-with: --ignore-all-space
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git imap-send` was built on the idea of copying emails to an IMAP folder
like drafts, and sending them later using an email client. Currently
the only way to do it is by piping output of `git format-patch` to IMAP
send.
Add another way to do it by using `git send-email` with the
`--use-imap-only` or `sendmail.useImapOnly` option. This allows users to
use the advanced features of `git send-email` like tweaking Cc: list
programmatically, compose the cover letter, etc. and then send the well
formatted emails to an IMAP folder using `git imap-send`.
While at it, use `` instead of '' for --smtp-encryption ssl in help
section of `git send-email`.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some email providers like Apple iCloud Mail do not support sending a copy
of sent emails to the "Sent" folder if SMTP server is used. As a
workaround, various email clients like Thunderbird which rely on SMTP,
use IMAP to send a copy of sent emails to the "Sent" folder. Something
similar can be done if sending emails via `git send-email`, by using
the `git imap-send` command to send a copy of the sent email to an IMAP
folder specified by the user.
Add this functionality to `git send-email` by introducing a new
configuration variable `sendemail.imapfolder` and command line option
`--imap-folder` which specifies the IMAP folder to send a copy of the
sent emails to. If specified, a copy of the sent emails will be sent
by piping the emails to `git imap-send` command, after all emails are
sent via SMTP and the SMTP server has been closed.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis section has an extra closing bracket, like this:
[--filter=<filter>] [--also-filter-submodules]]
The extra one is not the one at the end of this line; it is the one
after "...=<filter>".
The "--also-filter-submodules" option was added by f05da2b4 (clone,
submodule: pass partial clone filters to submodules, 2022-02-04).
Because it makes sense only when used with the "--filter=<filter>"
option, these two options are enclosed in a pair of brackets. The
extra one was added by 76880f05 (doc: git-clone: apply new
documentation formatting guidelines, 2024-03-29) by mistake.
Remove the extra and incorrect closing bracket, so that the line
reads:
[--filter=<filter> [--also-filter-submodules]]
Signed-off-by: Knut Harald Ryager <e-k-nut@hotmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new addition via either `reftable_stack_new_addition()`
or its convenince wrapper `reftable_stack_add()` we:
1. Create the "tables.list.lock" file.
2. Verify that the current version of the "tables.list" file is
up-to-date.
3. Write the new table records if so.
By default, the second step would cause us to bail out if we see that
there has been a concurrent write to the stack that made our in-memory
copy of the stack out-of-date. This is a safety mechanism to not write
records to the stack based on outdated information.
The downside though is that concurrent writes may now cause us to bail
out, which is not a good user experience. In addition, this isn't even
necessary for us, as Git knows to perform all checks for the old state
of references under the lock. (Well, in all except one case: when we
expire the reflog we first create the log iterator before we create the
lock, but this ordering is fixed as part of this commit.)
Consequently, most writers pass the `REFTABLE_STACK_NEW_ADDITION_RELOAD`
flag. The effect of this flag is that we reload the stack after having
acquired the lock in case the stack is out-of-date. This plugs the race
with concurrent writers, but we continue performing the verifications of
the expected old state to catch actual conflicts in the references we
are about to write.
Adapt the remaining callsites that don't yet pass this flag to do so.
While at it, drop a needless manual reload.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `flock` interface is implemented as part of "reftable/system.c" and
thus needs to be implemented by the integrator between the reftable
library and its parent code base. As such, we cannot rely on any
specific implementation thereof.
Regardless of that, users of the `flock` subsystem rely on `errno` being
set to specific values. This is fragile and not documented anywhere and
doesn't really make for a good interface.
Refactor the code so that the implementations themselves are expected to
return reftable-specific error codes. Our implementation of the `flock`
subsystem already knows to do this for all error paths except one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we compact the reftable stack we first acquire the lock for the
"tables.list" file and then reload the stack to check that it is still
up-to-date. This is done by calling `stack_uptodate()`, which knows to
return zero in case the stack is up-to-date, a positive value if it is
not and a negative error code on unexpected conditions.
We don't do proper error checking though, but instead we only check
whether the returned error code is non-zero. If so, we simply bubble it
up the calling stack, which means that callers may see an unexpected
positive value.
Fix this issue by translating to `REFTABLE_OUTDATED_ERROR` instead.
Handle this situation in `reftable_addition_commit()`, where we perform
a best-effort auto-compaction.
All other callsites of `stack_uptodate()` know to handle a positive
return value and thus don't need to be fixed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_stack_add()` function is a simple wrapper to lock the
stack, add records to it via a callback and then commit the
result. One problem with it though is that it doesn't accept any flags
for creating the addition. This makes it impossible to automatically
reload the stack in case it was modified before we managed to lock the
stack.
Add a `flags` field to plug this gap and pass it through accordingly.
For now this new flag won't be used by us, but it will be used by
libgit2.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While perfectly legal, older compiler toolchains complain when
zero-initializing structs that contain nested structs with `{0}`:
/home/libgit2/source/deps/reftable/stack.c:862:35: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces]
struct reftable_addition empty = REFTABLE_ADDITION_INIT;
^~~~~~~~~~~~~~~~~~~~~~
/home/libgit2/source/deps/reftable/stack.c:707:33: note: expanded from macro 'REFTABLE_ADDITION_INIT'
#define REFTABLE_ADDITION_INIT {0}
^
We had the discussion around whether or not we want to handle such bogus
compiler errors in the past already [1]. Back then we basically decided
that we do not care about such old-and-buggy compilers, so while we
could fix the issue by using `{{0}}` instead this is not the preferred
way to handle this in the Git codebase.
We have an easier fix though: we can just drop the macro altogether and
handle initialization of the struct in `reftable_stack_addition_init()`.
Callers are expected to call this function already, so this change even
simplifies the calling convention.
[1]: https://lore.kernel.org/git/20220710081135.74964-1-sunshine@sunshineco.com/T/
Suggested-by: Carlo Arenas <carenas@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of forward declarations in the stack-related code of
the reftable library. These declarations aren't really required, but are
simply caused by unfortunate ordering.
Reorder the code and remove the forward declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable writer accidentally uses the Git-specific `QSORT()` macro.
This macro removes the need for the caller to provide the element size,
but other than that it's mostly equivalent to `qsort()`.
Replace the macro accordingly to make the library usable outside of Git.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `reftable_writer_add_refs()` and `reftable_writer_add_logs()`
accept an array of records that should be added to the new table.
Callers of this function are expected to also pass the number of such
records to the function to tell it how many such records it is supposed
to write.
But while all callers pass in a `size_t`, which is a sensible choice,
the function in fact accepts an `int` as argument, which is less so. Fix
this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When switching manpages to the synopsis style, the description lists of
options need to be switched to inline synopsis for proper formatting. This
is done by enclosing the option name in double backticks, e.g. `--option`.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit fixes the synopsis syntax and changes the wording of a few
descriptions to be more consistent with the rest of the documentation.
It is a prepartion for the next commit that checks that synopsis style is
applied consistently across a manual page.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For better searchability, this commit adds a check to ensure that parameters
expressed in the form of `--[no-]parameter` are not used in the
documentation. In the place of such parameters, the documentation should
list two separate parameters: `--parameter` and `--no-parameter`.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For simplifying automated translation of the documentation, it is better to
only present one term in each entry of a description list of options. This
is because most of these terms can automatically be marked as
notranslatable.
Also, due to portability issues, the script generate-configlist.sh can no
longer insert newlines in the output. However, the result is that it no
longer correctly handles multiple terms in a single entry of definition
lists.
As a result, we now check that these entries do not exist in the
documentation.
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having an empty line before each delimited sections is not required by
asciidoc, but it is a safety measure that prevents generating malformed
asciidoc when generating translated documentation.
When a delimited section appears just after a paragraph, the asciidoc
processor checks that the length of the delimited section header is
different from the length of the paragraph. If it is not, the asciidoc
processor will generate a title. In the original English documentation, this
is not a problem because the authors always check the output of the asciidoc
processor and fix the length of the delimited section header if it turns out
to be the same as the paragraph length. However, this is not the case for
translations, where the authors have no way to check the length of the
delimited section header or the output of the asciidoc processor. This can
lead to a section title that is not intended.
Indeed, this test also checks that titles are correctly formed, that is,
the length of the underline is equal to the length of the title (otherwise
it would not be a title but a section header).
Finally, this test checks that the delimited section are terminated within
the same file.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some readers of man pages have reported that they found
malformed linkgit macros in the documentation (absence or bad
spelling).
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commits we started to always have the object database
source available when we load, write or access multi-pack indices. With
this in place we can change how MIDX paths are computed so that we don't
have to pass in the combination of a hash algorithm and object directory
anymore, but only the object database source.
Refactor the code accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Multi-pack indices store some information that is redundant with their
owning source:
- The locality bit that tracks whether the source is the primary
object source or an alternate.
- The object directory path the multi-pack index is located in.
- The pointer to the owning parent directory.
All of this information is already contained in `struct odb_source`. So
now that we always have that struct available when loading a multi-pack
index we have it readily accessible.
Drop the redundant information and instead store a pointer to the object
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, refactor the writing side of multi-pack
indices so that we pass in the object database source where the index
should be written to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To load a multi-pack index the caller is expected to pass both the
repository and the object directory where the multi-pack index is
located. While this works, this layout has a couple of downsides:
- We need to pass in information reduntant with the owning source,
namely its object directory and whether the source is local or not.
- We don't have access to the source when loading the multi-pack
index. If we had that access, we could store a pointer to the owning
source in the MIDX and thus deduplicate some information.
- Multi-pack indices are inherently specific to the object source and
its format. With the goal of pluggable object backends in mind we
will eventually want the backends to own the logic of reading and
writing multi-pack indices. Making the logic work on top of object
sources is a step into that direction.
Refactor loading of multi-pack indices accordingly.
This surfaces one small problem though: git-multi-pack-index(1) and our
MIDX test helper both know to read and write multi-pack-indices located
in a different object directory. This issue is addressed by adding the
user-provided object directory as an in-memory alternate.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions that take both a `struct repository` and
a `struct multi_pack_index`. This provides redundant information though
without much benefit given that the multi-pack index already has a
pointer to its owning repository.
Drop the `struct repository` parameter from such functions. While at it,
reorder the list of parameters of `fill_midx_entry()` so that the MIDX
comes first to better align with our coding guidelines.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers of `link_alt_odb_entry()` are expected to pass in three
different paths:
- The (potentially relative) path of the object directory that we're
about to add.
- The base that should be used to resolve a relative object directory
path.
- The resolved path to the object database's objects directory.
Juggling those three paths makes the calling convention somewhat hard to
grok at first.
As it turns out, the third parameter is redundant: we always pass in the
resolved path of the object database's primary source, and we already
pass in the database itself. So instead, we can resolve that path in the
function itself.
One downside of this is that one caller of `link_alt_odb_entry()` calls
this function in a loop, so we were able to resolve the directory a
single time, only. But ultimately, we only ever end up with a rather
limited number of alternates anyway, so the extra couple of cycles we
save feels more like a micro optimization.
Refactor the code accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers have no trivial way to obtain the newly created object database
source when adding it to the in-memory list of alternates. While not yet
needed anywhere, a subsequent commit will want to obtain that pointer.
Refactor the function to return the source to make it easily accessible.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions that add an alternate object directory to the object
database are somewhat inconsistent in how they call the paramater that
refers to the directory path: in our headers we refer to it as "dir",
whereas in the implementation we often call it "reference" or "entry".
Unify this and consistently call the parameter "dir". While at it,
refactor `link_alt_odb_entry()` to accept a C string instead of a
`struct strbuf` as parameter to clarify that we really only need the
path and nothing else.
Suggested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to locate a source for an unknown object directory we will
die right away. In subsequent patches we will add new callsites though
that want to handle this situation gracefully instead.
Refactor the function to return a `NULL` pointer if the source could not
be found and adapt the callsites to die instead. Introduce a new wrapper
`odb_find_source_or_die()` that continues to die in case the source
could not be found.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Object database sources are classified either as:
- Local, which means that the source is the repository's primary
source. This is typically ".git/objects".
- Non-local, which is everything else. Most importantly this includes
alternates and quarantine directories.
This locality is often computed ad-hoc by checking whether a given
object source is the first one. This works, but it is quite roundabout.
Refactor the code so that we store locality when creating the sources in
the first place. This makes it both more accessible and robust.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor builtin/fmt-merge-msg.c to remove the dependancy on the global
'the_repository'. Remove the 'UNUSED' macro from the 'struct repository'
parameter and replace 'git_config()' with 'repo_config()' so that
configuration is read from the passed repository. Also, add a test to
make sure that "git fmt-merge-msg -h" can be called outside a
repository.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global variable 'merge_log_config', set via the "merge.log" or
"merge.summary" settings, is only used in 'cmd_fmt_merge_msg()' and
'cmd_merge()' to adjust the 'shortlog_len' variable.
Remove 'merge_log_config' globally and localize it in
'cmd_fmt_merge_msg()' and 'cmd_merge()'. Set its value by passing it in
'fmt_merge_msg_config()' by passing its pointer to the function via the
callback parameter.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In diff.c, we output a trailing "\t" at the end of any filename that
contains a space:
case DIFF_SYMBOL_FILEPAIR_PLUS:
meta = diff_get_color_opt(o, DIFF_METAINFO);
reset = diff_get_color_opt(o, DIFF_RESET);
fprintf(o->file, "%s%s+++ %s%s%s\n", diff_line_prefix(o), meta,
line, reset,
strchr(line, ' ') ? "\t" : "");
break;
That is, for a file "foo.txt", `git diff --no-prefix` will emit:
+++ foo.txt
but for "foo bar.txt" it will emit:
+++ foo bar.txt\t
This in turn leads `git-jump` to produce a quickfix format like this:
foo bar.txt\t:1:1:contents
Because no "foo bar.txt\t" file actually exists on disk, opening it in
Vim will just land the user in an empty buffer.
This commit takes the simple approach of unconditionally stripping any
trailing tab. Consider the following three examples:
1. For file "foo", Git will emit "foo".
2. For file "foo bar", Git will emit "foo bar\t".
3. For file "foo\t", Git will emit "\"foo\t\"".
4. For file "foo bar\t", Git will emit "\"foo bar\t\"".
Before this commit, `git-jump` correctly handled only case "1".
After this commit, `git-jump` correctly handles cases "1" and "2". In
reality, these are the only cases people are going to run into with any
regularity, and the other two are rare edge cases, which probably aren't
worth the effort to support unless somebody actually complains about
them.
Signed-off-by: Greg Hurrell <greg.hurrell@datadoghq.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When traversing commits, a pathspec item can be used to limit the
traversal to commits that modify the specified paths. And the
commit-graph includes a Bloom filter to exclude commits that definitely
did not modify a given pathspec item. During commit traversal, the
Bloom filter can significantly improve performance. However, it is
disabled if the specified pathspec item contains wildcard characters
or magic signatures.
For performance reason, enable Bloom filter even if a pathspec item
contains wildcard characters by filtering only the non-wildcard part of
the pathspec item.
The function of pathspec magic signature is generally to narrow down
the path specified by the pathspecs. So, enable Bloom filter when
the magic signature is "top", "glob", "attr", "--depth" or "literal".
"exclude" is used to select paths other than the specified path, rather
than serving as a filtering function, so it cannot be used together with
the Bloom filter. Since Bloom filter is not case insensitive even in
case insensitive system (e.g. MacOS), it cannot be used together with
"icase" magic.
With this optimization, we get some improvements for pathspecs with
wildcards or magic signatures. First, in the Git repository we see these
modest results:
git log -100 -- "t/*"
Benchmark 1: new
Time (mean ± σ): 20.4 ms ± 0.6 ms
Range (min … max): 19.3 ms … 24.4 ms
Benchmark 2: old
Time (mean ± σ): 23.4 ms ± 0.5 ms
Range (min … max): 22.5 ms … 24.7 ms
git log -100 -- ":(top)t"
Benchmark 1: new
Time (mean ± σ): 16.2 ms ± 0.4 ms
Range (min … max): 15.3 ms … 17.2 ms
Benchmark 2: old
Time (mean ± σ): 18.6 ms ± 0.5 ms
Range (min … max): 17.6 ms … 20.4 ms
But in a larger repo, such as the LLVM project repo below, we get even
better results:
git log -100 -- "libc/*"
Benchmark 1: new
Time (mean ± σ): 16.0 ms ± 0.6 ms
Range (min … max): 14.7 ms … 17.8 ms
Benchmark 2: old
Time (mean ± σ): 26.7 ms ± 0.5 ms
Range (min … max): 25.4 ms … 27.8 ms
git log -100 -- ":(top)libc"
Benchmark 1: new
Time (mean ± σ): 15.6 ms ± 0.6 ms
Range (min … max): 14.4 ms … 17.7 ms
Benchmark 2: old
Time (mean ± σ): 19.6 ms ± 0.5 ms
Range (min … max): 18.6 ms … 20.6 ms
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The act of giving "--no-index" tells Git to pretend that the current
directory is not under control of any Git index or repository, so
even when you happen to be in a Git controlled working tree, where
in that working tree should not matter.
But the start-up sequence tries to discover the top of the working
tree and chdir(2)'s there, even before Git passes control to the
subcommand being run. When diff_no_index() starts running, it
starts at a wrong (from the end-user's point of view who thinks
"git diff --no-index" is merely a better version of GNU diff)
directory, and the original directory the user started the command
is at "prefix".
Because the paths given from argv[] have already been adjusted to
account for this path shuffling by prepending the prefix, and
showing the resulting path by stripping the prefix, the effect of
these nonsense operations (nonsense in the context of "--no-index",
that is) is usually not observable.
Except for special cases like "-", where it is not preprocessed by
prepending the prefix.
Instead of papering over by adding more special cases only to cater
to the no-index codepath in the generic code, drive the diff
machinery more faithfully to what is going on. If the user started
"git diff --no-index" in directory X/Y/Z in a working tree
controlled by Git, and the start up sequence of Git chdir(2)'ed up
to directory X and left Y/Z in the prefix, revert the effect of the
start up sequence by chdir'ing back to Y/Z and emptying the prefix.
Reported-by: Gregoire Geis <opensource@gregoirege.is>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
3a54f5bd5d (merge/pull: add the "--compact-summary" option, 2025-06-12)
added the option --compact-summary to both merge and pull. It takes no
no argument, but for merge it got an argument help string. Remove it,
since it is unnecessary.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dabecb9db2 (for-each-ref: introduce a '--start-after' option,
2025-07-15) added the option --start-after and referred to its argument
as "marker" in documentation and usage string, but not in the option's
short help. Use "marker" there as well for consistency and brevity.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 6fd1106aa4 ("t3700: Skip a test with backslashes in pathspec",
2009-03-13) introduced the BSLASHPSPEC prerequisite. This prerequisite
allows tests to check for systems that can use backslashes in pathspecs
(e.g. to escape glob special characters). On windows (and cygwin), this
does not work because backslashes are used as directory separators, and
git eagerly converts them to forward slashes.
This test file uses the FUNNYNAMES prerequisite to skip this test file
on windows, despite not really being appropriate for this test, which
does not hold on cygwin. The FUNNYNAMES prerequisite is set when the
system can create files with embedded quotes ("), tabs or newlines in
the name. Since cygwin can satisfy FUNNYNAMES, but not BSLASHPSPEC, this
leads to test failures on cygwin.
In order to skip these tests on cygwin, replace the FUNNYNAMES prerequisite
with BSLASHPSPEC, so that this test file is skipped on both windows and
cygwin. While here, fix a few test titles as well.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git experts often check the help summary of a command to make sure they
spell options right when suggesting advice to colleagues. Further, they
might check hidden options when responding to queries about deprecated
options like git-rebase(1)'s "preserve merges" option. But some commands
don't support "--help-all" outside of a git directory. Running (for
example)
git rebase --help-all
outside a directory fails in "setup_git_directory", erroring with the
localized form of
fatal: not a git repository (or any of the parent directories): .git
Like 99caeed05d (Let 'git <command> -h' show usage without a git dir,
2009-11-09), we want to show the "--help-all" output even without a git
dir. Make "--help-all" where we expect "-h" to mean
"setup_git_directory_gently", and interpose early in the natural place
("show_usage_with_options_if_asked").
Do the same for usage callers with show_usage_if_asked.
The exception is merge-recursive, whose help block doesn't use newer
APIs.
Best-viewed-with: --ignore-space-change
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading or editing calls to usage_with_options_internal, it is
difficult to tell what trailing "0, 0", "0, 1", "1, 0" arguments mean
(NB there is never a "1, 1" case).
Give the flags readable names to improve call-sites without changing any
behavior.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ua/t1517-short-help-tests:
t5304: move `prune -h` test from t1517
t5200: move `update-server-info -h` test from t1517
t/t1517: automate `git subcmd -h` tests outside a repository
b27be108c89 (doc: git-log: convert log config to new doc format,
2025-07-07) intended to convert a paragraph describing the different
options for `log.decorate` into a description list. But the literal
block syntax was used by mistake.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using Meson, builds are out-of-tree and $GIT_BUILD_DIR gets set to
the path where the build output is landing. To locate the Documentation
sources, test 't0450' was using that path.
Modify test 't0450' to use `$GIT_SOURCE_DIR/Documentation` to find the
documentation sources.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For better readability, convert the if-else ladder into a switch
statement.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git push <remote> <src>:<dst>" does not spell out the
destination side of the ref fully, and when <src> is not given
as a reference but an object name, the code tries to give advice
messages based on the type of that object.
The type is determined by calling odb_read_object_info() and
signalled by its return value. The code however reported a
programming error with BUG() when this function said that there
is no such object, which happens when the object name is given
as a full hexadecimal (if the object name is given as a partial
hexadecimal or an non-existing ref, the function would have died
without returning, so this BUG() wouldn't have triggered). This
is wrong. It is an ordinary end-user mistake to give an object
name that does not exist and treated as such.
An example of the error message produced is as follows:
error: The destination you provided is not a full refname (i.e.,
starting with "refs/"). We tried to guess what you meant by:
- Looking for a ref that matches 'branch' on the remote side.
- Checking if the <src> being pushed ('0000000000000000000000000000000000000001')
is a ref in "refs/{heads,tags}/". If so we add a corresponding
refs/{heads,tags}/ prefix on the remote side.
Neither worked, so we gave up. You must fully qualify the ref.
BUG: remote.c:1221: '0000000000000000000000000000000000000001' should be commit/tag/tree/blob, is '-1'
fatal: the remote end hung up unexpectedly
Aborted (core dumped)
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This style with the empty lines in test bodies was from when the test
suite was being developed. Remove the empty lines to match the modern
test style.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-diff, options like `-w` and `-I<regex>`, two files are considered
equivalent under the specified "ignore" rules, even when they are not
bit-for-bit identical. For options like `--raw`, `--name-status`,
and `--name-only`, git-diff deliberately compares only the SHA values
to determine whether two files are equivalent, for performance reasons.
As a result, a file shown in `git diff --name-status` may not appear
in `git diff --patch`.
To quickly determine whether two files are equivalent, add a helper
function diff_flush_patch_quietly() in diff.c. Add `.dry_run` field in
`struct diff_options`. When `.dry_run` is true, builtin_diff() returns
immediately upon finding any change. Call diff_flush_patch_quietly()
to determine if we should flush `--raw`, `--name-only` or `--name-status`
output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1517 is now focused on testing subcommands outside a repository.
Move the in-repo `-h` test for `prune` to t5304, which covers
this command.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t1517 is now focused on testing subcommands outside a repository.
Move the in-repo `-h` test for `update-server-info` to t5200,
which covers this command.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace manual `-h` tests with a loop over all subcommands using
`git --list-cmds=main`. This ensures consistent coverage of `-h`
behavior outside a repo and future-proofs the test by covering
new commands automatically.
Known exceptions are skipped or marked as expected failures.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you are doing a tree-diff, there are basically two options: do not
recurse into subtrees at all, or recurse indefinitely. While most
callers would want to always recurse and see full pathnames, some may
want the efficiency of looking only at a particular level of the tree.
This is currently easy to do for the top-level (just turn off
recursion), but you cannot say "show me what changed in subdir/, but do
not recurse".
This patch adds a max-depth parameter which is measured from the closest
pathspec match, so that you can do:
git log --raw --max-depth=1 -- a/b/c
and see the raw output for a/b/c/, but not those of a/b/c/d/
(instead of the raw output you would see for a/b/c/d).
Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The within_depth() function is used to check whether pathspecs limited
by a max-depth parameter are acceptable. It takes a path to check, a
maximum depth, and a "base" depth. It counts the components in the
path (by counting slashes), adds them to the base, and compares them to
the maximum.
However, if the base does not have any slashes at all, we always return
`true`. If the base depth is 0, then this is correct; no matter what the
maximum is, we are always within it. However, if the base depth is
greater than 0, then we might return an erroneous result.
This ends up not causing any user-visible bugs in the current code. The
call sites in dir.c always pass a base depth of 0, so are unaffected.
But tree_entry_interesting() uses this function differently: it will
pass the prefix of the current entry, along with a `1` if the entry is a
directory, in essence checking whether items inside the entry would be
of interest. It turns out not to make a difference in behavior, but the
reasoning is complex.
Given a tree like:
file
a/file
a/b/file
walking the tree and calling tree_entry_interesting() will yield the
following results:
(with max_depth=0):
file: yes
a: yes
a/file: no
a/b: no
(with max_depth=1):
file: yes
a: yes
a/file: yes
a/b: no
So we have inconsistent behavior in considering directories interesting.
If they are at the edge of our depth but at the root, we will recurse
into them, but then find all of their entries uninteresting (e.g., in
the first case, we will look at "a" but find "a/*" uninteresting). But
if they are at the edge of our depth and not at the root, then we will
not recurse (in the second example, we do not even bother entering
"a/b").
This turns out not to matter because the only caller which uses
max-depth pathspecs is cmd_grep(), which only cares about blob entries.
From its perspective, it is exactly the same to not recurse into a
subtree, or to recurse and find that it contains no matching entries.
Not recursing is merely an optimization.
It is debatable whether tree_entry_interesting() should consider such an
entry interesting. The only caller does not care if it sees the tree
itself, and can benefit from the optimization. But if we add a
"max-depth" limiter to regular diffs, then a diff with
DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
but not what it contains.
This patch just fixes within_depth(), which means we consider such
entries uninteresting (and makes the current caller happy). If we want
to change that in the future, then this fix is still the correct first
step, as the current behavior is simply inconsistent.
This has the effect the function tree_entry_interesting() now behaves
like following on the first example:
(with max_depth=0):
file: yes
a: no
a/file: no
a/b: no
Meaning we won't step in "a/" no more to realize all "a/*" entries are
uninterested, but we stop at the tree entry itself.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 25e5e2bf85 (combine-diff: support format_callback,
2011-08-19), the combined-diff code learned how to make a multi-sourced
`diff_filepair` to pass to a diff callback. When we create each
filepair, we do not bother to fill in many of the fields, because they
would make no sense (e.g. there can be no rename score or broken_pair
flag because we do not go through the diffcore filters). However, we did
not even bother to zero them, leading to random values. Let's make sure
everything is blank with xcalloc(), just as the regular diff code does.
We would potentially want to set the `status` flag to
something non-zero, but it is not clear to what. Possibly a
new DIFF_STATUS_COMBINED would make sense, as this is not
strictly a modification, nor does it fit any other category.
Since it is not yet clear what callers would want, this
patch simply leaves it as `0`, the same empty flag that is
seen when `diffcore_std` is not used at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At GitHub, we've got a real-world repository that has been triggering
failures of the form:
git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
which comes from the line:
VERIFY_CI(newinfo);
Unfortunately, this one has been quite complex to unravel, and is a
bit complex to explain. So, I'm going to carefully try to explain each
relevant piece needed to understand the fix, then carefully build up
from a simple testcase to some of the relevant testcases.
== New special case we need to consider ==
Rename pairs in the diffcore machinery connect the source path of a
rename with the destination path of a rename. Since we have rename
pairs to consider on both sides of history since the merge base,
merging has to consider a few special cases of possible overlap:
A) two rename pairs having the same target path
B) two rename pairs having the same source path
C) the source path of one rename pair being the target path of a
different rename pair
Some of these came up often enough that we gave them names:
A) a rename/rename(2to1) conflict (looks similar to an add/add conflict)
B) a rename/rename(1to2) conflict, which represents the same path being
renamed differently on the two sides of history
C) not yet named
merge-ort is well-prepared to handle cases (A) and (B), as was
merge-recursive (which was merge-ort's predecessor). Case (C) was
briefly considered during the years of merge-recursive maintenance,
but the full extent of support it got was a few FIXME/TODO comments
littered around the code highlighting some of the places that would
probably need to be fixed to support it. When I wrote merge-ort I
ignored case (C) entirely, since I believed that case (C) was only
possible if we were to support break detection during merges. Not
only had break detection never been supported by any merge algorithm,
I thought break detection wasn't worth the effort to support in a
merge algorithm. However, it turns out that case (C) can be triggered
without break detection, if there's enough moving pieces.
Before I dive into how to trigger case (C) with directory renames plus
other renames, it might be helpful to use a simpler example with break
detection first. And before we get to that it may help to explain
some more basics of handling renames in the merge algorithm. So, let
me first backup and provide a quick refresher on each of
* handling renames
* what break detection would mean, if supported in merging
* handling directory renames
From there, I'll build up from a basic directory rename detection case
to one that triggers a failure currently.
== Handling renames ==
In the merge machinery when we have a rename of a path A -> B,
processing that rename needs to remove path A, and make sure that path B
has the relevant information. Note that if the content was also
modified on both sides, this may mean that we have 3 different stages
that need to be stored at path B instead of having some stored at path
A.
Having all stages stored at path B makes it much easier for users to
investigate and resolve the content conflict associated with a renamed
path. For example:
* "git status" doesn't have to figure out how to list paths A & B and
attempt to connect them for users; it can just list path B.
* Users can use "git ls-files -u B" (instead of trying to find the
previous name of the file so they can list both, i.e. "git ls-files
-u A B")
* Users can resolve via "git add B" (without needing to "git rm A")
== What break detection would mean ==
If break detection were supported, we might have cases where A -> B
*and* C -> A, meaning that both rename pairs might believe they need to
update A. In particular, the processing of A -> B would need to be
careful to not clear out all stages of A and mark it resolved, while
both renames would need to figure out which stages of A belong with A
and which belong with B, so that both paths have the right stages
associated with them.
merge-ort (like merge-recursive before it) makes no attempt to handle
break detection; it runs with break detection turned off. It would
need to be retrofitted to handle such cases.
== Directory rename detection ==
If one side of history renames directory D/ -> E/, and the other side of
history adds new files to D/, then directory rename detection notices
and suggests moving those new files to E/. A similar thing is done for
paths renamed into D/, causing them to be transitively renamed into E/.
The default in the merge machinery is to report a conflict whenever a
directory rename might modify the location of a path, so that users can
decide whether they wanted the original path or the
directory-rename-induced location. However, that means the default
codepath still runs through all the directory rename detection logic, it
just supplements it with providing conflict notices when it is done.
== Building up increasingly complex testcases ==
I'll start with a really simple directory rename example, and then
slowly add twists that explain new pieces until we get to the
problematic cases:
=== Testcase 1 ===
Let's start with a concrete example, where particular files/directories of
interest that exist or are changed on each side are called out:
Original: <nothing of note>
our side: rename B/file -> C/file
their side: rename C/ -> A/
For this case, we'd expect to see the original B/file appear not at
C/file but at A/file.
(We would also expect a conflict notice that the user will want to
choose between C/file and A/file, but I'm going to ignore conflict
notices from here on by assuming merge.directoryRenames is set to
`true` rather than `conflict`; the only difference that assumption
makes is whether that makes the merge be considered to be conflicted
and whether it prints a conflict notice; what is written to the index
or working directory is unchanged.)
=== Testcase 2 ===
Modify testcase 1 by having A/file exist from the start:
Original: A/file exists
our side: rename B/file -> C/file
their side: rename C/ -> A/
In such a case, to avoid user confusion at what looks kind of like an
add/add conflict (even though the original path at A/file was not added
by either side of the merge), we turn off directory rename detection for
this path and print a "in the way" warning to the user:
CONFLICT (implicit dir rename): Existing file/dir ... in the way ...
The testcases in section 5 of t6423 explore these in more detail.
=== Testcase 3 ===
Let's modify testcase 1 in a slightly different way: have A/file be
added by their side rather than it already existing.
Original: <nothing of note>
our side: rename B/file -> C/file
their side: rename C/ -> A/
add A/file
In this case, the directory rename detection basically transforms our
side's original B/file -> C/file into a B/file -> A/file, and so we
get a rename/add conflict, with one version of A/file coming from the
renamed file, and another coming from the new A/file, each stored as
stages 2 and 3 in conflicts. This kind of add/add conflict is perhaps
slightly more complex than a regular add/add conflict, but with the
printed messages it makes sense where it came from and we have
different stages of the file to work with to resolve the conflict.
=== Testcase 4 ===
Let's do something similar to testcase 3, but have the opposite side of
history add A/file:
Original: <nothing of note>
our side: rename B/file -> C/file
add A/file
their side: rename C/ -> A/
Now if we allow directory rename detection to modify C/file to A/file,
then we also get a rename/add conflict, but in this case we'd need both
higher order stages being recorded on side 2, which makes no sense. The
index can't store multiple stage 2 entries, and even if we could, it
would probably be confusing for users to work with. So, similar to what
we do when there was an A/file in the original version, we simply turn
off directory rename detection for cases like this and provide the "in
the way" CONFLICT notice to the user.
=== Testcase 5 ===
We're slowly getting closer. Let's mix it up by having A/file exist at
the beginning but not exist on their side:
original: A/file exists
our side: rename B/file -> C/file
their side: rename C/ -> A/
rename A/file -> D/file
For this case, you could say that since A/file -> D/file, it's no longer
in the way of C/file being moved by directory rename detection to
A/file. But that would give us a case where A/file is both the source
and the target of a rename, similar to break detection, which the code
isn't currently equipped to handle.
This is not yet the case that causes current failures; to the current
code, this kind of looks like testcase 4 in that A/file is in the way
on our side (since A/file was in the original and was umodified by our
side). So, it results in a "in the way" notification with directory
rename detection being turned off for A/file so that B/file ends up at
C/file.
Perhaps the resolution could be improved in the future, but our "in
the way" checks prevented such problems by noticing that A/file exists
on our side and thus turns off directory rename detection from
affecting C/file's location. So, while the merge result could be
perhaps improved, the fact that this is currently handled by giving
the user an "in the way" message gives the user a chance to resolve
and prevents the code from tripping itself up.
=== Testcase 6 ===
Let's modify testcase 5 a bit more, to also delete A/file on our side:
original: A/file exists
our side: rename B/file -> C/file
delete A/file
their side: rename C/ -> A/
rename A/file -> D/file
Now the "in the way" logic doesn't detect that there's an A/file in
the way (neither side has an A/file anymore), so it's fine to
transitively rename C/file further to A/file...except that we end up
with A/file being both the source of one rename, and the target of a
different rename. Each rename pair tries to handle the resolution of
the source and target paths of its own rename. But when we go to
process the second rename pair in process_renames(), we do not expect
either the source or the destination to be marked as handled already;
so, when we hit the sanity checks that these are not handled:
VERIFY_CI(oldinfo);
VERIFY_CI(newinfo);
then one of these is going to throw an assertion failure since the
previous rename pair already marked both of its paths as handled.
This will give us an error of the form:
git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
This is the failure we're currently triggering, and it fundamentally
depends on:
* a path existing in the original
* that original path being removed or renamed on *both* sides
* some kind of directory rename moving some *other* path into that
original path
This was added as testcase 12q in t6423.
=== Testcase 7 ===
Bonus bug found while investigating!
Let's go back to the comparison between testcases 2 & 3, and set up a
file present on their side that we need to consider:
Original: A/file exists
our side: rename B/file -> C/file
rename A/file -> D/file
their side: rename C/ -> A/
Here, there is no A/file in the way on our side like testcase 4.
There is an A/file present on their side like testcase 3, which was an
add/add conflict, but that's associated with the file be renamed to
D/file. So, that really shouldn't be an add/add conflict because we
instead want all modes of the original A/file to be transported to
D/file.
Unfortunately, the current code kind of treats it like an add/add
conflict instead...but even worse. There is also a valid mode for
A/file in the original, which normally goes to stage 1. However, an
add/add conflict should be represented in the index with no mode at
stage 1 (for the original side), only modes at stages 2 and 3 (for our
and their side), so for an add/add we'd expect that mode for A/file in
the original version to be cleared out (or be transported to D/file).
Unfortunately, the code currently leaves not only the stage 3 entry
for A/file intact, it also leaves the stage 1 entry for A/file. This
results in `git ls-files -u A/file` output of the form:
100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 1 A/file
100644 0cfbf08886fca9a91cb753ec8734c84fcbe52c9f 2 A/file
100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 3 A/file
This would likely cause users to believe this isn't an add/add
conflict; rather, this would lead them to believe that A/file was only
modified on our side and that therefore it should not have been a
conflict in the first place. And while resolving the conflict in
favor of our side is the correct resolution (because stages 1 and 3
should have been cleared out in the first place), this is certainly
likely to cause confusion for anyone attempting to investigate why
this path was marked as conflicted.
This was added as testcase 12p in t6423.
== Attempted solutions that I discarded ==
1) For each side of history, create a strset of the sources of each
rename on the other side of history. Then when using directory
renames to modify existing renames, verify that we aren't renaming
to a source of another rename.
Unfortunately, the "relevant renames" optimization in merge-ort
means we often don't detect renames -- we just see a delete and an
add -- which is easy to forget and makes debugging testcases harder,
but it also turns out that this solution in insufficient to solve
the related problems in the area (more on that below).
2) Modify the code to be aware of the possibility of renaming to
the source of another side's rename, and make all the conflict
resolution logic for each case (including existing
rename/rename(2to1) and rename/rename(1to2) cases) handle the
additional complexity. It turns out there was much more code to
audit than I wanted, for a really niche case. I didn't like how
many changes were needed, and aborted.
== Solution ==
We do not want the stages of unrelated files appearing at the same path
in the index except when dealing with an add/add conflict. While we
previously handled this for stages 2 & 3, we also need to worry about
stage 1. So check for a stage 1 index entry being in the way of a
directory rename.
However, if we can detect that the stage 1 index entry is actually from
a related file due to a directory-rename-causes-rename-to-self
situation, then we can allow the stage 1 entry to remain.
From this wording, you may note that it's not just rename cases that
are a problem; bugs could be triggered with directory renames vs simple
adds. That leads us to...
== Testcases 8+ ==
Another bonus bug, found via understanding our final solutions (and the
failure of our first attempted solution)!
Let's tweak testcase 7 a bit:
Original: A/file exists
our side: delete A/file
add -> C/file
their side: delete A/file
rename C/ -> A/
Here, there doesn't seem to be a big problem. Sure C/file gets modified
via the directory rename of C/ -> A/ so that it becomes A/file, but
there's no file in the way, right? Actually, here we have a problem
that the stage 1 entry of A/file would be combined with the stage 2
entry of C/file, and make it look like a modify/delete conflict.
Perhaps there is some extra checking that could be added to the code to
make it attempt to clear out the stage 1 entry of A/file, but the
various rename-to-self-via-directory-rename testcases make that a bit
more difficult. For now, it's easier to just treat this as a
path-in-the-way situation and not allow the directory rename to modify
C/file.
That sounds all well and good, but it does have an interesting side
effect. Due to the "relevant renames" optimizations in merge-ort (i.e.
only detect the renames you need), 100% renames whose files weren't
modified on the other side often go undetected. This means that if we
modify this testcase slightly to:
Original: A/file exists
our side: A/file -> C/file
their side: rename C/ -> A/
Then although this looks like where the directory rename just moves
C/file back to A/file and there's no problem, we may not detect the
A/file -> C/file rename. Instead it will look like a deletion of A/file
and an addition of C/file. The directory rename then appears to be
moving C/file to A/file, which is on top of an "unrelated" file (or at
least a file it doesn't know is related). So, we will report
path-in-the-way conflicts now in cases where we didn't before. That's
better than silently and accidentally combining stages of unrelated
files and making them look like a modify/delete; users can investigate
the reported conflict and simply resolve it.
This means we tweak the expected solution for testcases 12i, 12j, and
12k. (Those three tests are basically the same test repeated three
times, but I was worried when I added those that subtle differences in
parent/child, sibling/sibling, and toplevel directories might mess up
how rename-to-self testcases actually get handled.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple bugs here -- accidental silent file deletion,
accidental silent file retention for files that should be deleted,
and incorrect number of entries left in the index.
The series merged at commit d3b88be1b450 (Merge branch
'en/merge-dir-rename-corner-case-fix', 2021-07-16) introduced testcase
12i-12k in t6423 which checked for rename-to-self cases, and fixed bugs
that merge-ort and merge-recursive had with these testcases. At the
time, I noted that merge-ort had one bug for these cases, while
merge-recursive had two. It turns out that merge-ort did in fact have
another bug, but the "relevant renames" optimizations were masking it.
If we modify testcase 12i from t6423 to modify the file in the commit
that renames it (but only modify it enough that it can still be detected
as a rename), then we can trigger silent deletion of the file.
Tweak testcase 12i slightly to make the file in question have more than
one line in it. This leaves the testcase intact other than changing the
initial contents of this one file. The purpose of this tweak is to
minimize the changes between this testcase and a new one that we want to
add. Then duplicate testcase 12i as 12i2, changing it so that it adds a
single line to the file in question when it is renamed; testcase 12i2
then serves as a testcase for this merge-ort bug that I previously
overlooked.
Further, commit 98a1a00d5301 (t6423: add a testcase causing a failed
assertion in process_renames, 2025-03-06), fixed an issue with
rename-to-self but added a new testcase, 12n, that only checked for
whether the merge ran to completion. A few commits ago, we modified
this test to check for the number of entries in the index -- but noted
that the number was wrong. And we also noted a
silently-keep-instead-of-delete bug at the same time in the new testcase
12n2.
In summary, we have the following bugs with rename-to-self cases:
* silent deletion of file expected to be kept (t6423 testcase 12i2)
* silent retention of file expected to be removed (t6423 testcase 12n2)
* wrong number of extries left in the index (t6423 testcase 12n)
All of these bugs arise because in a rename-to-self case, when we have a
rename A->B, both A and B name the same file. The code in
process_renames() assumes A & B are different, and tries to move the
higher order stages and file contents so that they are associated just
with the new path, but the assumptions of A & B being different can
cause A to be deleted when it's not supposed to be or mark B as resolved
and kept in place when it's supposed to be deleted. Since A & B are
already the same path in the rename-to-self case, simply skip the steps
in process_renames() for such files to fix these bugs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because merge-ort is dealing with potentially all the pathnames in the
repository, it sometimes needs to do an awful lot of string comparisons.
Because of this, struct merge_options_internal's path member was
envisioned from the beginning to contain an interned value for every
path in order to allow us to compare strings via pointer comparison
instead of using strcmp. See
* 5b59c3db059d (merge-ort: setup basic internal data structures,
2020-12-13)
* f591c4724615 (merge-ort: copy and adapt merge_3way() from
merge-recursive.c, 2021-01-01)
for some of the early comments.
However, the original comment was slightly misleading when it switched
from mentioning paths to only mentioning directories. Fix that, and
while at it also point to an example in the code which applies the extra
needed care to permit the pointer comparison optimization.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 806f83287f8d (t6423: test directory renames causing
rename-to-self, 2021-06-30) introduced testcase 12i-12k but omitted
staging one of the files and copy-pasted that mistake to the other
tests. This means the merge runs with an unstaged change, even though
that isn't related to what is being tested and makes the test look more
complicated than it is.
The cover letter for the series associated with the above commit (see
Message-ID: pull.1039.git.git.1624727121.gitgitgadget@gmail.com) noted
that these testcases triggered two bugs in merge-recursive but only one
in merge-ort; in merge-recursive these testcases also triggered a
silent deletion of the file in question when it shouldn't be deleted.
What I didn't realize at the time was that the deletion bug in merge-ort
was merely being sidestepped by the "relevant renames" optimization but
can actually be triggered. A subsequent commit will deal with that
additional bug, but it was complicated by the mistaken forgotten
staging, so this commit first fixes that issue.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit 98a1a00d5301 (t6423: add a testcase causing a failed
assertion in process_renames, 2025-03-06) was added, I tweaked
the commit message, and moved the test into t6423. However, that
still left two other things missing that made this test unlike the
others in the same testfile:
* It didn't have an English description of the test setup like
all other tests in t6423
* It didn't check that the right number of files were present at
the end
The former issue is a minor detail that isn't that critical, but the
latter feels more important. If it had been done, I might have noticed
another bug. In particular, this testcase involves
Side A: rename world -> tools/world
and
Side B: rename tools/ -> <the toplevel>
Side B: remove world
The tools/ -> <toplevel> rename turns the world -> tools/world rename
into world -> world, i.e. a rename-to-self case. But, it's a path
conflict because merge.directoryRenames defaults to false. There's
no content conflict because Side A didn't modify world, so we should
just take the content of world from Side B -- i.e. delete it. So, we
have a conflict on the path, but not on its content. We could consider
letting the content trump since it is unconflicted, but if we are going
to leave a conflict, it should certainly represent that 'world' existed
both in the base version and on Side A. Currently it doesn't.
Add a description of this test, add some checking of the number of
entries in the index at the end of the merge, and mark the test as
expecting to fail for now. A subsequent commit will fix this bug.
While at it, I found another related bug from a nearly identical setup
but setting merge.directoryRenames=true. Copy testcase 12n into 12n2,
changing it to use merge instead of cherry-pick, and turn on directory
renames for this test. In this case, since there is no content conflict
and no path conflict, it should be okay to delete the file.
Unfortunately, the code resolves without conflict but silently leaves
world despite the fact it should be deleted. It might also be okay if
the code spuriously thought there was a modify/delete conflict here;
that would at least notify users to look closer and then when they
notice there was no change since the base version, they can easily
resolve. A conflict notice is much better than silently providing the
wrong resolution. Cover this with the 12n2 testcase, which for now is
marked as expecting to fail as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
check_for_directory_rename() had a weirdly coded check for whether a
strmap contained a certain key. Replace the temporary variable and call
to strmap_get_entry() with the more natural strmap_contains() call.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 919df3195553 (Collect merge-related tests to t64xx,
2020-08-10), merge related tests were moved from t60xx to t64xx. Some
comments in merge-ort relating to some tricky code referenced specific
testcases within certain testfiles for additional information, but
referred to their historical testfile names; update the testfile names
to mention their modern location.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve wording and fix typos for a couple entries part of the Git 2.51
release notes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The deflate codepath in "git archive --format=zip" had a
longstanding bug coming from misuse of zlib API, which has been
corrected.
* jt/archive-zip-deflate-fix:
archive: flush deflate stream until Z_STREAM_END
Squelch false-positive compiler warning.
* dl/squelch-maybe-uninitialized:
t/unit-tests/clar: fix -Wmaybe-uninitialized with -Og
remote: bail early from set_head() if missing remote name
When renaming a remote we also need to rename all references
accordingly. But while we only need to rename references that are
contained in the "refs/remotes/$OLDNAME/" namespace, we end up using
`refs_for_each_rawref()` that iterates through _all_ references. We know
to exit early in the callback in case we see an irrelevant reference,
but ultimately this is still a waste of compute as we knowingly iterate
through references that we won't ever care about.
Improve this by using `refs_for_each_rawref_in()`, which knows to only
iterate through (potentially broken) references in a given prefix.
The following benchmark renames a remote with a single reference in a
repository that has 100k unrelated references. This shows a sizeable
improvement with the "files" backend:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 42.6 ms ± 0.9 ms [User: 29.1 ms, System: 8.4 ms]
Range (min … max): 40.1 ms … 43.3 ms 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 31.7 ms ± 4.0 ms [User: 19.6 ms, System: 6.9 ms]
Range (min … max): 27.1 ms … 36.0 ms 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
1.35 ± 0.17 times faster than rename remote (refformat = files, revision = HEAD~)
The "reftable" backend shows roughly the same absolute improvement, but
given that it's already significantly faster than the "files" backend
this translates to a much larger relative improvement:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 18.2 ms ± 0.5 ms [User: 12.7 ms, System: 3.0 ms]
Range (min … max): 17.3 ms … 21.4 ms 110 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 8.8 ms ± 0.5 ms [User: 3.8 ms, System: 2.9 ms]
Range (min … max): 7.5 ms … 9.9 ms 167 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
2.07 ± 0.12 times faster than rename remote (refformat = reftable, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was recently reported [1] that renaming a remote that has dangling
symrefs is broken. This issue can be trivially reproduced:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git remote add origin /dev/null
$ git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master
$ git remote rename origin renamed
$ git symbolic-ref refs/remotes/origin/HEAD
refs/remotes/origin/master
$ git symbolic-ref refs/remotes/renamed/HEAD
fatal: ref refs/remotes/renamed/HEAD is not a symbolic ref
As one can see, the "HEAD" reference did not get renamed but stays in
the same place. There are two issues here:
- We use `refs_resolve_ref_unsafe()` to resolve references, but we
don't pass the `RESOLVE_REF_NO_RECURSE` flag. Consequently, if the
reference does not resolve, the function will fail and we thus
ignore this branch.
- We use `refs_for_each_ref()` to iterate through the old remote's
references, but that function ignores broken references.
Both of these issues are easy to fix. But having a closer look at the
logic that renames remote references surfaces that it leaves a lot to be
desired overall.
The problem is that we're using O(|refs| + |symrefs| * 2) many reference
transactions to perform the renames. We first delete all symrefs, then
individually rename every direct reference and finally we recreate the
symrefs. On the one hand this isn't even remotely an atomic operation,
so if we hit any error we'll already have deleted all references.
But more importantly it is also extremely inefficient. The number of
transactions for symrefs doesn't really bother us too much, as there
should generally only be a single symref anyway ("HEAD"). But the
renames are very expensive:
- For the "reftable" backend we perform auto-compaction after every
single rename, which does add up.
- For the "files" backend we potentially have to rewrite the
"packed-refs" file on every single rename in case they are packed.
The consequence here is quadratic runtime performance. Renaming a
100k references takes hours to complete.
Refactor the code to use a single transaction to perform all the
reference updates atomically, which speeds up the transaction quite
significantly:
Benchmark 1: rename remote (refformat = files, revision = HEAD~)
Time (mean ± σ): 238.770 s ± 13.857 s [User: 91.473 s, System: 143.793 s]
Range (min … max): 204.863 s … 247.699 s 10 runs
Benchmark 2: rename remote (refformat = files, revision = HEAD)
Time (mean ± σ): 2.103 s ± 0.036 s [User: 0.360 s, System: 1.313 s]
Range (min … max): 2.011 s … 2.141 s 10 runs
Summary
rename remote (refformat = files, revision = HEAD) ran
113.53 ± 6.87 times faster than rename remote (refformat = files, revision = HEAD~)
For the "reftable" backend we see a significant speedup, as well, but
given that we don't have quadratic runtime behaviour there it's way less
extreme:
Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
Time (mean ± σ): 8.604 s ± 0.539 s [User: 4.985 s, System: 2.368 s]
Range (min … max): 7.880 s … 9.556 s 10 runs
Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
Time (mean ± σ): 1.177 s ± 0.103 s [User: 0.446 s, System: 0.270 s]
Range (min … max): 1.023 s … 1.410 s 10 runs
Summary
rename remote (refformat = reftable, revision = HEAD) ran
7.31 ± 0.79 times faster than rename remote (refformat = reftable, revision = HEAD~)
There is one issue though with using atomic transactions: when nesting a
remote into itself it can happen that renamed references conflict with
the old referencse. For example, when we have a reference
"refs/remotes/origin/foo" and we rename "origin" to "origin/foo", then
we'll end up with an F/D conflict when we try to create the renamed
reference "refs/remotes/origin/foo/foo".
This situation is overall quite unlikely to happen: people tend to not
use nested remotes, and if they do they must at the same time also have
a conflicting refname. But the end result would be that the old remote
references stay intact whereas all the other parts of the repository
have been adjusted for the new remote name.
Address this by queueing and preparing the reference update before we
touch any other part of the repository. Like this we can make sure that
the reference update will go through before rewriting the configuration.
Otherwise, if the transaction fails to prepare we can gracefully abort
the whole operation without any changes having been performed in the
repository yet. Furthermore, we can detect the conflict and print some
helpful advice for how the user can resolve this situation. So overall,
the tradeoff is that:
- Reference transactions are now all-or-nothing. This is a significant
improvement over the previous state where we may have ended up with
partially-renamed references.
- Rewriting references is now significantly faster.
- We only rewrite the configuration in case we know that all
references can be updated.
- But we may refuse to rename a remote in case references conflict.
Overall this seems like an acceptable tradeoff.
While at it, fix the handling of symbolic/broken references by using
`refs_for_each_rawref()`. Add tests that cover both this reported issue
and tests that exercise nesting of remotes.
One thing to note: with this change we cannot provide a proper progress
monitor anymore as we queue the references into the transactions as we
iterate through them. Consequently, as we don't know yet how many refs
there are in total, we cannot report how many percent of the operation
is done anymore. But that's a small price to pay considering that you
now shouldn't need the progress monitor in most situations at all
anymore.
[1]: <CANrWfmQWa=RJnm7d3C7ogRX6Tth2eeuGwvwrNmzS2gr+eP0OpA@mail.gmail.com>
Reported-by: Han Jiang <jhcarl0814@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote we may have to also rename remote refs in case
the refspec changes. Pull out this computation into a separate loop.
While that seems nonsensical right now, it'll help us in a subsequent
commit where we will prepare the reference transaction before we rewrite
the configuration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix -Wsign-comparison warnings. All of the warnings we have are about
mismatches in signedness for loop counters. These are trivially fixable
by using the correct integer type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between two storage formats we have to do
so via two callback-driven functions:
- `migrate_one_reflog()` gets invoked via `refs_for_each_reflog()` to
first list all available reflogs.
- `migrate_one_reflog_entry()` gets invoked via
`refs_for_each_reflog_ent()` in `migrate_one_reflog()`.
Before the preceding commit we didn't have the refname available in
`migrate_one_reflog_entry()`, which made it necessary to have a separate
structure that we pass to the second callback so that we can propagate
the refname. Now that `refs_for_each_reflog_ent()` knows to pass the
refname to the callback though that indirection isn't necessary anymore.
There's one catch though: we do have an update index that is also stored
in the entry-specific callback data. This update index is required so
that we can tell the ref backend in which order it should persist the
reflog entries to disk.
But that purpose can be trivially achieved by just converting it into a
global counter that is used for all reflog entries, regardless of which
reference they are for. The ordering will remain the same as both the
update index and the refname is considered when sorting the entries.
Move the index into `struct migration_data` and drop the now-unused
`struct reflog_migration_data` to simplify the code a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With `refs_for_each_reflog_ent()` callers can iterate through all the
reflog entries for a given reference. The callback that is being invoked
for each such entry does not receive the name of the reference that we
are currently iterating through. This isn't really a limiting factor, as
callers can simply pass the name via the callback data.
But this layout sometimes does make for a bit of an awkward calling
pattern. One example: when iterating through all reflogs, and for each
reflog we iterate through all refnames, we have to do some extra book
keeping to track which reference name we are currently yielding reflog
entries for.
Change the signature of the callback function so that the reference name
of the reflog gets passed through to it. Adapt callers accordingly and
start using the new parameter in trivial cases. The next commit will
refactor the reference migration logic to make use of this parameter so
that we can simplify its logic a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/reflog-migrate-fixes:
refs: fix invalid old object IDs when migrating reflogs
refs: stop unsetting REF_HAVE_OLD for log-only updates
refs/files: detect race when generating reflog entry for HEAD
refs: fix identity for migrated reflogs
ident: fix type of string length parameter
builtin/reflog: implement subcommand to write new entries
refs: export `ref_transaction_update_reflog()`
builtin/reflog: improve grouping of subcommands
Documentation/git-reflog: convert to use synopsis type
4c063c82e9 (rebase -i: improve error message when picking merge,
2024-05-30) added advice texts for cases when a merge commit is
passed as argument of sequencer command that cannot operate with
a merge commit. However, it forgot about the 'drop' command, so
that in this case the BUG() in the default branch is reached.
Handle 'drop' like 'merge', i.e., permit it without a message.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between different storage formats we end
up with invalid old object IDs for the migrated entries: instead of
writing the old object ID of the to-be-migrated entry, we end up with
the all-zeroes object ID.
The root cause of this issue is that we don't know to use the old object
ID provided by the caller. Instead, we manually resolve the old object
ID by resolving the current value of its matching reference. But as that
reference does not yet exist in the target ref storage we always end up
resolving it to all-zeroes.
This issue got unnoticed as there is no user-facing command that would
even show the old object ID. While `git log -g` knows to show the new
object ID, we don't have any formatting directive to show the old object
ID.
Fix the bug by introducing a new flag `REF_LOG_USE_PROVIDED_OIDS`. If
set, backends are instructed to use the old and new object IDs provided
by the caller, without doing any manual resolving. Set this flag in
`ref_transaction_update_reflog()`.
Amend our tests in t1460-refs-migrate to use our test tool to read
reflog entries. This test tool prints out both old and new object ID of
each reflog entry, which fixes the test gap. Furthermore it also prints
the full identity used to write the reflog, which provides test coverage
for the previous commit in this patch series that fixed the identity for
migrated reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `REF_HAVE_OLD` flag indicates whether a given ref update has its old
object ID set. If so, the value of that field is used to verify whether
the current state of the reference matches this expected state. It is
thus an important part of mitigating races with a concurrent process
that updates the same set of references.
When writing reflogs though we explicitly unset that flag. This is a
sensible thing to do: the old state of reflog entry updates may not
necessarily match the current on-disk state of its accompanying ref, but
it's only intended to signal what old object ID we want to write into
the new reflog entry. For example when migrating refs we end up writing
many reflog entries for a single reference, and most likely those reflog
entries will have many different old object IDs.
But unsetting this flag also removes a useful signal, namely that the
caller _did_ provide an old object ID for a given reflog entry. This
signal will become useful in a subsequent commit, where we add a new
flag that tells the transaction to use the provided old and new object
IDs to write a reflog entry. The `REF_HAVE_OLD` flag is then used as a
signal to verify that the caller really did provide an old object ID.
Stop unsetting the flag so that we can use it as this described signal
in a subsequent commit. Skip checking the old object ID for log-only
updates so that we don't expect it to match the current on-disk state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When updating a reference that is being pointed to HEAD we don't only
write a reflog message for that particular reference, but also generate
one for HEAD. This logic is handled by `split_head_update()`, where we:
1. Verify that the condition actually triggered. This is done by
reading HEAD at the start of the transaction so that we can then
check whether a given reference update refers to its target.
2. Queue a new log-only update for HEAD in case it did.
But the logic is unfortunately not free of races, as we do not lock the
HEAD reference after we have read its target. This can lead to the
following two scenarios:
- HEAD gets concurrently updated to point to one of the references we
have already processed. This causes us not writing a reflog message
even though we should have done so.
- HEAD gets concurrently updated to no longer point to a reference
anymore that we have already processed. This causes us to write a
reflog message even though we should _not_ have done so.
Improve the situation by introducing a new `REF_LOG_VIA_SPLIT` flag that
is specific to the "files" backend. If set, we will double check that
the HEAD reference still points to the reference that we are creating
the reflog entry for after we have locked HEAD. Furthermore, instead of
manually resolving the old object ID of that entry, we now use the same
old state as for the parent update.
If we detect such a racy update we abort the transaction. This is a bit
heavy-handed: the user didn't even ask us to write a reflog update for
"HEAD", so it might be surprising if we abort the transaction. That
being said:
- Normal users wouldn't typically hit this case as we only hit the
relevant code when committing to a branch that is being pointed to
by "HEAD" directly. Commands like git-commit(1) typically commit to
"HEAD" itself though.
- Scripted users that use git-update-ref(1) and related plumbing
commands are unlikely to hit this case either, as they would have to
update the pointed-to-branch at the same as "HEAD" is being updated,
which is an exceedingly rare event.
The alternative would be to instead drop the log-only update completely,
but that would require more logic that is hard to verify without adding
infrastructure specific for such a test. So we rather do the pragmatic
thing and don't worry too much about an edge case that is very unlikely
to happen.
Unfortunately, this change only helps with the second race. We cannot
reliably plug the first race without locking the HEAD reference at the
start of the transaction. Locking HEAD unconditionally would effectively
serialize all writes though, and that doesn't seem like an option. Also,
double checking its value at the end of the transaction is not an option
either, as its target may have flip-flopped during the transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflog entries between different storage formats we must
reconstruct the identity of reflog entries. This is done by passing the
committer passed to the `migrate_one_reflog_entry()` callback function
to `fmt_ident()`.
This results in an invalid identity though: `fmt_ident()` expects the
caller to provide both name and mail of the author, but we pass the full
identity as mail. This leads to an identity like:
pks <Patrick Steinhardt ps@pks.im>
Fix the bug by splitting the identity line first. This allows us to
extract both the name and mail so that we can pass them to `fmt_ident()`
separately.
This commit does not yet add any tests as there is another bug in the
reflog migration that will be fixed in a subsequent commit. Once that
bug is fixed we'll make the reflog verification in t1450 stricter, and
that will catch both this bug here and the other bug.
Note that we also add two new `name` and `mail` string buffers to the
callback structures and splice them through to the callbacks. This is
done so that we can avoid allocating a new buffer every time we compute
the committer information.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last parameter in `split_ident_line()` is the length of the line
passed in by the caller. As such, most callers pass in either the result
of `strlen()`, `struct strbuf::len` or a pointer diff, all of which
are expected to be positive numbers. Regardless of that, the function
accepts a signed integer, which is somewhat confusing.
Fix the function signature to instead accept a `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we provide a couple of subcommands in git-reflog(1) to remove
reflog entries, we don't provide any to write new entries. Obviously
this is not an operation that really would be needed for many use cases
out there, or otherwise people would have complained that such a command
does not exist yet. But the introduction of the "reftable" backend
changes the picture a bit, as it is now basically impossible to manually
append a reflog entry if one wanted to do so due to the binary format.
Plug this gap by introducing a simple "write" subcommand. For now, all
this command does is to append a single new reflog entry with the given
object IDs and message to the reflog. More specifically, it is not yet
possible to:
- Write multiple reflog entries at once.
- Insert reflog entries at arbitrary indices.
- Specify the date of the reflog entry.
- Insert reflog entries that refer to nonexistent objects.
If required, those features can be added at a future point in time. For
now though, the new command aims to fulfill the most basic use cases
while being as strict as possible when it comes to verifying parameters.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we'll add another user that wants to write reflog
entries. This requires them to call `ref_transaction_update_reflog()`,
but that function is local to "refs.c".
Export the function to prepare for the change. While at it, drop the
`flags` field, as all callers are for now expected to use the same flags
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way subcommands of git-reflog(1) are laid out does not make any
immediate sense. Reorder them such that read-only subcommands precede
writing commands for a bit more structure.
Furthermore, move the "expire" subcommand last. This prepares for a
subsequent change where we are about to introduce a new "write" command
to append reflog entries. Like this, the writing subcommands are ordered
such that those affecting a single reflog come before those spanning
across all reflogs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With 974cdca345c (doc: introduce a synopsis typesetting, 2024-09-24) we
have introduced a new synopsis type that simplifies the rules for
typesetting a command's synopsis. Convert the git-reflog(1)
documentation to use it.
While at it, convert the list of options to use backticks. This is done
to appease an upcoming new linter that mandates the use of backticks
when using the synopsis type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The case where a new submodule takes a path where used to be a
completely different subproject is now dealt a bit better than
before.
* kj/renamed-submodule:
fixup! submodule: skip redundant active entries when pattern covers path
fixup! submodule: prevent overwriting .gitmodules on path reuse
submodule: skip redundant active entries when pattern covers path
submodule: prevent overwriting .gitmodules on path reuse
"git -c alias.foo=bar foo -h baz" reported "'foo' is aliased to
'bar'" and then went on to run "git foo -h baz", which was
unexpected. Tighten the rule so that alias expansion is reported
only when "-h" is the sole option.
* rs/tighten-alias-help:
git: show alias info only with lone -h
Reduce implicit assumption and dependence on the_repository in the
object-file subsystem.
* ps/object-file-wo-the-repository:
object-file: get rid of `the_repository` in index-related functions
object-file: get rid of `the_repository` in `force_object_loose()`
object-file: get rid of `the_repository` in `read_loose_object()`
object-file: get rid of `the_repository` in loose object iterators
object-file: remove declaration for `for_each_file_in_obj_subdir()`
object-file: inline `for_each_loose_file_in_objdir_buf()`
object-file: get rid of `the_repository` when writing objects
odb: introduce `odb_write_object()`
loose: write loose objects map via their source
object-file: get rid of `the_repository` in `finalize_object_file()`
object-file: get rid of `the_repository` in `loose_object_info()`
object-file: get rid of `the_repository` when freshening objects
object-file: inline `check_and_freshen()` functions
object-file: get rid of `the_repository` in `has_loose_object()`
object-file: stop using `the_hash_algo`
object-file: fix -Wsign-compare warnings
Add a test script, `t/t1461-refs-list.sh`, for the new `git refs list`
command.
This script acts as a simple driver, leveraging the shared test library
created in the preceding commit. It works by overriding the
`$git_for_each_ref` variable to "git refs list" and then sourcing the
shared library (`t/for-each-ref-tests.sh`).
This approach ensures that `git refs list` is tested against the
entire comprehensive test suite of `git for-each-ref`, verifying
that it acts as a compatible drop-in replacement.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding tests for the new `git refs list` command,
refactor the existing t6300 test suite to make its logic shareable.
Move the core test logic from `t6300-for-each-ref.sh` into a new
`for-each-ref-tests.sh` file. Inside this new script, replace hardcoded
calls to "git for-each-ref" with the `$git_for_each_ref` variable.
The original `t6300-for-each-ref.sh` script now becomes a simple
"driver". It is responsible for setting the default value of the
variable and then sourcing the test library.
This new structure follows the established pattern used for sharing
tests between `git-blame` and `git-annotate` and prepares the test suite
for the `refs list` tests to be added in a subsequent commit.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's reference management is distributed across multiple commands. As
part of an ongoing effort to consolidate and modernize reference
handling, introduce a `list` subcommand under the `git refs` umbrella as
a replacement for `git for-each-ref`.
Implement `cmd_refs_list` by having it call the `for_each_ref_core()`
helper function. This helper was factored out of the original
`cmd_for_each_ref` in a preceding commit, allowing both commands to
share the same core logic as independent peers.
Add documentation for the new command. The man page leverages the shared
options file, created in a previous commit, by using the AsciiDoc
`include::` macro to ensure consistency with git-for-each-ref(1).
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation of `git for-each-ref` is monolithic within
`cmd_for_each_ref()`, making it impossible to share its logic with other
commands. To enable code reuse for the upcoming `git refs list`
subcommand, refactor the core logic into a shared helper function.
Introduce a new `for-each-ref.h` header to define the public interface
for this shared logic. It contains the declaration for a new helper
function, `for_each_ref_core()`, and a macro for the common usage
options.
Move the option parsing, filtering, and formatting logic from
`cmd_for_each_ref()` into a new helper function named
`for_each_ref_core()`. This helper is made generic by accepting the
command's usage string as a parameter.
The original `cmd_for_each_ref()` is simplified to a thin wrapper that
is only responsible for defining its specific usage array and calling
the shared helper.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usage string for `git for-each-ref` was out of sync with its official
documentation. The test `t0450-txt-doc-vs-help.sh` was marked as broken
due to this.
Update the usage string to match the documentation. This allows the test
to pass, so remove the corresponding 'known breakage' marker from the
test file.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding documentation for `git refs list`, factor out
the common options from the `git-for-each-ref` man page into a
shareable file `for-each-ref-options.adoc` and update
`git-for-each-ref.adoc` to use an `include::` macro.
This change is a pure refactoring and results in no change to the
final rendered documentation for `for-each-ref`.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: shejialuo <shejialuo@gmail.com>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When building with -Og on gcc 15.1.1, the build produces a warning. In
practice, though, this cannot be hit because `exact` acts as a guard and
that variable can only be set after `matchlen` is already initialized
Assign a default value to `matchlen` so that the warning is silenced.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "git remote set-head", we can take varying numbers of arguments
depending on whether we saw the "-d" or "-a" options. But the first
argument is always the remote name.
The current code is somewhat awkward in that it conditionally handles
the remote name up-front like this:
if (argc)
remote = ...from argv[0]...
and then only later decides to bail if we do not have the right number
of arguments for the options we saw.
This makes it hard to figure out if "remote" is always set when it needs
to be. Both for humans, but also for compilers; with -Og, gcc complains
that "remote" can be accessed without being initialized (although this
is not true, as we'd always die with a usage message in that case).
Let's instead enforce the presence of the remote argument up front,
which fixes the compiler warning and is easier to understand. It does
mean duplicating the code to print a usage message, but it's a single
line.
Noticed-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Tested-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `archive-zip.c:write_zip_entry()` when using a stream as input for
deflating a file, the call to `git_deflate()` with Z_FINISH always
expects Z_STREAM_END to be returned. Per zlib documentation[1]:
If the parameter flush is set to Z_FINISH, pending input is
processed, pending output is flushed and deflate returns with
Z_STREAM_END if there was enough output space. If deflate
returns with Z_OK or Z_BUF_ERROR, this function must be called
again with Z_FINISH and more output space (updated avail_out)
but no more input data, until it returns with Z_STREAM_END or an
error. After deflate has returned Z_STREAM_END, the only
possible operations on the stream are deflateReset or
deflateEnd.
In scenarios where the output buffer is not large enough to write all
the compressed data, it is perfectly valid for the underlying
`deflate()` to return Z_OK. Thus, expecting a single pass of `deflate()`
here to always return Z_STREAM_END is a bug. Update the code to flush
the deflate stream until Z_STREAM_END is returned.
[1]: https://zlib.net/manual.html
Helped-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/git-gui: (21 commits)
git-gui: ensure own version of git-gui--askpass is used
git-gui: Allow Tcl 9.0
git-gui: use -profile tcl8 on encoding conversions
git-gui: use -profile tcl8 for file input with Tcl 9
git-gui: themed.tcl: use full namespace for color
git-gui: remove EOL translation for gets
git-gui: honor TCLTK_PATH in git-gui--askpass
git-gui: retire Git Gui.app
git-gui: fix dependency of GITGUI_MAIN on generator
git-gui: remove uname_O in Makefile
git-gui i18n: Remove the locations within the Bulgarian translation
git-gui i18n: Update Bulgarian translation (557t)
git-gui: do not mix -translation binary and -encoding
git-gui: replace encoding binary with iso8859-1
git-gui: translation binary defines iso8859-1
git-gui: assure -eofchar {} on all channels
git-gui: use /cmd/git-gui.exe for shortcut
git-gui: Windows tk_getSaveFile is not useful for shortcuts
git-gui: let nice work on Windows
git-gui: do not add directories to PATH on Windows
...
* 'master' of https://github.com/j6t/gitk:
gitk: Mention globs in description of preference to hide custom refs
gitk: filter invisible upstream refs from reference list
gitk: avoid duplicated upstream refs
gitk i18n: Remove the locations within the Bulgarian translation
gitk i18n: Update Bulgarian translation (322t)
gitk: allow Tcl/Tk 9.0+
gitk: use -profile tcl8 on encoding conversions
gitk: use -profile tcl8 for file input with Tcl 9
gitk: Tcl9 doesn't expand ~, use $env(HOME)
gitk: switch to -translation binary
gitk: update scrolling for TclTk 8.7+ / TIP 474
gitk: restore ui colors after cancelling config dialog
gitk: set config dialog color swatches in one place
gitk: Add user preference to hide specific references
* cb/no-tcl86-on-macos:
git-gui: ensure own version of git-gui--askpass is used
git-gui: honor TCLTK_PATH in git-gui--askpass
git-gui: retire Git Gui.app
git-gui: fix dependency of GITGUI_MAIN on generator
git-gui: remove uname_O in Makefile
When finding a location for the askpass helper, git will be asked
for its exec path, but if that git is not the same that called
git-gui then we might mistakenly point to its helper instead.
Assume that git-gui and the helper are colocated to derive its
path instead.
This is specially useful in macOS where a broken version of that
helper is provided by the system git.
[j6t: move directory to variable to help in-flight topics]
Suggested-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* 'docglobs' of github.com:ilyagr/gitk:
gitk: Mention globs in description of preference to hide custom refs
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Interactive prompt code did not correctly strip CRLF from the end
of line on Windows.
* js/prompt-crlf-fix:
interactive: do strip trailing CRLF from input
Windows fixes.
* js/mingw-fixes:
mingw: support Windows Server 2016 again
mingw_rename: support ReFS on Windows 2022
mingw: drop Windows 7-specific work-around
mingw_open_existing: handle directories better
"git add/etc -p" now honor the diff.context configuration variable,
and also they learn to honor the -U<n> command-line option.
* lm/add-p-context:
add-patch: add diff.context command line overrides
add-patch: respect diff.context configuration
t: use test_config in t4055
t: use test_grep in t3701 and t4055
The config API had a set of convenience wrapper functions that
implicitly use the_repository instance; they have been removed and
inlined at the calling sites.
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
Code clean-up.
* kn/for-each-ref-skip-updates:
ref-filter: use REF_ITERATOR_SEEK_SET_PREFIX instead of '1'
t6302: add test combining '--start-after' with '--exclude'
for-each-ref: reword the documentation for '--start-after'
for-each-ref: fix documentation argument ordering
ref-cache: use 'size_t' instead of int for length
"git switch" and "git restore" are declared to be no longer
experimental.
* jt/switch-restore-no-longer-experimental:
builtin: unmark git-switch and git-restore as experimental
A new test to ensure that a recent change will keep working.
* jb/t7510-gpg-program-path:
t7510: use $PWD instead of $(pwd) inside PATH
t7510: add test cases for non-absolute gpg program
When building with clang-22 and DEVELOPER=1 mode, this warning causes us
to fail compilation:
builtin/revert.c:114:13: error: default initialization of an object of type 'const char' leaves the object uninitialized [-Werror,-Wdefault-const-init-var-unsafe]
114 | const char sentinel_value;
| ^
The compiler is right that this code is a bit funny. We declare a const
value without an initializer. It cannot be assigned to because of the
const, but without an initializer it has no predictable value. So as a
variable it can never have any useful function, and if we tried to look
at it, we'd get undefined behavior.
But it does have a function. We never use its value, but rather use its
address as a sentinel value for some other variables:
const char *gpg_sign = &sentinel_value;
...maybe set gpg_sign via parse_options...
if (gpg_sign != &sentinel_value)
...we got a non-default value...
Normally we'd use NULL as a sentinel value for a pointer, but it doesn't
work here because we also want to detect --no-gpg-sign, which is marked
by setting the pointer to NULL. We need a separate "this was not
touched" value, which is what this sentinel variable gives us.
So the code is correct as-is, but the sentinel variable itself is funny
enough that it's understandable for a compiler warning to flag it. Let's
try to appease the compiler.
There are a few possible options:
1. Instead of a variable, we could just construct an artificial
sentinel address like "1", "-1", etc. I think these technically
fall afoul of the C standard (even if we do not access them, even
constructing invalid pointers is not always allowed). But it's also
something we do elsewhere, and even happens in some standard
interfaces (e.g., mmap()'s MMAP_FAILED value). It does involve some
annoying casts, though.
2. We can mark it as static. That gives it a definite value, but
perhaps makes people wonder if the static-ness is important, when
it's not.
3. We can just give it a value to shut the compiler up, even though
nobody cares about that value.
I went with (3) here as the smallest and most obvious change.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This clarifies that one has to enter e.g. `jj/keep/*` and not just
`jj/keep`.
Follows up on 2441e19.
Signed-off-by: Ilya Grigoriev <ilyagr@users.noreply.github.com>
A few file descriptors left unclosed upon program completion in a
few test helper programs are now closed.
* hl/test-helper-fd-close:
test-delta: close output descriptor after use
test-delta: use strbufs to hold input files
test-delta: handle errors with die()
t/helper/test-truncate: close file descriptor after truncation
"git rebase -i" with bogus rebase.instructionFormat configuration
failed to produce the todo file after recording the state files,
leading to confused "git status"; this has been corrected.
* ow/rebase-verify-insn-fmt-before-initializing-state:
rebase: write script before initializing state
Redefine where the multi-pack-index sits in the object subsystem,
which recently was restructured to allow multiple backends that
support a single object source that belongs to one repository. A
midx does span mulitple "object sources".
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
"git for-each-ref" learns "--start-after" option to help
applications that want to page its output.
* kn/for-each-ref-skip:
ref-cache: set prefix_state when seeking
for-each-ref: introduce a '--start-after' option
ref-filter: remove unnecessary else clause
refs: selectively set prefix in the seek functions
ref-cache: remove unused function 'find_ref_entry()'
refs: expose `ref_iterator` via 'refs.h'
It was reported to the Git for Windows project that a simple `git init`
fails on Windows Server 2016:
D:\Dev\test> git init
error: could not write config file D:/Dev/test/.git/config: Function not implemented
fatal: could not set 'core.repositoryformatversion' to '0'
According to https://endoflife.date/windows-server, Windows Server 2016
is officially supported for another one-and-a-half years as of time of
writing, so this is not good.
The culprit is the `mingw_rename()` changes that try to use POSIX
semantics when available, but fail to fall back properly on Windows
Server 2016.
This fixes https://github.com/git-for-windows/git/issues/5695.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ReFS is an alternative filesystem to NTFS. On Windows 2022, it seems not
to support the rename operation using POSIX semantics that Git uses on
Windows as of 391bceae4350 (compat/mingw: support POSIX semantics for
atomic renames, 2024-10-27).
However, Windows 2022 reports `ERROR_NOT_SUPPORTED` in this instance.
This is in contrast to `ERROR_INVALID_PARAMETER` (as previous Windows
versions would report that do not support POSIX semantics in renames at
all).
Let's handle both errors the same: by falling back to the best-effort
option, namely to rename without POSIX semantics.
This fixes https://github.com/git-for-windows/git/issues/5427
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ac33519ddfa8 (mingw: restrict file handle inheritance only on Windows
7 and later, 2019-11-22), I introduced code to safe-guard the
defense-in-depth handling that restricts handles' inheritance so that it
would work with Windows 7, too.
Let's revert this patch: Git for Windows dropped supporting Windows 7 (and
Windows 8) directly after Git for Windows v2.46.2. For full details, see
https://gitforwindows.org/requirements#windows-version.
Actually, on second thought: revert only the part that makes this handle
inheritance restriction logic optional and that suggests to open a bug
report if it fails, but keep the fall-back to try again without said
logic: There have been a few false positives over the past few years
(where the warning was triggered e.g. because Defender was still
accessing a file that Git wanted to overwrite), and the fall-back logic
seems to have helped occasionally in such situations.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize the sequence get+put to peek+replace to avoid one unnecessary
heap rebalance.
Do that by tracking partial get operations in a prio_queue wrapper,
struct lazy_queue, and using wrapper functions that turn get into peek
and put into replace as needed. This is simpler than tracking the
state explicitly in the calling code.
We get a nice speedup on top of the previous patch's conversion to
prio_queue:
Benchmark 1: ./git_2.50.1 describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 1.559 s ± 0.002 s [User: 1.493 s, System: 0.051 s]
Range (min … max): 1.556 s … 1.563 s 10 runs
Benchmark 2: ./git_describe_pq describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 1.204 s ± 0.001 s [User: 1.138 s, System: 0.051 s]
Range (min … max): 1.202 s … 1.205 s 10 runs
Benchmark 3: ./git describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 850.9 ms ± 1.6 ms [User: 786.6 ms, System: 49.8 ms]
Range (min … max): 849.1 ms … 854.1 ms 10 runs
Summary
./git describe $(git rev-list v2.41.0..v2.47.0) ran
1.41 ± 0.00 times faster than ./git_describe_pq describe $(git rev-list v2.41.0..v2.47.0)
1.83 ± 0.00 times faster than ./git_2.50.1 describe $(git rev-list v2.41.0..v2.47.0)
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the use a list-based priority queue whose order is maintained by
commit_list_insert_by_date() with a prio_queue. This avoids quadratic
worst-case complexity. And in the somewhat contrived example of
describing the 4751 commits from v2.41.0 to v2.47.0 in one go (to get a
sizable chunk of describe work with minimal ref loading overhead) it's
significantly faster:
Benchmark 1: ./git_2.50.1 describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 1.558 s ± 0.002 s [User: 1.492 s, System: 0.051 s]
Range (min … max): 1.557 s … 1.562 s 10 runs
Benchmark 2: ./git describe $(git rev-list v2.41.0..v2.47.0)
Time (mean ± σ): 1.209 s ± 0.006 s [User: 1.143 s, System: 0.051 s]
Range (min … max): 1.201 s … 1.219 s 10 runs
Summary
./git describe $(git rev-list v2.41.0..v2.47.0) ran
1.29 ± 0.01 times faster than ./git_2.50.1 describe $(git rev-list v2.41.0..v2.47.0)
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
tr2_cfg_load_patterns() and tr2_load_env_vars() functions are
functions with very similar structure that each reads an environment
variable, splits its value at the ',' boundaries, and trims the
resulting string pieces into an array of strbufs.
But the code paths that later use these strbufs take no advantage of
the strbuf-ness of the result (they do not benefit from <ptr,len>
representation to avoid having to run strlen(<ptr>), for example).
Simplify the code by teaching these functions to split into a string
list instead; even the trimming comes for free ;-).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_trim_trailing_newline() removes a LF or a CRLF from the tail
of a string. If the code plans to call strbuf_trim() immediately
after doing so, the code is better off skipping the EOL trimming in
the first place. After all, LF/CRLF at the end is a mere special
case of whitespaces at the end of the string, which will be removed
by strbuf_rtrim() anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to read status from subprocess reads one packet line and
tries to find "status=<foo>". It is way overkill to split the line
into an array of two strbufs to extract <foo>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
environment.c:get_git_namespace() learns the raw namespace from an
environment variable, splits it at "/", and appends them after
"refs/namespaces/"; the reason why it splits first is so that an
empty string resulting from double slashes can be omitted.
The split pieces do not need to be edited in any way, so an array of
strbufs is a wrong data structure to use. Instead split into a
string list and use the pieces from there.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing an old-style GIT_CONFIG_PARAMETERS environment
variable, the code parses key=value pairs by splitting them at '='
into an array of strbuf's. As strbuf_split() leaves the delimiter
at the end of the split piece, the code has to manually trim it.
If we split with string_list_split(), that becomes unnecessary.
Retire the use of strbuf_split() from this code path.
Note that the max parameter of string_list_split() is of
an ergonomically iffy design---it specifies the maximum number of
times the function is allowed to split, which means that in order to
split a text into up to 2 pieces, you have to pass 1, not 2.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading copy instructions from the standard input, the program
reads a line, splits it into tokens at whitespace, and trims each of
the tokens before using. We no longer need to use strbuf just to be
able to trim, as string_list_split*() family now can trim while
splitting a string.
Retire the use of strbuf_split() from this code path.
Note that this loop is a bit sloppy in that it ensures at least
there are two tokens on each line, but ignores if there are extra
tokens on the line. Tightening it is outside the scope of this
series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading merge instructions from the standard input, the program
reads from the standard input, splits the line into tokens at
whitespace, and trims each of them before using. We no longer need
to use strbuf just for trimming, as string_list_split*() family can
trim while splitting a string.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clean.c:filter_by_patterns_cmd() interactively reads a line
that has exclude patterns from the user and splits the line into a
list of patterns. It uses the strbuf_split() so that each split
piece can then trimmed.
There is no need to use strbuf anymore, thanks to the recent
enhancement to string_list_split*() family that allows us to trim
the pieces split into a string_list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callee parse_choice() only needs to access a NUL-terminated
string; instead of insisting to take a pointer to a strbuf, just
take a pointer to a character array.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/clean.c:parse_choice() is fed a single line of input, which
is space or comma separated list of tokens, and a list of menu
items. It parses the tokens into number ranges (e.g. 1-3 that means
the first three items) or string prefix (e.g. 's' to choose the menu
item "(s)elect") that specify the elements in the menu item list,
and tells the caller which ones are chosen.
For parsing the input string, it uses strbuf_split() to split it
into bunch of strbufs. Instead use string_list_split_in_place(),
for a few reasons.
* strbuf_split() is a bad API function to use, that yields an array
of strbuf that is a bad data structure to use in general.
* string_list_split_in_place() allows you to split with "comma or
space"; the current code has to preprocess the input string to
replace comma with space because strbuf_split() does not allow
this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you pass a structure by value, the callee can modify the
contents of the structure that was passed in without having to worry
about changing the structure the caller has. Passing structure by
value sometimes (but not very often) can be a valid way to give
callee a temporary variable it can freely modify.
But not a structure with members that are pointers, like a strbuf.
builtin/clean.c:list_and_choose() reads a line interactively from
the user, and passes the line (in a strbuf) to parse_choice() by
value, which then munges by replacing ',' with ' ' (to accept both
comma and space separated list of choices). But because the strbuf
passed by value still shares the underlying character array buf[],
this ends up munging the caller's strbuf contents.
This is a catastrophe waiting to happen. If the callee causes the
strbuf to be reallocated, the buf[] the caller has will become
dangling, and when the caller does strbuf_release(), it would result
in double-free.
Stop calling the function with misleading call-by-value with strbuf.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf is a very good data structure to work with string data
without having to worry about running past the end of the string,
but strbuf_split() is a wrong API and an array of strbuf that the
function produces is a wrong thing to use in general. You do not
edit these N strings split out of a single strbuf simultaneously.
Often it is much better off to split a string into string_list and
work with the resulting strings.
wt-status.c:abbrev_oid_in_line() takes one line of rebase todo list
(like "pick e813a0200a7121b97fec535f0d0b460b0a33356c title"), and
for instructions that has an object name as the second token on the
line, replace the object name with its unique abbreviation. After
splitting these tokens out of a single line, no simultaneous edit on
any of these pieces of string that takes advantage of strbuf API
takes place. The final string is composed with strbuf API, but
these split pieces are merely used as pieces of strings and there is
no need for them to be stored in individual strbuf.
Instead, split the line into a string_list, and compose the final
string using these pieces.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/string-list-split:
string-list: split-then-remove-empty can be done while splitting
string-list: optionally omit empty string pieces in string_list_split*()
diff: simplify parsing of diff.colormovedws
string-list: optionally trim string pieces split by string_list_split*()
string-list: unify string_list_split* functions
string-list: align string_list_split() with its _in_place() counterpart
string-list: report programming error with BUG
Thanks to the new STRING_LIST_SPLIT_NONEMPTY flag, a common pattern
to split a string into a string list and then remove empty items in
the resulting list is no longer needed. Instead, just tell the
string_list_split*() to omit empty ones while splitting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the unified split_string() machinery a new flag bit,
STRING_LIST_SPLIT_NONEMPTY, to cause empty split pieces to be
omitted from the resulting string list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to parse this configuration variable, whose value is a
comma-separated list of known tokens like "ignore-space-change" and
"ignore-all-space", uses string_list_split() to split the value into
pieces, and then places each piece of string in a strbuf to trim,
before comparing the result with the list of known tokens.
Thanks to the previous steps, now string_list_split() can trim the
resulting pieces before it places them in the string list. Use it
to simplify the code.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the unified split_string() to take an optional "flags" word,
and define the first flag STRING_LIST_SPLIT_TRIM to cause the split
pieces to be trimmed before they are placed in the string list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thanks to the previous step, the only difference between these two
related functions is that string_list_split() works on a string
without modifying its contents (i.e. taking "const char *") and the
resulting pieces of strings are their own copies in a string list,
while string_list_split_in_place() works on a mutable string and the
resulting pieces of strings come from the original string.
Consolidate their implementations into a single helper function, and
make them a thin wrapper around it. We can later add an extra flags
parameter to extend both of these functions by updating only the
internal helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The string_list_split_in_place() function was updated by 52acddf3
(string-list: multi-delimiter `string_list_split_in_place()`,
2023-04-24) to take more than one delimiter characters, hoping that
we can later use it to replace our uses of strtok(). We however did
not make a matching change to the string_list_split() function,
which is very similar.
Before giving both functions more features in future commits, allow
string_list_split() to also take more than one delimiter characters
to make them closer to each other.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'strip-post-hooks' of github.com:orgads/git-gui:
git-gui: strip the commit message after running commit-msg hook
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* ml/tcl90:
git-gui: Allow Tcl 9.0
git-gui: use -profile tcl8 on encoding conversions
git-gui: use -profile tcl8 for file input with Tcl 9
git-gui: themed.tcl: use full namespace for color
git-gui: remove EOL translation for gets
git-gui: do not mix -translation binary and -encoding
git-gui: replace encoding binary with iso8859-1
git-gui: translation binary defines iso8859-1
git-gui: assure -eofchar {} on all channels
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* 'master' of https://github.com/alshopov/git-gui:
git-gui i18n: Remove the locations within the Bulgarian translation
git-gui i18n: Update Bulgarian translation (557t)
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Passing a string list that has .strdup_strings bit unset to
string_list_split(), or one that has .strdup_strings bit set to
string_list_split_in_place(), is a programmer error. Do not use
die() to abort the execution. Use BUG() instead.
As a developer-facing message, the message string itself should
be a lot more concise, but let's keep the original one for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable unit tests are now ported to the "clar" unit testing
framework.
* sk/reftable-clarify-tests:
t/unit-tests: finalize migration of reftable-related tests
t/unit-tests: convert reftable stack test to use clar
t/unit-tests: convert reftable record test to use clar
t/unit-tests: convert reftable readwrite test to use clar
t/unit-tests: convert reftable table test to use clar
t/unit-tests: convert reftable pq test to use clar
t/unit-tests: convert reftable merged test to use clar
t/unit-tests: convert reftable block test to use clar
t/unit-tests: convert reftable basics test to use clar test framework
t/unit-tests: implement clar specific reftable test helper functions
To help our developers, document what C99 language features are
being considered for adoption, in addition to what past experiments
have already decided.
* jc/document-test-balloons-in-flight:
CodingGuidelines: document test balloons in flight
Document recently added "git imap-send --list" with an example.
* ag/imap-send-list-folders-doc:
docs: explain how to use `git imap-send --list` command to get a list of available folders
Move structure definition from unrelated header file to where it
belongs.
* jc/rev-list-info-cleanup:
rev-list: make "struct rev_list_info" static to the only user
When using the Meson build system with versions of Git before 2.31,
that does not yet know the `git ls-files --deduplicate` option, one
can observe the following error:
../meson.build:697:19: ERROR: Command `/usr/bin/git -C /home/martin/code/git ls-files --deduplicate '*.h' ':!contrib' ':!compat/inet_ntop.c' ':!compat/inet_pton.c' ':!compat/nedmalloc' ':!compat/obstack.*' ':!compat/poll' ':!compat/regex' ':!sha1collisiondetection' ':!sha1dc' ':!t/unit-tests/clar' ':!t/t[0-9][0-9][0-9][0-9]*' ':!xdiff'` failed with status 129.
The failing command is used to find all header files in our code
base, which is required for static analysis.
Static analysis is an entirely optional feature that distributors
typically don't care about, and we already know to skip running the
command when we are not in a Git repository. But we do not handle
the above failure gracefully, even though we could.
Fix this by passing `check: false` to `run_command`, which makes it
tolerate failures. Then check `returncode()` manually to decide
whether to inspect the output.
Signed-off-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
6e411d20440 (Initial draft of fast-import documentation., 2007-02-05)
pointed out how much time a fast-import took on some hardware with a
specific cost. Let’s further point out that this experiment was done
in 2007. So modern hardware should have no issues with such a repo.
Also move the parenthetical to the end now that it contains four words.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the section for naming various API functions, the fact that
S_release() only releases the resources without preparing the
structure for immediate reuse becomes only apparent when you
readentries for S_release() and S_clear().
Clarify the description of S_release() a bit to make the entry self
sufficient.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ml/tcltk-9:
gitk: allow Tcl/Tk 9.0+
gitk: use -profile tcl8 on encoding conversions
gitk: use -profile tcl8 for file input with Tcl 9
gitk: Tcl9 doesn't expand ~, use $env(HOME)
gitk: switch to -translation binary
gitk: update scrolling for TclTk 8.7+ / TIP 474
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* ml/abandon-old-version:
gitk: restore ui colors after cancelling config dialog
gitk: set config dialog color swatches in one place
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* 'master' of github.com:alshopov/gitk:
gitk i18n: Remove the locations within the Bulgarian translation
gitk i18n: Update Bulgarian translation (322t)
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
`git reset -p file` on a Windows CMD refuses to do anything useful
with this error message:
(1/5) Unstage this hunk [y,n,q,a,d,j,J,g,/,e,p,?]? n
'nly one letter is expected, got 'n
The letter 'O' at the beginning of the line is overwritten by an
apostrophe, so, clearly the parser sees the string "n\r".
strbuf_trim_trailing_newline() removes trailing CRLF from the string.
In particular, it first removes LF if present, and if that was the
case, it also removes CR if present.
git_read_line_interactively() clearly intends to remove CRLF as it
calls strbuf_trim_trailing_newline(). However, input is gathered using
strbuf_getline_lf(), which already removes the trailing LF. Now
strbuf_trim_trailing_newline() does not see LF, so that it does not
remove CR, either, and leaves it for the caller to process.
Call strbuf_getline() instead, which removes both LF and CR.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/config-wo-the-repository: (21 commits)
config: fix sign comparison warnings
config: move Git config parsing into "environment.c"
config: remove unused `the_repository` wrappers
config: drop `git_config_set_multivar()` wrapper
config: drop `git_config_get_multivar_gently()` wrapper
config: drop `git_config_set_multivar_in_file_gently()` wrapper
config: drop `git_config_set_in_file_gently()` wrapper
config: drop `git_config_set()` wrapper
config: drop `git_config_set_gently()` wrapper
config: drop `git_config_set_in_file()` wrapper
config: drop `git_config_get_bool()` wrapper
config: drop `git_config_get_ulong()` wrapper
config: drop `git_config_get_int()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string()` wrapper
config: drop `git_config_get_string_multi()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get_value()` wrapper
config: drop `git_config_get()` wrapper
config: drop `git_config_clear()` wrapper
...
Prior to 05e9cd64 (config: quote values containing CR character,
2025-05-19), a repository can trick "clone --recurse-submodules"
into running a post-checkout hook shipped with the project. The
test was written to make sure the trick would no longer run the
hook with the fix in the commit.
However, the test did not check for the path the hook would
create; correct the path to the expected one if the bug were
still with us.
Signed-off-by: chenjianhu <chenjianhu@kylinos.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
TclTk 9.0 is now shipping, and git-gui is now patched to support use of
this newer version. Adjust required versions to allow Tcl/Tk >= 8.6,
including 9.x.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui in the prior commit learned to apply -profile tcl8 when reading
files, avoiding errors on non-binary data streams whose encoding is not
utf-8. But, git-gui also consumes binary data streams (generally blobs
from commits) as the output of commands, and internally decodes this to
support various displays.
With Tcl9, errors occur in this decoding for the same reasons described
in the previous commit: basically, the underlying data may contain
extended ascii characters violating the assumption of utf-8 encoding.
This problem has a similar fix to the prior issue: we must use the tlc8
profile when converting this data to the internal unicode format. Do so,
again only on Tcl9 as Tcl8.6 does not recognize -profile, and only Tcl
9.0 makes strict the default.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui invokes many git commands expecting output in utf-8 encoding,
but git accepts extended ascii (code page unknown) as utf-8 without
validating, so cannot guarantee valid utf-8 on output. In particular,
using any extended ascii code page has long been acceptable on git given
that everyone on a project is aware of and uses that same code page to
view all data. utf-8 accepts only 7-bit ascii characters in single
bytes, and any characters outside of that base set require at least two
bytes for representation in unicode.
Tcl is a string based language, and transcodes all input data to an
internal unicode format, and to whatever format is requested on output:
"pure" binary is recoded byte by byte using iso8859-1. Tcl8.x silently
recodes invalid utf-8 as binary data, so extended ascii characters
maintain their binary value on output but may not display correctly.
Tcl 8.7 added three profiles to control this behaviour: strict (raises
exceptions), replace (replaces each invalid byte with ?), and the
default tcl8 maintaining the old behavior. Tcl 9 changes the default
profile to strict, meaning any invalid utf-8 raises an exception that
git-gui does not handle.
An example of this in the git repository is commit 7eb93c8965 ("[PATCH]
Simplify git script", 2005-09-07). This includes extended ascii
characters in the author name and commit message.
The tcl8 profile used so far has acceptable behavior given git-gui's
acceptance: this allows git-gui to accept extended ascii though it may
display incorrectly. Let's continue that behavior by overriding open to
use the tcl8 profile on Tcl9 and later: Tcl 8.6 does not understand
fconfigure -profile, and Tcl 8.7 maintains the tcl8 profile.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Tcl 9 imposes strict requirements on namespaces for variables, while Tcl
8 does not. lib/themed.tcl does not use the fully qualified name for the
"color" namespace, with result that variables are not found with Tcl
9.0. Fix this.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui configures '-translation lf' on a number of channels. The
default configuration is 'auto', which on input changes any occurrence
of \n, \r, or \r\n to \n, and on output changes any such EOL sequence to
a platform dependent value (\n on Unix, \r\n on Windows). Such
translation can be necessary, but much of what is configured now is
redundant.
In particular, many of the channels configured this way are then
consumed by gets, which already recognizes any of \n, \r, or \r\n as
terminators. Configuring a channel to first change these line endings,
then give the result to gets, is redundant.
The valid uses of -translation lf are for output where we do not want
\r\n on Windows, and for consuming entire files without going through
gets, assuring that \n will be used internally. Let's remove all the
others that only serve to confuse.
lib/diff.tcl must have -translation lf because \r\n might be stored in
the repository (e.g., on Windows, with no crlf translation enabled), and
git will treat \n as the line ending, while the preceding \r is just
whitespace, and these may be split by ANSI color coding. git-gui's
read_diff handles this correctly as-is.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
* ml/windows-tie-loose-ends:
git-gui: use /cmd/git-gui.exe for shortcut
git-gui: Windows tk_getSaveFile is not useful for shortcuts
git-gui: let nice work on Windows
git-gui: do not add directories to PATH on Windows
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Since its introduction in 8c76212 (git-gui: Add a simple implementation
of SSH_ASKPASS., 2008-10-15), git-gui--askpass has been calling whatever
wish interpreter is in the path, unlike git-gui.
Correct that by turning it into a script that would be processed at build
time.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
In a recent commit, the minimum version of Tcl/Tk was raised to
8.6, but the "app" relies on the system provided Framework that
is based on 8.5.
Remove it, and let git-gui use a third party version of Wish if
available.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Since 854e883 (git-gui: extract script to generate "git-gui",
2025-03-11), the logic to generate the main script was pulled
out of the Makefile, but adding the resulting generator as a
dependency was missed.
If the logic changes, the main script should be regenerated, so
add it as a dependency.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Last used in ae49066 (git gui Makefile - remove Cygwin modifications,
2023-06-26), and unused since.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
In refill_reflist, upstream refs are now only included if their
commits are visible in the current view. This prevents display
issues like multiple highlighted branches when clicking entries.
Signed-off-by: Michael Rappazzo <michael.rappazzo@infor.com>
As I ended up wasting a few dozen minutes looking for the reason why
this is still here, help future developers by saving them from
wasting their time by documenting why this code that apparently is
not used by anybody is still here.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible that multiple local branches track the same upstream.
In this case, the refs dialog lists the tracked upstream branch
multiple times. This is undesirable. Make them unique.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* ps/object-store-midx:
midx: remove now-unused linked list of multi-pack indices
packfile: stop using linked MIDX list in `get_all_packs()`
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: refactor `get_multi_pack_index()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: start tracking per object database source
This patch compliments the previous commit, where builtins that use
add-patch infrastructure now respect diff.context and
diff.interHunkContext file configurations.
In particular, this patch helps users who don't want to set persistent
context configurations or just want a way to override them on a one-time
basis, by allowing the relevant builtins to accept corresponding command
line options that override the file configurations.
This mimics commands such as diff and log, which allow for both context
file configuration and command line overrides.
Signed-off-by: Leon Michalak <leonmichalak6@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various builtins that use add-patch infrastructure do not respect
the user's diff.context and diff.interHunkContext file configurations.
The user may be used to seeing their diffs with customized context size,
but not in the patches "git add -p" shows them to pick from.
Teach add-patch infrastructure to read these configuration variables and
pass their values when spawning the underlying plumbing commands as
their command line option.
Signed-off-by: Leon Michalak <leonmichalak6@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the modern "test_config" test utility instead of manual"git config"
as the former provides clean up on test completion.
This is a prerequisite to the commits that follow which add to this test
file.
Signed-off-by: Leon Michalak <leonmichalak6@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a preparatory clean-up, use the "test_grep" test utility instead of
regular "grep" which provides better debug information if tests fail.
Signed-off-by: Leon Michalak <leonmichalak6@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "clar-decls.h" header gets generated by us to extract prototypes of
unit test functions from our clar-based tests. This generated file is
then written into "t/unit-tests/" and included via "unit-test.h". The
intent of all this is that we can keep "-Wmissing-prototype" warnings
enabled. If we had that warning disabled, it would be easy to miss in
case any of the non-static functions had a typo in its name and thus
wasn't picked up by our test case extractor.
Including the file directly has a big downside though: if a source tree
was built both with our Makefile and with Meson, then the Meson build
would include the "clar-decls.h" file from our Makefile. And if those
are out of sync we get compiler errors.
We already fixed a similar issue in 4771501c0a (meson: ensure correct
version-def.h is used, 2025-01-14). Let's do the same and pass the
absolute path to "clar-decls.h" via a preprocessor define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows, $(pwd) will give us a Windows-style path like "D:/foo".
Putting that into $PATH confuses anybody parsing that variable, since
colon is a separator character in $PATH. Instead, we should use the
Unix-style value we get from $PWD ("/d/foo").
This is similar to the cases fixed by 71dd50472d (t0021, t5615: use $PWD
instead of $(pwd) in PATH-like shell variables, 2016-11-11).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_commit_info() function accepts a parameter that can be used
to stop the commit parsing early. However, none of the callers use
this feature, and testing proved that the performance gain of
stopping parsing early is negligible and unmeasurable.
Signed-off-by: Han Young <hanyang.tony@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 4e43b7ff (Declare both git-switch and git-restore experimental,
2019-04-25), the newly introduced git-switch(1) and git-restore(1)
commands were marked as experimental. This was done to provide time to
make breaking changes to the interface. It has now been over six years
since these commands were implemented and there hasn't been much change.
Consequently, users have grown to rely on how these commands work and it
is no longer feasible to make any breaking changes.
Let's remove the experimental label for git-switch(1) and
git-restore(1).
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the commit 51511d68f4 (for-each-ref: introduce a '--start-after'
option, 2025-07-15), for introducing the '--start-after' flag, the
`ref_iterator_seek()` was modified to also accept a flag. This was to
allow the function to also set the prefix when
'REF_ITERATOR_SEEK_SET_PREFIX' was set.
In `do_filter_refs()` instead of passing the flag, we pass in '1' which
is the value of the flag. While this works, this is definitely hard to
read and introduces inconsistency. Change it to use the flag.
While here, remove the unnecessary 'if (prefix)' clause in the 'else'
statement, since the block already checks for 'prefix'.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The '--start-after' doesn't explicitly mention being compatible with the
'--exclude' flag, generally only incompatibility is explicitly called
out. However, it would be nice to test the compatibility between the
two to avoid future regressions. Let's do that.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for '--start-after' states that the flag cannot be
used with general pattern matching. This is a bit vague, since there is
no clear understanding about what 'general' means here. Rewrite the
sentence to be more specific.
While here, fix a typo in the 'OPT_STRING'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the 'git-for-each-ref(1)' documentation with two corrections:
1. Add parentheses around `--exclude=<pattern>` to indicate this option
can be repeated as a complete unit.
2. Move `--stdin | <pattern> ...` to the end, after all flags, since
`<pattern>` is a positional argument that should appear last in the
argument list.
While here, change to using the synopsis block which will automatically
format placeholders in italics and keywords in monospace.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 090eb5336c (refs: selectively set prefix in the seek
functions, 2025-07-15) modified the ref-cache iterator to support
seeking to a specified marker without setting the prefix.
The commit adds and uses an integer 'len' to capture the length of the
seek marker to compare with the entries of a given directory. Since the
type of the variable is 'int', this is met with a typecast of converting
a `strlen` to 'int' so it can be assigned to the 'len' variable.
This is whole operation is a bit wrong:
1. Since the 'len' variable is eventually used in a 'strncmp', it should
have been of type 'size_t'.
2. This also truncates the value provided from 'strlen' to an int, which
could cause a large refname to produce a negative number.
Let's do the correct thing here and simply use 'size_t' for `len`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inline the check for whitespace flags so that the compiler can hoist
it out of the loop in xdl_prepare_ctx(). This improves the performance
by 8%.
$ hyperfine --warmup=1 -L rev HEAD,HEAD^ --setup='git checkout {rev} -- :/ && make git' ': {rev}; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'
Benchmark 1: : HEAD; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
Time (mean ± σ): 1.670 s ± 0.044 s [User: 1.473 s, System: 0.196 s]
Range (min … max): 1.619 s … 1.754 s 10 runs
Benchmark 2: : HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
Time (mean ± σ): 1.801 s ± 0.021 s [User: 1.605 s, System: 0.192 s]
Range (min … max): 1.766 s … 1.831 s 10 runs
Summary
': HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0' ran
1.08 ± 0.03 times faster than ': HEAD^^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git commit" that concludes a conflicted merge failed to notice and remove
existing comment added automatically (like "# Conflicts:") when the
core.commentstring is set to 'auto'.
* ac/auto-comment-char-fix:
config: set comment_line_str to "#" when core.commentChar=auto
commit: avoid scanning trailing comments when 'core.commentChar' is "auto"
The pop_most_recent_commit() function can have quite expensive
worst case performance characteristics, which has been optimized by
using prio-queue data structure.
* rs/pop-recent-commit-with-prio-queue:
commit: use prio_queue_replace() in pop_most_recent_commit()
prio-queue: add prio_queue_replace()
commit: convert pop_most_recent_commit() to prio_queue
A number of tests in "t9350-fast-export.sh" are using sub-shells to
redirect content to a number of commands instead of only
`git fast-import`.
This is confusing and possibly error-prone, so let's change those tests
so that no sub-shell is used and the content goes only to
`git fast-import`.
Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Builtin commands show usage information on stdout if called with -h as
their only option, usage.c::show_usage_if_asked() makes sure of that.
Aliases show alias information on stderr if called with -h as the first
option since a9a60b94cc (git.c: handle_alias: prepend alias info when
first argument is -h, 2018-10-09). This is surprising when using
aliases for commands that take -h as a normal argument among others,
like git grep.
Tighten the condition and show the alias information only if -h is the
only option given, to be consistent with builtins.
It's probably still is a good idea to write to stderr, as an alias
command doesn't have to be a builtin and could instead produce output
with just -h that might be spoiled by an extra alias info line.
Reported-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that we do not require "real" name when signing your
patches off.
* bc/contribution-under-non-real-names:
SubmittingPatches: allow non-real name contributions
Meson-based build did not handle libexecdir setting correctly,
which has been corrected.
* rj/meson-libexecdir-fix:
po/meson.build: add missing 'ga' language code
meson: fix installation when -Dlibexexdir is set
Clean-up compat/bswap.h mess.
* ss/compat-bswap-revamp:
bswap.h: provide a built-in based version of bswap32/64 if possible
bswap.h: remove optimized x86 version of bswap32/64
bswap.h: always overwrite ntohl/ ntohll macros
bswap.h: define GIT_LITTLE_ENDIAN on msvc as little endian
bswap.h: add support for __BYTE_ORDER__
GIT_TEST_INSTALLED was not honored in the recent topic related to
SHA256 hashes, which has been corrected.
* kl/test-installed-fix:
test-lib: respect GIT_TEST_INSTALLED when querying default hash
Declare weather-balloon we raised for "bool" type 18 months ago a
success and officially allow using the type in our codebase.
* pw/adopt-c99-bool-officially:
strbuf: convert predicates to return bool
git-compat-util: convert string predicates to return bool
CodingGuidelines: allow the use of bool
In 090eb5336c (refs: selectively set prefix in the seek functions,
2025-07-15) we separated the seeking functionality of reference
iterators from the functionality to set prefix to an iterator. This
allows users of ref iterators to seek to a particular reference to
provide pagination support.
The files-backend, uses the ref-cache iterator to iterate over loose
refs. The iterator tracks directories and entries already processed via
a stack of levels. Each level corresponds to a directory under the files
backend. New levels are added to the stack, and when all entries from a
level is yielded, the corresponding level is popped from the stack.
To accommodate seeking, we need to populate and traverse the levels to
stop the requested seek marker at the appropriate level and its entry
index. Each level also contains a 'prefix_state' which is used for
prefix matching, this allows the iterator to skip levels/entries which
don't match a prefix. The default value of 'prefix_state' is
PREFIX_CONTAINS_DIR, which yields all entries within a level. When
purely seeking without prefix matching, we want to yield all entries.
The commit however, skips setting the value explicitly. This causes the
MemorySanitizer to issue a 'use-of-uninitialized-value' error when
running 't/t6302-for-each-ref-filter'.
Set the value explicitly to avoid to fix the issue.
Reported-by: Kyle Lippincott <spectral@google.com>
Helped-by: Kyle Lippincott <spectral@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
configure_added_submodule always writes an explicit
submodule.<name>.active entry, even when the new
path is already matched by submodule.active
patterns. This leads to unnecessary and cluttered configuration.
change the logic to centralize wildmatch-based pattern lookup,
in configure_added_submodule. Wrap the active-entry write in a conditional
that only fires when that helper reports no existing pattern covers the
submodule’s path.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adding a submodule at a path that previously hosted
another submodule (e.g., 'child') reuses the submodule
name derived from the path. If the original submodule
was only moved (e.g., to 'child_old') and not renamed,
this silently overwrites its configuration in .gitmodules.
This behavior loses user configuration and causes
confusion when the original submodule is expected
to remain intact. It assumes that the path-derived
name is always safe to reuse, even though the name
might still be in use elsewhere in the repository.
Teach module_add() to check if the computed submodule
name already exists in the repository's submodule config,
and if so, refuse the operation unless the user explicitly
renames the submodule or uses the --force option,
which will automatically generate a unique name by
appending a number (e.g., child1).
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old `lib-reftable.{c,h}` implemented helper functions for our
homegrown unit-testing framework. As part of migrating reftable-related
tests to the Clar framework, Clar-specific versions of these functions
in `lib-reftable-clar.{c,h}` were introduced.
Now that all test files using these helpers have been converted to Clar,
we can safely remove the original `lib-reftable.{c,h}` and rename the
Clar- specific versions back to `lib-reftable.{c,h}`. This restores a
clean and consistent naming scheme for shared test utilities.
Finally, update our build system to reflect the changes made and remove
redundant code related to the reftable tests and our old homegrown
unit-testing setup. `test-lib.{c,h}` remains unchanged in our build
system as some files particularly `t/helper/test-example-tap.c` depends
on it in order to run, and removing that would be beyond the scope of
this patch.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable stack test file to use clar by using clar assertions
where necessary.
This marks the end of all unit tests migrated away from the
`unit-tests/t-*.c` pattern, there are no longer any files matching that
glob. Remove the sanity check for `t-*.c` files to prevent Meson
configuration errors during CI and local builds.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable record test file to use clar by using clar assertions
where necessary.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable readwrite test file to use clar by using clar assertions
where necessary.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable table test file to use clar by using clar assertions
where necessary.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable priority queue test file to use clar by using clar
assertions where necessary.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable merged test file to use clar testing framework by using
clar assertions where necessary.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable block test file to use clar testing framework by using
clar assertions where necessary.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt reftable basics test file to clar by using clar assertions
where necessary.Break up test edge case to improve modularity and
clarity.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helper functions defined in `t/unit-tests/lib-reftable.{c,h}` are
required for the reftable-related test files to run. In the current
implementation these functions are designed to conform with our
homegrown unit-testing structure. So in other to convert the reftable
test files, there is need for a clar specific implementation of these
helper functions.
Implement equivalent helper functions in `lib-reftable-clar.{c,h}` to
use clar. These functions conform with the clar testing framework and
become available for all reftable-related test files implemented using
the clar testing framework, which requires them. This will be used by
subsequent commits.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we write to the output file, the program exits. This naturally
closes the descriptor. But we should do an explicit close for two
reasons:
1. It's possible to hit an error on close(), which we should detect
and report via our exit code.
2. Leaking descriptors is a bad practice in general. Even if it isn't
meaningful here, it sets a bad example.
It is tempting to write:
if (write_in_full(fd, ...) < 0 || close(fd) < 0)
die_errno(...);
But that pattern contains a subtle problem that has resulted in
descriptor leaks before. If write_in_full() fails, we'll short-circuit
and never call close(), leaking the descriptor.
That's not a problem here, since our error path dies instead of
returning up the stack. But since we're trying to set a good example,
let's write it out as two separate conditions. As a bonus, that lets us
produce a slightly more specific error message.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to read the whole contents of two files into memory. If we
switch from raw ptr/len pairs to strbufs, we can use strbuf_read_file()
to shorten the code.
This incidentally fixes two small bugs:
1. We stat() the files and allocate our buffers based on st.st_size.
But that is an off_t which may be larger than the size_t we'd use
to allocate. We should use xsize_t() to do a checked conversion.
Otherwise integer truncation (on a file >4GB) could cause us to
under-allocate (though in practice this does not result in a buffer
overflow because the same truncation happens when read_in_full()
also takes a size_t).
2. We get the size from st.st_size, and then try to read_in_full()
that many bytes. But it may return fewer bytes than expected (if
the file changed racily and we get an early EOF), leading us to
read uninitialized bytes in the allocated buffer. We don't notice
because we only check the value for error, not that we got the
expected number of bytes.
The strbuf code doesn't run into this, because it just reads to EOF,
expanding the buffer dynamically as necessary. Neither bug is a big deal
for a test helper, but fixing them is a nice bonus on top of simplifying
the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a short test helper that does all of its work in the main
function. When we encounter an error, we try to clean up memory and
descriptors and then jump to an error return, which exits the program.
We can get the same effect by just calling die(), which means we do not
have to bother with cleaning up. This simplifies the code, and also
removes some inconsistencies where a few code paths forgot to clean up
descriptors (though in practice it was not a big deal since we were
exiting anyway).
In addition to die() and die_errno(), we'll also use a few of our usual
helpers like xopen() and usage() that make things more ergonomic.
This does change the exit code in these cases from 1 to 128, but I
don't think it matters (and arguably is better, as we'd already exit 128
for other errors like xmalloc() failure).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Due to portability concerns, we do not blindly say "It is in [[this
standard]], so we will make liberal use of it" for many features,
and use of C99 language features follow this same principle. When
we contemplate adopting a language feature that we haven't used in
our codebase, we typically first raise a test balloon, which
- is a piece of code that exercises the language feature we are
trying to see if it is OK to adopt
- is in a small section of code that we know everybody who cares
about having a working Git must be compiling
- is in a fairly stable part of the code, to allow reverting it
easily if some platforms do not understand it yet.
After a few years, with no breakage report from the community, we'd
declare that the feature is now safe to use in our codebase. Before
that, we forbid the use of the language construct except for the
designated test balloon code site.
The CodingGuidelines document lists these selected features that we
already have determined that they are safe, and also those features
that we know some platforms had trouble with.
Let's also start listing ongoing test balloons and expected timeline
for adoption. Recently phillip proposed to adopt the syntax to
spell a structure literally (i.e. compound literal) with a new test
balloon, which Patrick made redundant by pointing out an existing
one we had already.but without documenting it. Start the new section
with an entry for that test balloon.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up the way how signature on commit objects are exported to
and imported from fast-import stream.
* cc/fast-import-export-signature-names:
fast-(import|export): improve on commit signature output format
Our <sane-ctype.h> header file relied on that the system-supplied
<ctype.h> header is not later included, which would override our
macro definitions, but "amazon linux" broke this assumption. Fix
this by preemptively including <ctype.h> near the beginning of
<sane-ctype.h> ourselves.
* ps/sane-ctype-workaround:
sane-ctype: fix compiler error on Amazon Linux 2
Lift the limitation to use changed-path filter in "git log" so that
it can be used for a pathspec with multiple literal paths.
* ly/changed-paths-traversal:
bloom: optimize multiple pathspec items in revision
revision: make helper for pathspec to bloom keyvec
bloom: replace struct bloom_key * with struct bloom_keyvec
bloom: rename function operates on bloom_key
bloom: add test helper to return murmur3 hash
There are a couple of -Wsign-compare warnings in "config.c":
- `prepare_include_condition_pattern()` is returns a signed integer,
where it either returns a negative error code or the index of the
last dir separator in a path. That index will always be a
non-negative number, but we cannot just change the return type to a
`size_t` due to it being re-used as error code. This is fixed by
splitting up concerns: the return value is only used as error code,
and the prefix is now returned via an out-pointer. This fixes a sign
comparison warning when comparing `text.len < prefix`,
- We treat `struct config_store_data::seen` as signed integer in
several places even though it's unsigned.
- There are multiple trivial sign comparison warnings where we use a
signed loop index to iterate through an unsigned number of items.
Fix all of these issues and drop the `DISABLE_SIGN_COMPARE_WARNINGS`
macro.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "config.c" we host both the business logic to read and write config
files as well as the logic to parse specific Git-related variables. On
the one hand this is mixing concerns, but even more importantly it means
that we cannot easily remove the dependency on `the_repository` in our
config parsing logic.
Move the logic into "environment.c". This file is a grab bag of all
kinds of global state already, so it is quite a good fit. Furthermore,
it also hosts most of the global variables that we're parsing the config
values into, making this an even better fit.
Note that there is one hidden change: in `parse_fsync_components()` we
use an `int` to iterate through `ARRAY_SIZE(fsync_component_names)`. But
as -Wsign-compare warnings are enabled in this file this causes a
compiler warning. The issue is fixed by using a `size_t` instead.
This change allows us to drop the `USE_THE_REPOSITORY_VARIABLE`
declaration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the last couple of wrapper functions that implicitly depend on
`the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_set_multivar()`.
All callsites are adjusted so that they use
`repo_config_set_multivar(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove
`git_config_get_multivar_gently()`. All callsites are adjusted so that
they use `repo_config_get_multivar_gently(the_repository, ...)` instead.
While some callsites might already have a repository available, this
mechanical conversion is the exact same as the current situation and
thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove
`git_config_set_multivar_in_file_gently()`. All callsites are adjusted
so that they use
`repo_config_set_multivar_in_file_gently(the_repository, ...)` instead.
While some callsites might already have a repository available, this
mechanical conversion is the exact same as the current situation and
thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove
`git_config_set_in_file_gently()`. All callsites are adjusted so that
they use `repo_config_set_in_file_gently(the_repository, ...)` instead.
While some callsites might already have a repository available, this
mechanical conversion is the exact same as the current situation and
thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_set()`. All
callsites are adjusted so that they use `repo_config_set(the_repository,
...)` instead. While some callsites might already have a repository
available, this mechanical conversion is the exact same as the current
situation and thus cannot cause any regression. Those sites should
eventually be cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_set_gently()`.
All callsites are adjusted so that they use
`repo_config_set_gently(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_set_in_file()`.
All callsites are adjusted so that they use
`repo_config_set_in_file(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_bool()`. All
callsites are adjusted so that they use
`repo_config_get_bool(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_ulong()`. All
callsites are adjusted so that they use
`repo_config_get_ulong(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_int()`. All
callsites are adjusted so that they use
`repo_config_get_int(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_string()`.
All callsites are adjusted so that they use
`repo_config_get_string(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove
`git_config_get_string_multi()`. All callsites are adjusted so that they
use `repo_config_get_string_multi(the_repository, ...)` instead. While
some callsites might already have a repository available, this
mechanical conversion is the exact same as the current situation and
thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_value()`. All
callsites are adjusted so that they use
`repo_config_get_value(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get_value()`. All
callsites are adjusted so that they use
`repo_config_get_value(the_repository, ...)` instead. While some
callsites might already have a repository available, this mechanical
conversion is the exact same as the current situation and thus cannot
cause any regression. Those sites should eventually be cleaned up in a
later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_get()`. All
callsites are adjusted so that they use `repo_config_get(the_repository,
...)` instead. While some callsites might already have a repository
available, this mechanical conversion is the exact same as the current
situation and thus cannot cause any regression. Those sites should
eventually be cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config_clear()`. All
callsites are adjusted so that they use
`repo_config_clear(the_repository, ...)` instead. While some callsites
might already have a repository available, this mechanical conversion is
the exact same as the current situation and thus cannot cause any
regression. Those sites should eventually be cleaned up in a later patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 036876a1067 (config: hide functions using `the_repository` by
default, 2024-08-13) we have moved around a bunch of functions in the
config subsystem that depend on `the_repository`. Those function have
been converted into mere wrappers around their equivalent function that
takes in a repository as parameter, and the intent was that we'll
eventually remove those wrappers to make the dependency on the global
repository variable explicit at the callsite.
Follow through with that intent and remove `git_config()`. All callsites
are adjusted so that they use `repo_config(the_repository, ...)`
instead. While some callsites might already have a repository available,
this mechanical conversion is the exact same as the current situation
and thus cannot cause any regression. Those sites should eventually be
cleaned up in a later patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
find_cfg_ent() allocates a struct reflog_expire_entry_option via
FLEX_ALLOC_MEM and inserts it into a linked list in the
reflog_expire_options structure. The entries in this list are never
freed, resulting in a leak in cmd_reflog_expire and the gc reflog expire
maintenance task:
Direct leak of 39 byte(s) in 1 object(s) allocated from:
#0 0x7ff975ee6883 in calloc (/lib64/libasan.so.8+0xe6883)
#1 0x0000010edada in xcalloc ../wrapper.c:154
#2 0x000000df0898 in find_cfg_ent ../reflog.c:28
#3 0x000000df0898 in reflog_expire_config ../reflog.c:70
#4 0x00000095c451 in configset_iter ../config.c:2116
#5 0x0000006d29e7 in git_config ../config.h:724
#6 0x0000006d29e7 in cmd_reflog_expire ../builtin/reflog.c:205
#7 0x0000006d504c in cmd_reflog ../builtin/reflog.c:419
#8 0x0000007e4054 in run_builtin ../git.c:480
#9 0x0000007e4054 in handle_builtin ../git.c:746
#10 0x0000007e8a35 in run_argv ../git.c:813
#11 0x0000007e8a35 in cmd_main ../git.c:953
#12 0x000000441e8f in main ../common-main.c:9
#13 0x7ff9754115f4 in __libc_start_call_main (/lib64/libc.so.6+0x35f4)
#14 0x7ff9754116a7 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x36a7)
#15 0x000000444184 in _start (/home/jekeller/libexec/git-core/git+0x444184)
Close this leak by adding a reflog_clear_expire_config() function which
iterates the linked list and frees its elements. Call it upon exit of
cmd_reflog_expire() and reflog_expire_condition().
Add a basic test which covers this leak. While at it, cover the
functionality from commit commit 3cb22b8efe (Per-ref reflog expiry
configuration, 2008-06-15). We've had this support for years, but lacked
any tests.
Co-developed-by: Jeff King <peff@peff.net>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a resource leak where the file descriptor was not closed after
truncating a file in t/helper/test-truncate.c.
Signed-off-by: Hoyoung Lee <lhywkd22@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk: (21 commits)
gitk: remove header of now empty section "General options"
gitk: separate upstream refs when using the sort-by-type option
gitk: make 'sort-refs-by-type' optional and persistent
gitk: sort by ref type on the 'tags and heads' view
gitk: choosefont - remove a stray debugging line
gitk: allow horizontal commit-graph scrolling
gitk: update aqua scrolling for TclTk 8.6 / TIP171
gitk: update x11 scrolling for TclTk 8.6 / TIP 171
gitk: update win32 scrolling for Tk 8.6 / TIP 171
gitk: mousewheel scrolling functions for Tk 8.6
gitk: wheel scrolling multiplier preference
gitk: separate x11 / win32 / aqua Mouse bindings
gitk: remove non-ttk support code
gitk: replace ${NS} with ttk
gitk: always use themed Tk (ttk)
gitk: use $config_variables as list for save/restore
gitk: remove implementations for Tcl/Tk < 8.6
gitk: Make TclTk 8.6 the minimum, allow 8.7
gitk: remove code targeting git <= 1.7.2
gitk: require git >= 2.20
...
These cases cover scenarios where `gpg.program` is set as a program in
`$PATH` or as a path relative to the user's home directory.
Signed-off-by: Jonas Brandstötter <jonas.brandstoetter@gmx.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier commit remove the only option that was available under
"General options". We don't need the header for the empty section.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
git-gui has many instances of '-translation binary' and '-encoding
$SOMETHING' on the same channel. As eofchar is always null given a
prior commit, the net effect of having '-translation binary' in such
configuration is only to change how text line endings are handled.
For cases where the channel is opened to be consumed via gets, the eol
translation is irrelevant because Tcl's gets is documented to recognize
any of \n, \r, and \r\n as a line ending. So, keep only the '-encoding
$SOMETHING' configuration in these cases, making the configuration more
clear.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui currently configures some channels as '-encoding binary' when
the channel is not really binary (e.g, the channel is consumed as lines
of text). In 8.6, '-encoding binary' is an alias for '-encoding
iso8859), but TIP 699 removes this alias for Tcl 9.0. Let's switch to
'-encoding iso8859-1' to be compatible across Tcl versions.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui has many cases where -translation binary and -encoding binary
are configured on the same channel. But, -translation binary defines a
binary channel, which sets up -encoding iso8859-1 as part of its work.
Tcl 8.x defines -encoding binary as an alias of -encoding iso8859-1, and
this alias is deleted in Tcl 9.0. Let's delete the redundant encoding
definition now.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Per 6eb420ef61 ("git-gui: Always disable the Tcl EOF character when
reading", 2007-07-17), git-gui should disable Tcl's EOF character
detection on all files when on Windows: the default is disabled on all
other platforms (and with Tcl 9.0, is disabled on Windows too). This
EOF character is for compatibility with files / applications written for
file systems that know only the disc sectors allocated, and not the
number of bytes used. This has nothing to do with git.
But, git-gui does not set -eofchar {} on all channels. To avoid any
further leakage, let's just add this to the Windows specific override of
open. This override is needed only as long as Tcl 8.x is in use (Tcl
9.0 makes -eofchar {} default on all platforms).
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
* mr/sort-refs-by-type:
gitk: separate upstream refs when using the sort-by-type option
gitk: make 'sort-refs-by-type' optional and persistent
gitk: sort by ref type on the 'tags and heads' view
The output `git imap-send --list` command can be a bit confusing for new
users since the IMAP LIST command output is very verbose. Help such users
to analyse the same by using an example output.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize pop_most_recent_commit() by adding the first parent using the
more efficient prio_queue_peek() and prio_queue_replace() instead of
prio_queue_get() and prio_queue_put().
On my machine this neutralizes the performance hit it took in Git's own
repository when we converted it to prio_queue two patches ago (git_pq):
$ hyperfine -w3 -L git ./git_2.50.1,./git_pq,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.073 s ± 0.003 s [User: 1.053 s, System: 0.019 s]
Range (min … max): 1.069 s … 1.078 s 10 runs
Benchmark 2: ./git_pq rev-parse :/^Initial.revision
Time (mean ± σ): 1.077 s ± 0.002 s [User: 1.057 s, System: 0.018 s]
Range (min … max): 1.072 s … 1.079 s 10 runs
Benchmark 3: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.069 s ± 0.003 s [User: 1.049 s, System: 0.018 s]
Range (min … max): 1.065 s … 1.074 s 10 runs
Summary
./git rev-parse :/^Initial.revision ran
1.00 ± 0.00 times faster than ./git_2.50.1 rev-parse :/^Initial.revision
1.01 ± 0.00 times faster than ./git_pq rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to replace the top element of the queue that basically
does the same as prio_queue_get() followed by prio_queue_put(), but
without the work by prio_queue_get() to rebalance the heap. It can be
used to optimize loops that get one element and then immediately add
another one. That's common e.g., with commit history traversal, where
we get out a commit and then put in its parents.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pop_most_recent_commit() calls commit_list_insert_by_date() for parent
commits, which is itself called in a loop. This can lead to quadratic
complexity if there are many merges. Replace the commit_list with a
prio_queue to ensure logarithmic worst case complexity and convert all
three users.
Add a performance test that exercises one of them using a pathological
history that consists of 50% merges and 50% root commits to demonstrate
the speedup:
Test v2.50.1 HEAD
----------------------------------------------------------------------
1501.2: rev-parse ':/65535' 2.48(2.47+0.00) 0.20(0.19+0.00) -91.9%
Alas, sane histories don't benefit from the conversion much, and
traversing Git's own history takes a 1% performance hit on my machine:
$ hyperfine -w3 -L git ./git_2.50.1,./git '{git} rev-parse :/^Initial.revision'
Benchmark 1: ./git_2.50.1 rev-parse :/^Initial.revision
Time (mean ± σ): 1.071 s ± 0.004 s [User: 1.052 s, System: 0.017 s]
Range (min … max): 1.067 s … 1.078 s 10 runs
Benchmark 2: ./git rev-parse :/^Initial.revision
Time (mean ± σ): 1.079 s ± 0.003 s [User: 1.060 s, System: 0.017 s]
Range (min … max): 1.074 s … 1.083 s 10 runs
Summary
./git_2.50.1 rev-parse :/^Initial.revision ran
1.01 ± 0.00 times faster than ./git rev-parse :/^Initial.revision
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-gui on Windows creates a shortcut that presumes the git-gui script
will run on the basic Windows environment as configured. But, Git for
Windows uses wrapper scripts to launch executables, assuring the
environment is correct (see [1] for details). The launcher for git-gui
is /cmd/git-gui.exe, is not on PATH, and is not detected or used by the
current shortcut code. Let's look for this before trying the existing
approaches.
[1] https://gitforwindows.org/git-wrapper.html
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
The comment is poorly phrased and it in't clear what it wanted to
say. Strongly discourage this broken pattern to be copied and
pasted to other code paths.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The structure has nothing to do with what "git bisect" does; as
nobody other than "git rev-list" implementation uses it, move it
as a private data type to builtin/rev-list.c
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-gui has _search_exe as needed to give the executable suffix
(.exe) on Windows. But, the prior commit eliminated the only user of
this variable. Delete it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitexec looks up and caches the method to execute git subcommands using
the long deprecated dashed form if found in $(git--exec-path). But,
git_read and git_write now use the dashless form, by-passing gitexec.
This leaves two remaining uses of gitexec: one during startup to define
use of an ssh_key helper, and one in the about dialog box. These are
neither performance critical nor likely to be called more than once, so
do not justify an otherwise unused cacheing system.
Let's change those two uses, making gitexec unused. This allows removing
gitexec and _git_cmd.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui implements its own approach to locating and running various git
subcommands, bypassing git's capabilities for running git-*. This was
written in 2007: at that time, many git commands were shell-scripts
stored in $(git --exec-path), git's run-command api was not well adapted
to Windows and had serious performance issues when it worked at all, and
running subcommand 'git foo' as 'git-foo' was common and fully supported.
On Windows, git-gui searches $(git --exec-path) for builtin commands,
then attempts to find an interpreter on PATH to run those, invoking
these differently than on other platforms. For instance, the explicit
shebang #!/usr/bin/perl found in a script will be run by the first Perl
interpreter found on $PATH, which might not be at that specific location
so could be different than what git would run.
The various issues leading to the current implemention no longer exist.
Most git commands are now builtins, links to run those are not installed
in $(git --exec-path) by default (the "dashless" form is recommended
instead), and git's run-command api works well everywhere.
So, let's use git to launch its subcommands on all platforms. Do so by
modifying procs git_read and git_write to use the "dashless" form for
invoking git commands, avoiding the search for git-<foo>. This leaves
_git_cmd unused with cleanup in a later patch.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui's default clone method is git-clone's default, and this uses
hardlinks rather than copying the objects directory for local
repositories. However, this method explicitly fails if a symlink (or
.gitfile) exists in the path to the objects directory. Thus, the default
clone option fails for worktrees created by git-new-workdir or
git-worktree. git-gui's original do_clone trapped this error for a
symlinked git-new-workdir tree, directly falling back to a full clone,
while the updated git-gui using git-clone does not. (The old do_clone
could not handle gitfile linked worktrees, however).
Let's apply the more friendly fallback to a full clone in both these
cases where git-clone behavior throws an error on the default method.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui clones a repository by invoking git-plumbing commands, in proc
do_clone, rather than using git-clone. The justification was that the
low-level commands are guaranteed to provide a stable interface, while
the higher level commands such as git-clone may not be stable. This
approach requires git-gui to continually evolve by mirroring new
features in git itself, which has not happened, while the user interface
in git-clone has proven very stable. Also, git-gui does directly call
many other non-plumbing commands in git's repertoire.
do_clone's last significant functionality change was in 2015, and
updates are required for shallow clones, the reftable backend, cloning
from linked worktrees, and perhaps other features and bugs. For
instance, I had reports of git-gui failing to correctly clone
repositories prior to 2015, resulting in essentially the patch given
here. The only significant work was supporting .gitfile linked worktrees
unknown to do_clone, but supported by git-clone, and none regarding the
interface to git-clone itself. That interface is clearly stable enough
to not be a problem.
Supporting new use-cases with this requires exposing new options in the
clone dialog, then passing flags to git-clone. This avoids updating
do_clone to understand those options, reducing the maintenance burdens.
So, teach git-gui to use git-clone. This change is in one patch as
there is no obvious incremental path to migration. The existing dialog /
options / status screen are unchanged, the known user-visible changes
are that cloning from a working directory linked by a gitfile now works,
there is no auto-fallback to a full copy when cloning linked workdirs
and worktrees (meaning git-clone fails unless a full or shared copy is
selected), and messages displayed are from git-clone.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Git uses `rebase.autostash` or `merge.autostash` to determine whether a
dirty worktree is allowed during pull. However, this behavior is not
clearly documented, making it difficult for users to discover how to
enable autostash, or causing them to unknowingly enable it. Add new
config option `pull.autostash` along with its documentation and test
cases.
`pull.autostash` provides the same functionality as `rebase.autostash`
and `merge.autostash`, but overrides them when set. If `pull.autostash`
is not set, it falls back to `rebase.autostash` or `merge.autostash`,
depending on the value of `pull.rebase`.
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We added the --early-output feature long ago in cdcefbc971 (Add
"--early-output" log flag for interactive GUI use, 2007-11-03). The idea
was that GUIs could use it to progressively render a history view,
showing something quick-and-inaccurate at first and then enhancing it
later.
But we never documented it, and it appears never to have been used, even
by the projects which initially expressed interest. There was an RFC
patch for gitk to use it:
http://public-inbox.org/git/18221.2285.259487.655684@cargo.ozlabs.ibm.com/
but it was never merged. Likewise QGit had a patch in:
https://lore.kernel.org/git/e5bfff550711040225ne67c907r2023b1354c35f35@mail.gmail.com/
but it was never fully merged (to this day, QGit has a commented-out line to
add "--early-output" to the "log" invocation). Searching for other
mentions on the web or forges like github.com turns up nothing.
Meanwhile, the feature has been broken off and on over the years without
anybody noticing (and naturally, there are no tests, either). From 2011
to 2017 the option didn't even turn on via "--early-output"; this was
fixed in e35b6ac56f (revision.h: turn rev_info.early_output back into an
unsigned int, 2017-06-10).
It worked for a while then, but it does not interact well at all with
commit-graphs (which are turned on by default these days). The main
logic to count early commits is triggered by limit_list(), which we
traditionally invoked when showing output in topo-order (and
--early-output always enables --topo-order). But that changed in
f0d9cc4196 (revision.c: begin refactoring --topo-order logic,
2018-11-01). Now when we have generation numbers, we skip limit_list()
entirely, and the early-output code shows no commits, and just the final
header "Final output: 1 done". Which is syntactically OK, but
semantically wrong: that message should give the total number of commits
we're about to show.
So let's drop the feature. It is extra code that is untested and
undocumented, and makes working on the revision machinery more brittle.
Given the history above, it seems unlikely that anybody is using it (or
has used it), and we can drop it without the usual deprecation period.
A gentler option might be to "soft" drop it: keep accepting the option,
have it imply --topo-order as it does now, print "Final output: 1 done",
and then do our regular traversal. That would keep any hypothetical
caller working. But it doesn't seem worth the hassle to me.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gpg.program configuration variable, which names a pathname to
the (custom) GPG compatible program, can now be spelled with ~tilde
expansion.
* jb/gpg-program-variable-is-a-pathname:
gpg-interface: expand gpg.program as a path
Futz with SIGCHLD handling in "git daemon".
* cb/daemon-reap-children:
daemon: use sigaction() to install child_handler()
compat/mingw: allow sigaction(SIGCHLD)
Doc mark-up updates.
* ja/doc-git-log-markup:
doc: git-log: convert log config to new doc format
doc: git-log: convert diff options to new doc format
doc: git-log: convert pretty formats to new doc format
doc: git-log: convert pretty options to new doc format
doc: git-log: convert rev list options to new doc format
doc: git-log: convert line range format to new doc format
doc: git-log: convert line range options to new doc format
doc: git-log convert rev-list-description to new doc format
doc: convert git-log to new documentation format
Meson-based build update.
* ps/meson-cleanups:
ci: use Meson's new `--slice` option
meson: update subproject wrappers
meson: fix lookup of shell on MINGW64
meson: clean up unnecessary variables
meson: improve summary of auto-detected features
meson: stop printing 'https' option twice in our summaries
meson: stop discovering native version of Python
"git remote" now detects remote names that overlap with each other
(e.g., remote nickname "outer" and "outer/inner" are used at the
same time), as it will lead to overlapping remote-tracking
branches.
* jk/remote-avoid-overlapping-names:
remote: detect collisions in remote names
"pack-objects" has been taught to avoid pointing into objects in
cruft packs from midx.
* tb/midx-avoid-cruft-packs:
repack: exclude cruft pack(s) from the MIDX where possible
pack-objects: introduce '--stdin-packs=follow'
pack-objects: swap 'show_{object,commit}_pack_hint'
pack-objects: fix typo in 'show_object_pack_hint()'
pack-objects: perform name-hash traversal for unpacked objects
pack-objects: declare 'rev_info' for '--stdin-packs' earlier
pack-objects: factor out handling '--stdin-packs'
pack-objects: limit scope in 'add_object_entry_from_pack()'
pack-objects: use standard option incompatibility functions
Prepare to flip the default hash function to SHA-256.
* bc/use-sha256-by-default-in-3.0:
Enable SHA-256 by default in breaking changes mode
help: add a build option for default hash
t5300: choose the built-in hash outside of a repo
t4042: choose the built-in hash outside of a repo
t1007: choose the built-in hash outside of a repo
t: default to compile-time default hash if not set
setup: use the default algorithm to initialize repo format
Use legacy hash for legacy formats
builtin: use default hash when outside a repository
hash: add a constant for the legacy hash algorithm
hash: add a constant for the default hash algorithm
git-gui has code paths to support older non-ttk widgets, but this code
is no longer reachable as ttk is always used. Remove that code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Since the upstream refs of local refs may be of more significance in the
context of the local refs, they are sorted after local refs and before the
remainder of the remote refs.
Signed-off-by: Michael Rappazzo <michael.rappazzo@infor.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
On the 'tags and heads' view, add an option to enable or disable
'Sort refs by type'. This option is read from and written to the
config file. Clicking on the option will update the refs in the
view.
Signed-off-by: Michael Rappazzo <michael.rappazzo@infor.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
In the 'tags and heads' view, the list of refs was globally sorted,
which caused the local ref list to be split around other ref list types.
This change re-orders the view to be: local refs, remote refs, tags,
and then other refs.
Signed-off-by: Michael Rappazzo <rappazzo@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
git-gui invokes the tk_getSaveFile dialog to determine the full
path-name of the shortcut file to create. But, on Windows, this dialog
always dereferences a shortcut (.lnk) file, as this is essentially a
soft-link to its target. If the shortcut file already exists, the dialog
returns the path-name of the target (i.e., GIT/cmd/git-gui.exe), and not
the desired shortcut file selected by the user.
There is no Windows file chooser available in Tcl/Tk that does not
dereference .lnk files, so this patch avoids using a dialog: the
shortcut to be created is on the desktop and named as "Git + Repository
Name". If this .lnk file already exists, the user must give permission
to overwrite it or the process terminates.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui runs blame and diff commands with nice by default. On Unix, nice
is accepted if found and it will run git. Commit ff9db6c79d ("On
Windows, avoid git-gui to call Cygwin's nice utility", 2010-10-05)
rejects nice if not collocated with git. In Git for Windows' (g4w) POSIX
path name space, nice and git are found in different directories:
$ which git
/mingw64/bin/git
$ which nice
/usr/bin/nice
Thus, git-gui will not use nice in the supported Windows configuration.
Commit ff9db6c79d justifies the collocation requirement as avoiding
problems in a mixed MSYS and Cygwin configuration: such configurations
are not supported by either project as they are known to cause many
problems.
So, let's revert ff9db6c79d and let git-gui work correctly in the
supported configuration.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui on Windows prepends three directories to PATH so does not honor
PATH as configured. This can have undesirable consequences, for instance
by preventing use of a different git for testing. This also provides at
best a subset of the configuration included with Git for Windows (g4w),
so is neither necessary nor sufficient there.
Since commit be700fe3, git-gui.sh adds its directory to the front of
PATH: this is essentially adding $(git --execdir) to the path, this is
long deprecated as git moved to using "dashless" subcommands.
The windows/git-gui.sh wrapper file, since commit 99fe594d, adds two
directories relative to its installed location to PATH, and does so
without checking that either exists or is needed.
The above modifications were made before the Git For Windows project
took responsibility for distributing a working solution on Windows. g4w
assures a correct configuration on Windows without these, and doing so
requires more than the above modifications. See [1] for a more thorough
treatment.
git-gui does not modify PATH on any platform except on Windows, and
doing so is not needed by g4w. Let's stop modifying PATH on Windows as
well.
[1] https://gitforwindows.org/git-wrapper.html
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui uses ${NS} to switch between non-themed and themed widgets, with
${NS} == 'ttk' selecting the latter. As git-gui now always uses ttk,
this indirection is not needed. Remove it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui optionally uses themed ui elements from ttk, but the full set of
ttk ui elements is always available with Tk 8.6. Keeping code making
ttk use optional increases maintenance burden for no benefit. Let's use
ttk always, allowing removal of alternate code paths in subsequent
patches.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Since commit c80d7be5e1e0d, git-gui checks for the availability of ttk
before enabling its use, but this check is redundant as Tk >= 8.6 is
required. Remove the redundant check.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui has remnant code to allow some drawing with Tk 8.4 predating the
addition of themed widgets. As git-gui requires Tk >= 8.6, this code can
never trigger. Remove it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-version supports choosing different bodies of code passed into it,
rather than using the more traditional if/else construct typically used.
The only use of git-version in this mode was by its author in 2007, and
that code has been deleted. So, delete this now unused function that
was mostly ignored.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
When creating a new repository, git-gui creates a directory, cds to it,
then runs git-init, but git-init learned to create and initialize the
directory in 1.6.5. git-gui requires git version >= 2.36, so teach
git-gui to use git-init's full capability.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui checks for git version >= 1.6.6 before enabling the remotes
menu. But git-gui requires git v2.36 or later, so git-remote is always
available. Delete this check and always enable the menu.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui's merge driver includes code to invoke the recursive strategy
for merging prior to git v2.5 that added a simpler syntax. As git-gui
requires git v2.36 or later, let's delete the code targeting earlier
git.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui's diff functions avoid using textconv filters on git < 1.6.1, or
asking about submodules on version before 1.7.2, but git-gui requires
git >= v2.36. So, remove this now obsolete code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui uses alternate code paths for git versions < 1.7.2, avoiding use
of --ignore-all-space and textconv. git-gui requires git v2.36 or later,
so this alternate code is obsolete. Remove it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui has its own code to determine the worktree root for git-versions
earlier than 1.7.0, where git rev-parse learned this function. git-gui
requires git v2.36 or later, so delete the now obsolete alternate code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui relies upon the files back-end to determine the current branch.
This does not support the newer reftables backend. But, git-branch has
long supported --show-current to get this same information regardless of
backend cahnged. So teach git-gui to use git-branch --show-current.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui asks for submodule info only on git-versions >=1.72, which
introduced such capability. But, git-gui requires git version >= 2.36,
so this alternate code path is obsolete. Remove it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui includes code to implement ls-files for git versions prior to
1.63 that did not know --exclude-standard. But, git-gui now requires git
version >= 2.36, so remove the obsolete code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
macOS provides a PCRE2 library in base that is not usable and not
configured properly, as it installs a pkgconf module that
points to a non-existent pcre2.h header in /usr/local/include.
Detect that case and if the feature is enabled, try to fallback
to a wrapped subproject through an anonymous dependency, aborting
with an error if that is not possible.
Change the feature to "auto" and print a warning and disable it
if a broken dependency was detected, but to keep consistency
with the cmake build system used on Windows, add a special rule
to re-enable the pcre2 feature by default there.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Suggested-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-gui requires that Tcl and Tk are 8.5, though the check using
'package require' allows 8.6. As git-gui runs under wish, both Tcl and
Tk are always available and of the same version, so only one need be
checked.
The 8.5 requirement is very outdated as the earliest Tcl currently
shipping on any supported OS is 8.6. 8.7 is in alpha test and is
generally compatible with 8.6, so should also be allowed. Tcl 9.0 has
planned compatibility breaking changes so cannot be allowed.
Let's update the requirements to be 8.6 or 8.7, and check only on Tcl as
Tk will be the same version.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
git-gui since commit d6967022 explicitly requires version >= 1.5.0, and
this coded requirement has never been changed. But, since 0730a5a3a
git-gui actually requires git 2.36, providing 'git hook run.' git-gui
throws an error if that command is not supported.
So, let's update the requirement checking code to 2.36, and throw a more
useful error if this is not met.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
* bc/use-sha256-by-default-in-3.0:
Enable SHA-256 by default in breaking changes mode
help: add a build option for default hash
t5300: choose the built-in hash outside of a repo
t4042: choose the built-in hash outside of a repo
t1007: choose the built-in hash outside of a repo
t: default to compile-time default hash if not set
setup: use the default algorithm to initialize repo format
Use legacy hash for legacy formats
builtin: use default hash when outside a repository
hash: add a constant for the legacy hash algorithm
hash: add a constant for the default hash algorithm
This output was added in d93f1713b0 ("gitk: Use themed tk widgets",
2009-04-17), we can assume, by accident.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Both `index_fd()` and `index_path()` still use `the_repository` even
though they have a repository available via `struct index_state`. Adapt
them so that they use the index' repository instead to get rid of this
global dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `force_object_loose()` forces an object to become a loose
object in case it only exists in its packed form. To do so it implicitly
relies on `the_repository`.
Refactor the function by passing a `struct odb_source` as parameter.
While the check whether any such loose object exists already acts on the
whole object database, writing the loose object happens in one specific
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `read_loose_object()` takes a path to an object file and
tries to parse it. As such, the function does not depend on any specific
object database but instead acts as an ODB-independent way to read a
specific file. As such, all it needs as input is a repository so that we
can derive repo settings and the hash algorithm.
That repository isn't passed in as a parameter though, as we implicitly
depend on the global `the_repository`. Refactor the function so that we
pass in the repository as a parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The iterators for loose objects still rely on `the_repository`. Refactor
them:
- `for_each_loose_file_in_objdir()` is refactored so that the caller
is now expected to pass an `odb_source` as parameter instead of the
path to that source. Furthermore, it is renamed accordingly to
`for_each_loose_file_in_source()`.
- `for_each_loose_object()` is refactored to take in an object
database now and calls the above function in a loop.
This allows us to get rid of the global dependency.
Adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `for_each_file_in_obj_subdir()` is declared in our headers,
but it is not used anywhere else than in the corresponding code file
itself. Drop the declaration and mark the function as file-local.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `for_each_loose_file_in_objdir_buf()` is declared in our
headers, but it is not used anywhere else than in the corresponding code
file itself. Drop the declaration and inline the function into its only
caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic that writes loose objects still relies on `the_repository` to
decide where exactly the object shall be written to. Refactor it so that
the logic instead operates on a `struct odb_source` so that we can get
rid of this global dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have a backend-agnostic way to write objects into an object
database. While there is `write_object_file()`, this function is rather
specific to the loose object format.
Introduce `odb_write_object()` to plug this gap. For now, this function
is a simple wrapper around `write_object_file()` and doesn't even use
the passed-in object database yet. This will change in subsequent
commits, where `write_object_file()` is converted so that it works on
top of an `odb_source`. `odb_write_object()` will then become
responsible for deciding which source an object shall be written to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a repository is configured to have a compatibility hash algorithm
we keep track of object ID mappings for loose objects via the loose
object map. This map simply maps an object ID of the actual hash to the
object ID of the compatibility hash. This loose object map is an
inherent property of the loose files backend and thus of one specific
object source.
Refactor the interfaces to reflect this by requiring a `struct
odb_source` as input instead of a repository. This prepares for
subsequent commits where we will refactor writing of loose objects to
work on a `struct odb_source`, as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We implicitly depend on `the_repository` when moving an object file into
place in `finalize_object_file()`. Get rid of this global dependency by
passing in a repository.
Note that one might be pressed to inject an object database instead of a
repository. But the function doesn't really care about the ODB at all.
All it does is to move a file into place while checking whether there is
any collision. As such, the functionality it provides is independent of
the object database and only needs the repository as parameter so that
it can adjust permissions of the file we are about to finalize.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `loose_object_info()` already accepts a repository as parameter we
still have one callsite in there where we use `the_repository` to figure
out the hash algorithm. Use the passed-in repository instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We implicitly depend on `the_repository` when freshening either loose or
packed objects. Refactor these functions to instead accept an object
database as input so that we can get rid of the global dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `check_and_freshen()` functions are only used by a single caller
now. Inline them into `freshen_loose_object()`.
While at it, rename `check_and_freshen_odb()` to `_source()` to reflect
that it works on a single object source instead of on the whole database.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We implicitly depend on `the_repository` in `has_loose_object()`.
Refactor the function to accept an `odb_source` as input that should be
checked for such a loose object.
This refactoring changes semantics of the function to not check the
whole object database for such a loose object anymore, but instead we
now only check that single source. Existing callers thus need to loop
through all sources manually now.
While this change may seem illogical at first, whether or not an object
exists in a specific format should be answered by the source using that
format. As such, we can eventually convert this into a generic function
`odb_source_has_object()` that simply checks whether a given object
exists in an object source. And as we will know about the format that
any given source uses it allows us to derive whether the object exists
in a given format.
This change also makes `has_loose_object_nonlocal()` obsolete. The only
caller of this function is adapted so that it skips the primary object
source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of users of the `the_hash_algo` macro, which
implicitly depends on `the_repository`. Adapt these callers to not do so
anymore, either by deriving it from already-available context or by
using `the_repository->hash_algo`. The latter variant doesn't yet help
to remove the global dependency, but such users will be adapted in the
following commits to not use `the_repository` anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some trivial -Wsign-compare warnings in "object-file.c". Fix
them and drop the preprocessor define that disables those warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tcl/Tk 9.0 has been released, and has shipped in Fedora 42. Prior
patches in this sequence have addressed known incompatibilities, so gitk
is now operating with Tcl9. So, let's allow Tcl9.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk in the prior commit learned to apply -profile tcl8 to all input
data streams, avoiding errors on non-binary data streams whose encoding
is not utf-8. But, gitk also consumes binary data streams (generally blobs
from commits), and internally decodes this to support various displays.
With Tcl9, errors occur in this decoding for the same reasons described
in the previous commit: basically, the underlying data was not validated
to conform to the given encoding, and this source encoding may not be
utf-8. gitk performs this decoding using Tcl's '[encoding convert from'
operator.
For example, the 7th commit in gitk's history has the extended ascii
value 0xA9, so
gitk 9a40c50c1e
in gitk's repository raises an exception. The error log has:
unexpected byte sequence starting at index 11: '\xA9'
while executing
"encoding convertfrom $diffencoding $line"
(procedure "parseblobdiffline" line 135)
invoked from within
"parseblobdiffline $ids $line"
(procedure "getblobdiffline" line 16)
invoked from within
"getblobdiffline file6 9a40c50c1e05c0658b7a7c68b56d615eb6f170dd"
("eval" body line 1)
invoked from within
"eval $script"
(procedure "dorunq" line 11)
invoked from within
"dorunq"
("after" script)
This problem has a similar fix to the prior issue: we must use the tlc8
profile when converting this data. Do so, again only on Tcl9 as Tcl8.6
does not recognize -profile, and only Tcl 9.0 makes strict the default.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk invokes many git commands expecting output in utf-8 encoding, but
git accepts extended ascii (code page unknown) as utf-8 without
validating, so cannot guarantee valid utf-8 on output. In particular,
using any extended ascii code page, of which there are many, has long
been acceptable given that everyone on a project is aware of and uses
that same code page to view all data. utf-8 accepts only 7-bit ascii
characters in single bytes, and any characters outside of that base set
require at least two bytes.
Tcl is a string based language, and transcodes all input data to an
internal unicode format, and to whatever format is requested on output:
"pure" binary is recoded using iso8859-1. Tcl8.x silently recodes
invalid utf-8 as binary data, so extended ascii characters maintain
their binary value on output but may not display correctly.
Tcl 8.7 added three profiles to control this behaviour: strict (raises
exceptions), replace (replaces each invalid byte with ?), and the
default tcl8 maintaining the old behavior. Tcl 9 changes the default
profile to strict, meaning any invalid utf-8 raises an exception that
gitk does not handle.
An example of this in the git repository is commit 7eb93c8965 ("[PATCH]
Simplify git script", 2005-09-07). This includes extended ascii
characters in the author name and commit message. As a result, gitk +
Tcl 9 cannot view the git repository at any point beyond that commit.
Note: Tcl 9.0 has a bug, to be fixed in 9.1, where this particular
condition results in a memory error causing Tcl to crash [1].
The tcl8 profile used so far has acceptable behavior given gitk's
acceptance: this allows gitk to accept extended ascii though it may
display incorrectly. Let's continue that behavior by overriding open to
use the tcl8 profile on Tcl9 and later: Tcl 8.6 does not understand
fconfigure -profile, and Tcl 8.7 maintains the tcl8 profile.
[1] Per https://core.tcl-lang.org/tcl/tktview/73bb42fb3f35cd613af6fcea465e35bbfd352216
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk looks for configuration files under $(HOME)/.., and uses the
typical shortcut formats to find this, e.g., ~/.config/. This relies
upon Tcl expanding such constructs to replace ~ with $(HOME). But, Tcl 9
has stopped doing that for various reasons, and now supplies [file
tildeexpand ...] to perform this expansion.
There are a very few places that need this expansion, and all must be
modified regardless of approach taken.
POSIX specifies that $HOME be defined at the time of login, and both
Cygwin and MSYS (underlying git for windows) set this variable. Tcl8
uses the POSIX defined pwnam to look up the underlying database record
on Unix, but will get the same result as using $HOME on any POSIX
compliant system. On Windows, Tcl just accesses $HOME, falling back to
other environment variables if $HOME is not set. Git for Windows has
$HOME defined by MSYS, so this works just as on the others.
As $env(HOME) works in Tcl 8 and 9, while anything using [file
tildeexpand ... ] will not, let's use the simpler approach as doing so
adds no lines of code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk uses '-encoding binary' in several places to handle non-text data.
Per TIP 699, this is not recommended as there has been too much
confusion and misconfiguration of binary channels, and this option is
removed in Tcl 9.
Tcl defines a binary channel as one that reproduces the input data
exactly. As Tcl stores all data internally in unicode format, a binary
channel requires 3 things:
- -encoding iso8859-1 : this causes each byte of input to be translated
to its unicode equivalent (may be multi-byte).
- -translation lf : this avoids any translation of line endings, which
by default are translated to \n on input.
- -eofchar {} : this avoids any use of an end of file character, which
is ctrl-z by default on Windows.
The recommended '-translation binary' makes all three settings, but this
is not done in gitk now. Rather, gitk uses '-encoding binary', which is
an alias to '-encoding iso8859-1' removed by TIP 699, in multiple places,
and -eofchar {} in one place but not all. All other files, configured in
non-binary fashion, have -eofchar {}.
Unix and Windows differ on line ending conventions, Tcl by default
converts line endings to \n on input, and to those common on the
platform on output. git emits only \n on Unix or Windows. Also, Tcl's
proc gets recognizes and removes \n, \r, or \r\n as line endings, and
this is used by gitk except in procs selectline and parsecommit. But,
those two procs recognize any combination of \n and \r as terminating a
line. So, there is no need to translate line endings on input, and using
-translation binary avoids any such translation.
Tcl sets eofchar to ctrl-z (ascii \0x1a) only on Windows, otherwise
eofchar is {}. This provides compatibility to old DOS based codes and
files originating when file systems recorded only sectors allocated, and
not bytes used. git does not use ctrl-z to terminate data anywhere. Only
two channels in gitk leave eofchar at the default value, both use
-encoding binary now. A third one was converted in commit 681c3290e3
("gitk: Handle blobs containing a DOS end-of-file marker", 2009-03-16),
fixing such a problem of early data termination. Using eofchar {} is
correct, even if not always necessary.
Tcl 9 forces change, using -translation binary per TIP 699 does what
gitk needs and is backwards compatible to Tcl 8.x. Do it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
TclTk 8.7 (still in alpha), and 9.0 (released), implement TIP 474 that
delivers uniform handling of mouse and touchpad scrolling events on all
platforms, and by default bound to most widgets. TIP 474 also implements
use of the Option- modifier key (Alt- key on PC, Option- key on Macs) to
indicate desire for more motion per scroll wheel event, the
amplification is not defined but seems to be 5x to 10x.
So, for TclTk >= 8.7 we can use identical MouseWheel bindings on all
platforms, and should enable use of the Option- modifier to enable
larger motion. Let's do all of this, and use a 5x multiplier for the
Option- modifier.
This largely follows the prior win32 model, except that Tk 8.6 does not
reliably use the Option- modifier because the Alt- key conflicts with
builtin behavior to activate the main menubar. Presumably this conflict
is addressed in the win32 Tcl9.x package.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk provides a dialog to configure many ui colors. Any color element
changed in the dialog takes immediate effect before closing the dialog.
While cancelling the dialog after changing one or more colors avoids
saving the modified colors, the user must restart gitk to restore the
prior color set. This unfortunate behavior results because gitk does not
have a single routine to update all of the ui colors. The prior commit
eliminated the key impediment to having such a routine. So, let's create
a routine to update all configured colors at once, use this when
modifying colors, and also invoke this after restoring the prior set if
the dialog is cancelled.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk commit 5fdcbb1390 ("gitk: Fixes for Mac OS X TkAqua", 2009-03-23),
adds horizontal scrolling of the commit graph pane on aqua, but not on
x11 or win32. Also, the horizontal scrolling is triggered by MouseWheel
events attached to any of the three panes, not just the commit graph
that is the only one that scrolls. It is unusual to scroll a widget that
is not under the mouse, many would consider this a bug. No horizontal
scrollbar is provided for this, so there is no real cue for the user
that horizontal scrolling is available. We removed this aqua only
feature by transitioning aqua to use the common MouseWheel bindings set.
Let's add this as a feature on all platforms, and use the same approach
for scaling scroll motion as we do elsewhere. For horizontal scrolling,
honor only events received by the commit graph in conformance with
normal GUI design. Vertical scrolling is unchanged, and events received
by any of the 3 panes continue to scroll all 3 in unison.
Per the ancient and long ignored CUA standards, we should add a
horizontal scrollbar to the commit-graph, but gitk's interface is
already very cluttered: adding a scrollbar to only one of these three
panes is difficult while maintaining common pane vertical size,
especially so considering the movable sash separating panes 1 & 2, and
will consume yet more space. So, leave this as a hidden feature, now
available on all platforms.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk's color selection dialog uses a number of "label" widgets to show
the current value of each selectable color. This uses the -background
color property of label widgets, and this property is overwritten when
the full ui color set is refreshed. The swatch colors are set
individually using code passed into the chooser dialog, so there is no
common routine to set all after updating the global ui colors.
Let's replace this with a single routine that does set all swatches,
removing a key impediment to restoring the ui colors if the dialog is
cancelled.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Tk provides MouseWheel events to aqua, similar to win32. But, these
events on aqua have a nominal motion value (%D) of 1, not 120 as on
win32. gitk on aqua provides specific bindings only for the top 3 panes,
giving a nominal scrolling amount of +/- 1 for all events. gitk includes
a hidden feature providing horizontal scrolling of the commit graph,
added in 5fdcbb1390 ("gitk: Fixes for Mac OS X TkAqua", 2009-03-23).
This horizontal scrolling is triggered by mouse events in any of the top
3 panes, and thus violates normal gui design where the object under the
mouse cursor scrolls.
Let's update this using the common bindings in 'proc bind_mousewheel',
allowing user preferences on motion scaling to apply to all windows.
The commit graph scrolling feature is removed by this, and will be added
back for all platforms in a later commit.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk has x11 mouse bindings that receive button presses, not MouseWheel
events, as this is the Tk implementation through Tk 8.6. On x11, gitk
translates each button event to a scrolling value of +/- 5 for the upper
three panes that scroll vertically as one unit. gitk applies similar
scaling for horizontal scaling of the lower-left commit details pane
(ctext), but not for vertical scrolling of either of the bottom panes.
Rather, the Tk default scrolling actions are used for vertical
scrolling.
Let's make X11 behave similarly to the just modified win32 platform. Do
so by connecting vertical and horizontal scrolling events for the same
items bound in 'proc bind_mousewheel' and using the same user preference
values.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk on win32 binds windows_mousewheel_redirector to all MouseWheel
events in the main window. This proc determines the widget under the
cursor, then determines what scroll command to give, possibly none, and
issues scroll commands to the widget. The top panes get only vertical
scroll events, as does the lower right Patch/Tree pane. All others get
both vertical and horizontal events. These are all hard coded at +/-
five lines.
We now have common MouseWheel event bindings that follow user
preferences for the scrolling amount, bind for only the five main
display widgets, and leave the other gui elements untouched. Let's use
this instead. With the scrolling preference set at 5, the users should
not notice much, if any, difference.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk supports scrolling of 5 windows, but does this differently on the
aqua, x11, and win32 platforms as Tk provides different events on each.
TIP 171 removes some differences on win32 while altering the required
bindings on x11. TIP 474, which is in Tk 8.7 and later, finally unifies
all platforms on using common MouseWheel bindings. Importantly for now,
TIP 171 causes delivery of MouseWheel events to the widget under the
mouse cursor on win32, eliminating the need for completely different
bindings on win32.
Let's make some common functions to unify as much as we can in Tk 8.6.
Examining the platforms shows that the default platform scrolling is
overridden differently on the 3 platforms, and the nominal amount of
motion achieved per mouse wheel "click" is different. win32 nominally
makes everything move 5 lines per click, aqua 1 line per click, and x11
is a mixture. Part of this is due to win32 overriding all scroll events,
while x11 and aqua override smaller sets. Also, note that the text
widgets (the lower two panes) always scroll by 2-3 lines when given a
smaller scroll amount, while the upper three canvas objects follow the
requested scrolling value more accurately.
First, let's have a common routine to calculate the scroll value to give
to a widget in an event. This accounts for the user preference, the
scale of the %D (delta) value given by the event (120 on win32, 1 on
aqua, assumed 1 on x11), and must always be integer. Include negation as
by convention the screen moves opposite to the MouseWheel delta. Allow
setting an offset value to account for the larger minimum scrolling of
text widgets.
Second, let's have a common declaration of MouseWheel event bindings, as
those are shared by all in Tcl9, and by aqua/win32 earlier. Bind all
five display windows here. Note that the Patch/Tree widget (cflist)
cannot scroll horizontally.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk provides scrolling of several windows, uses hard-coded values for
the amount of scrolling, and these values differ across platforms and
widgets. The nominal value used is either 1 text line per mouse /
touchpad / button event, or 5 lines. Furthermore, Tk does not scroll
text widgets by 1 line when told to, this usually gets 2-3 lines of
motion. The upper canvas objects holding the commit graph do scroll as
defined. But, clearly no value is universally preferred, so let's give
the user some control over this. Provide a single multiplier to be
applied for all scroll bindings, with a value of 3 to mean the default
nominal value of 3 line. This is selected both as a compromise between
the various defaults across platforms, and because it is the smallest
value honored by the two text widgets on the bottom of the screen.
Later commits will connect this variable for actual scrolling events.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Tk through 8.6 has different approaches for handling mouse wheel /
touchpad scrolling events on the different platforms, and gitk has
separate code for these. But, some x11 bindings are applied on aqua as
we do not have these in a clean if / then / else tree based upon
platform. Let's split these bindings apart.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk has code and variables to use the earlier non-themed widget set,
but this code is now irrelevant as gitk now always uses ttk. Clean this
up.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk uses ${NS} to select between the original Tk widgets and the newer
themed widgets in ttk. As gitk uses only themed widgets from ttk::,
this indirection now serves no purpose, so let's switch to explicit use
of ttk:: via global search/replace. More simplification, including
removal of the NS variable, is kept for a later patch to keep this one
smaller.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk added the option to used themed Tk (ttk) in 0cc08ff7dd ("gitk: Add
a user preference to enable/disable use of themed widgets", 2009-09-05).
Using ttk had to be optional as Tk 8.4, then in common use, does not
have ttk. ttk is the default when available, so the ttk code paths are
by now very well tested. gitk also has code paths for the older default
widgets, increasing the maintenance burden. Let's make ttk non-optional
to reduce code complexity in later commits.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk includes many user defined configuration variables, has all of
these are listed in $config_variables. But this list is not used to
define the variables to be loaded, saved, or restored when cancelling
the configuration dialog, and developers must maintain separate lists of
variables for these purposes. This leads to unnecessary errors and merge
conflicts. Let's replace those separate lists with $config_variables to
make maintenance easier.
While we are on topic, sort the list of names in $config_variables.
This makes it simpler to scan and has fewer chances of conflicts
when new names are introduced.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
eab5dbab (ci: wire up Meson builds, 2024-12-13) added two instances
of a very similar construct
FAILED_TEST_ARTIFACTS=${TEST_OUTPUT_DIRECTORY:-t}/failed-test-artifacts
one to ci/lib.sh and the other to ci/print-test-failures.sh
Unfortunately, the latter had a typo causing shell to emit "Bad
substitution". Fix it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds the basic support of SHA256 Git repositories.
Most of changes are idiomatic replacement of the hard-coded hash ID
length, but there are subtle things:
* The hash length is determined on startup, and stored in $hashlength
global variable (either 40 or 64).
* The hard-coded "40" are replaced with $hashlength;
for regexp patterns, the ugly string map is used.
* Some code have the fixed numbers like 39 and 45, and those are
replaced with the $hashlength and the offset correction.
* $nullid and $nullid2 are generated for the hash length.
A caveat is that repository picker dialog is performed before
evaluating the repo type, hence $hashlength isn't set there yet.
So the code dealing with the hard-coded "40" are handled differently;
namely, the regexp range is expanded, and the null id is generated
from the HEAD id length locally.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Documentation updates for "git send-email".
* ag/doc-send-email:
docs: mention possible options for Proton Mail users
docs: add a paragraph explaining the `sendmailCmd` option of sendemail
docs: add an OAuth2.0 credential helper for AOL accounts
docs: add outlookidfix config option to sendemail documentation
docs: link OpenSSL's verify(1) manual page to know about -CAfile and -CApath options
Define .precision to more canned parse-options type to avoid bugs
coming from using a variable with a wrong type to capture the
parsed values.
* rs/parse-options-precision:
parse-options: add precision handling for OPTION_COUNTUP
parse-options: add precision handling for OPTION_BITOP
parse-options: add precision handling for OPTION_NEGBIT
parse-options: add precision handling for OPTION_BIT
parse-options: add precision handling for OPTION_SET_INT
parse-options: add precision handling for PARSE_OPT_CMDMODE
parse-options: require PARSE_OPT_NOARG for OPTION_BITOP
When a ref creation at refs/heads/foo/bar fails, the files backend
now removes refs/heads/foo/ if the directory is otherwise not used.
* ps/refs-files-remove-empty-parent:
refs/files: remove empty parent dirs when ref creation fails
"git fetch --prune" used to be O(n^2) expensive when there are many
refs, which has been corrected.
* ph/fetch-prune-optim:
clean up interface for refs_warn_dangling_symrefs
refs: remove old refs_warn_dangling_symref
fetch-prune: optimize dangling-ref reporting
Both $nullid and $null_sha1 point to the same content.
Use only $nullid consistently.
This is a preliminary cleanup for adding the support of SHA256 repo.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
gitk includes code specifically for Tcl 8.4 and 8.5, but the requirement
is now for at least 8.6. Remove the now unusable code targeting earlier
Tcl.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk runs under wish so naturally has Tcl and Tk available and of the
same version. gitk sets a requirement on Tk version >= 8.4: this is very
outdated, and the earliest Tcl currently shipping on any supported OS is
8.6. As 8.7 is in alpha test and is generally compatible with 8.6, we
should allow 8.7. Tcl 9.0 has planned compatibility breaking changes so
is not yet supported.
Let's change the requirements to 8.6-8.7, but not 9.0. Place this at the
top of file so the requirements are obvious.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk has a few code fragments that are used only for git versions <=
1.7.2 that do not support submodules, notes, word differences, or
textconv filters. We just set the minimum git version higher than 1.7.2
so these code fragments have no effect. Delete them.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
gitk has alternate code paths for early git up to 1.72, and has no
defined minimum version. Setting any version > 1.72 as minimum will
allow removing those code paths.
The recent set of advisories published for git, gitk, and git-gui add
updates for v2.43 and later, but Debian has buster withgit 2.20 still
available. While Debian would be responsible for backporting any fixes
to such an early version, we have no good reason preclude it.
So, make 2.20 the minimum required git version.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
If conflict comments already use a comment character that isn't "#", and
core.commentChar is set "auto", Git will ignore these lines during the
scan using ignored_log_message_bytes() and pick a new comment character
based on the rest of the message. The newly chosen character may be
different from the one used in the conflict comments and therefore,
these are no longer treated as comments and end up in the final commit
message.
For example, during a rebase if the user previously set
core.commentChar=% and then encounters a conflict, conflict comments
like "% Conflicts:" are generated. If the user subsequently sets
core.commentChar=auto before running `rebase --continue`, Git parses the
"auto" setting and begins scanning. It first uses the existing
'comment_line_str' (which is '%') to detect and ignore conflict comments
via ignored_log_message_bytes().
Then, Git scans the rest of the message (excluding conflict comments),
sees that none of the remaining lines start with '#' and decides to set
comment_line_str to '#'. Since the final commit character differs from
the one used in the conflict comments, those lines are no longer
considered comments and get included in the final commit message.
Set 'comment_line_str' to '#' when core.commentChar is set to 'auto' to
reset any previously set value.
While this does not solve the issue of conflict comment inclusion and
the user visible behaviour stays tha same, it standardizes the behaviour
of the code by always resetting 'comment_line_str' to '#' when
core.commentChar=auto is parsed.
The patch text is based on Phillip Wood's message:
https://lore.kernel.org/git/9e96aaab-79a2-4632-94cd-d016d4a63b30@gmail.com/
and the commit log message is wriiten by me.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When core.commentChar is set to "auto", Git selects a comment character
by scanning the commit message contents and avoiding any character
already present in the message.
If the message still contains old conflict comments (starting with a
comment character), Git assumes that character is in use and chooses a
different one. As a result, those existing comment lines are no longer
recognized as comments and end up being included in the final commit
message.
To avoid this, skip scanning the trailing comment block when selecting
the comment character. This allows Git to safely reuse the original
character when appropriate, keeping the commit message clean and free of
leftover conflict information.
Background:
The "auto" value for core.commentchar was introduced in the commit
84c9dc2c5a (commit: allow core.commentChar=auto for character auto
selection, 2014-05-17) but did not exhibit this issue at that time.
The bug was introduced in commit a6c2654f83 (rebase -m: fix --signoff
with conflicts, 2024-04-18) where Git started writing conflict comments
to the file at 'rebase_path_message()'.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the string predicates defined in git-compat-util.h all
return bool let's convert the return type of the string predicates
in strbuf.{c,h} to match them.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 8277dbe987 (git-compat-util: convert skip_{prefix,suffix}{,_mem}
to bool, 2023-12-16) a number of our string predicates have been
returning bool instead of int. Now that we've declared that experiment
a success, let's convert the return type of the case-independent
skip_iprefix() and skip_iprefix_mem() functions to match the return
type of their case-dependent equivalents. Returning bool instead of
int makes it clear that these functions are predicates.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have had a test balloon for C99's bool type since 8277dbe987
(git-compat-util: convert skip_{prefix,suffix}{,_mem} to bool,
2023-12-16). As we've had it over 18 months without any complaints
let's declare it a success.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our submission guidelines require people to use their real name, but
this is not always suitable for various reasons.
For people who are transgender or non-binary and are transitioning or
who think they might want to transition, it can be a major obstacle and
cause major discomfort to require the use of their real name. This is
made worse by the fact that Git provides no way to change names built
into history, so the use of a deadname is forever. Our code of conduct
states that we "pledge to act and interact in ways that contribute to an
open, welcoming, diverse, inclusive, and healthy community," and
changing this policy is one way we can improve things for contributors.
In addition, there are some developers who are so widely known
pseudonymously that they have a Wikipedia page with their handle and no
real name. It would seem silly to reject patches from people who are
known and respected in their open-source community just because they
don't wish to share a real name.
There are also other good reasons why people might operate
pseudonymously: because they or their family members are well known and
they wish to protect their privacy, because of current or past
harassment or retaliation or fear of that happening in the future, or
because of concerns about unwanted attention from government officials
or other authority figures. As much as possible, we want to welcome
contributions from anyone who is willing to participate positively in
our community without having them worry about their safety or privacy.
In all of these cases, we should allow people to proceed using a
preferred name or pseudonymously if, in their best judgment, that's the
right thing to do. State that it is common to use a real name but
explicitly mention that contributors who are not comfortable doing so or
prefer to operate pseudonymously or under a preferred name can proceed
otherwise, provided the name is distinctive, identifying, and not
misleading. For instance, using U+2060 (WORD JOINER) as one's ID would
likely be distinctive but not identifying, since most people would have
trouble reading it due to its zero-width nature.
We prohibit identities which are misleading, since our goal is to create
a community which works together with a common goal, and misleading or
deceiving others is not conducive to good community or compatible with
our code of conduct, nor is it compatible with making a legal assertion
about the provenance of one's code.
Explicitly prohibit anonymous contributions to ensure that we have some
line of provenance to a known (if pseudonymous) author who might be able
to respond to questions about it. Explain that this is the reason we
have this policy to help contributors understand the rationale better.
Use "some form of your real name" since some current contributors use
shortened forms of their name or use initials, which have always been
considered acceptable. This helps guide people who would be fine using
their real name but have misconfigured `user.name` thinking it is
intended to be a username or is used for authentication (despite our
documentation to the contrary), but also allows for a variety of
circumstances where the contributor would feel more comfortable not
doing so.
Note that this policy is the same as that of the Linux kernel[0] and the
CNCF[1], as well as many smaller projects. The Linux kernel patch was
Acked-by one of the Linux Foundation's lawyers, Michael Dolan, so it
appears these changes have had legal review.
Additionally, retain the section header ID for ease of linking across
versions.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4563201f33a022fc0353033d9dfeb1606a88330
[1] https://github.com/cncf/foundation/blob/659fd32c86dc/dco-guidelines.md
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit bf5ce434db ("l10n: Add full Irish translation (ga.po)", 2025-05-16)
added a new translation to git. In a make build, new 'po' files (ga.po
in this case) are added to the build automatically using a wildcard
pattern. In a meson build you have to add the language code ('ga') to a
list explicitly to have it included in the build. In order to include the
new translation in the meson build, add the 'ga' language code to the
list of translations in the 'po/meson.build' file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 837f637cf5 ("meson.build: correct setting of GIT_EXEC_PATH",
2025-05-19) corrected the GIT_EXEC_PATH build setting, but then forgot
to update the installation path for the library executables. This causes
a regression when attempting to execute commands, after installing to a
non-standard location (reported here[1]):
$ meson -Dprefix=/tmp/git -Dlibexecdir=libexec-different build
$ meson install
$ /tmp/git/bin/git --exec-path
/tmp/git/libexec-different
$ /tmp/git/bin/git daemon
git: 'daemon' is not a git command. See 'git --help'
In order to fix the issue, use the 'git_exec_path' variable (calculated
while processing -Dlibexecdir) as the 'install_dir' field during the
installation of the library executables.
[1]: <66fd343a-1351-4350-83eb-c797e47b7693@gmail.com>
Reported-by: irecca.kun@gmail.com
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leakfix with a new and a bit invasive test.
* ly/load-bitmap-leakfix:
pack-bitmap: add load corrupt bitmap test
pack-bitmap: reword comments in test_bitmap_commits()
pack-bitmap: fix memory leak if load_bitmap() failed
Code clean-up around object access API.
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
The compiler is in general able to recognize the endian shift and
replace it with an optimized opcode if possible. On certain
architectures such as RiscV or MIPS the situation can get complicated.
They don't provide an optimized opcode and masking the "higher" bits may
required loading a constant which needs shifting. This causes the
compiler to emit a lot of instructions for the operation.
The provided builtin directive on these architecture calls a function
which does the operation instead of emitting the code for operation.
Bring back the change from commit 6547d1c9 (bswap.h: add support for
built-in bswap functions, 2025-04-23). The bswap32/64 macro can now be
defined unconditionally so it won't regress on big endian architectures.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On x86 the bswap32/64 macro is implemented based on the x86 opcode which
performs the required shifting in just one opcode.
The other CPUs fallback to the generic shifting as implemented by
default_swab32() and default_bswap64() if needed.
I've been looking at how good a compiler is at recognizing the default
shift and emitting an optimized operation:
- x86, arm64 msvc v19.20
default_swab32() optimized
default_bswap64() shifts
_byteswap_uint64() optimized
- x86, arm64 msvc v19.37
default_swab32() optimized
default_bswap64() optimized
_byteswap_uint64() optimized
- arm64, gcc-4.9.4: optimized
- x86-64, gcc-4.4.7: shifts
- x86-64, gcc-4.5.3: optimized
- x86-64, clang-3.0: optimized
Given that gcc-4.5 and clang-3.0 are fairly old, any recent compiler
should recognize the shift.
Remove the optimized x86 version and rely on the compiler.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ntohl and htonl macros are redefined because the provided macros were
not always optimal. Sometimes it was a function call, sometimes it was a
macro which did the shifting. Using the 'bswap' opcode on x86 provides
probably better performance than performing the shifting.
These macros are only overwritten on x86 if the "optimized" version is
available.
The ntohll and htonll macros are not available on every platform (at
least glibc does not provide them) which means they need to be defined
once the endianness of the system is determined.
In order to get a more symmetrical setup, redfine the macros once the
endianness of the system has been determined.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Microsoft Visual C++ (MSVC) compiler (as of Visual Studio 2022
version 17.13.6) does not define __BYTE_ORDER__ and its C-library does
not define __BYTE_ORDER. The compiler is supported only on arm64 and x86
which are all little endian.
Define GIT_BYTE_ORDER on msvc as little endian to avoid further checks.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The __BYTE_ORDER__ define is provided by gcc (since ~v4.6), clang
(since ~v3.2) and icc (since ~16.0.3).
The __BYTE_ORDER and BYTE_ORDER macros are libc specific and are not
available on all supported platforms such as mingw.
Add support for the __BYTE_ORDER__ macro as a fallback.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
$GIT_TEST_INSTALLED can be set to use an "installed" git instead of the
one from $GIT_BUILD_DIR. This is used by my company's internal test
infrastructure, and not using $GIT_TEST_INSTALLED when querying the
default hash meant that the tests were failing because the hash was
effectively set to the empty string (since git didn't execute).
In the two places we attempt to detect/execute git itself prior to
overriding everything and putting it in $PATH, use identical logic for
identifying the git binary to execute. This also has the effect of
including the $X suffix when querying the default hash, but that's not
strictly necessary. You don't need to specify .exe when running a binary
on Windows, just when testing whether it exists or not.
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As well as receiving the config key and value, config callbacks
also receive a "struct key_value_info" containing information about
the source of the key-value pair. Accessing the "path" field of
this struct from a callback passed to repo_config() results in a
use-after-free. This happens because repo_config() first populates a
configset by calling config_with_options() and then iterates over the
configset with the callback passed by the caller. When the configset
is constructed it takes a shallow copy of the "struct key_value_info"
for each config setting. This leads to the use-after-free as the
"path" member is freed before config_with_options() returns.
We could fix this by interning the "path" field as we do
for the "filename" field but the "path" field is not actually
needed. It is populated with a copy of the "path" field from "struct
config_source". That field was added in d14d42440d8 (config: disallow
relative include paths from blobs, 2014-02-19) to distinguish between
relative include directives in files and those in blobs. However,
since 1b8132d99d8 (i18n: config: unfold error messages marked for
translation, 2016-07-28) we can differentiate these by looking at the
"origin_type" field in "struct key_value_info". So let's remove the
"path" members from "struct config_source" and "struct key_value_info"
and instead use a combination of the "filename" and "origin_type"
fields to determine the absolute path of relative includes.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have migrated all users of the linked list
of multi-pack indices to instead use those stored in the object database
sources. Remove those now-unused pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `get_all_packs()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `find_pack_entry()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `close_midx()` we not only close the multi-pack index for
one object source, but instead we iterate through the whole linked list
of MIDXs to close all of them. This linked list is about to go away in
favor of using the new per-source pointer to its respective MIDX.
Refactor the function to iterate through sources instead.
Note that after this patch, there's a couple of callsites left that
continue to use `close_midx()` without iterating through all sources.
These are all cases where we don't care about the MIDX from other
sources though, so it's fine to keep them as-is.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we refactored how we load multi-pack indices to
take a corresponding "source" as input. As part of this refactoring we
started to store a pointer to the MIDX in `struct odb_source` itself.
Refactor loading of packfiles in the same way: instead of passing in the
object directory, we now pass in the source from which we want to load
packfiles. This allows us to simplify the code because we don't have to
search for a corresponding MIDX anymore, but we can instead directly use
the MIDX that we have already prepared beforehand.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Multi-pack indices are tracked via `struct multi_pack_index`. This data
structure is stored as a linked list inside `struct object_database`,
which is the global database that spans across all of the object
sources.
This layout causes two problems:
- Object databases consist of multiple object sources (e.g. one source
per alternate object directory), where each multi-pack index is
specific to one of those sources. Regardless of that though, the
MIDX is not tracked per source, but tracked globally for the whole
object database. This creates a mismatch between the on-disk layout
and how things are organized in the object database subsystems and
makes some parts, like figuring out whether a source has an MIDX,
quite awkward.
- Multi-pack indices are an implementation detail of how efficient
access for packfiles work. As such, they are neither relevant in the
context of loose objects, nor in a potential future where we have
pluggable backends.
Refactor `prepare_multi_pack_index_one()` so that it works on a specific
source, which allows us to easily store a pointer to the multi-pack
index inside of it. For now, this pointer exists next to the existing
linked list we have in the object database. Users will be adjusted in
subsequent patches to instead use the per-source pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/midx-avoid-cruft-packs:
repack: exclude cruft pack(s) from the MIDX where possible
pack-objects: introduce '--stdin-packs=follow'
pack-objects: swap 'show_{object,commit}_pack_hint'
pack-objects: fix typo in 'show_object_pack_hint()'
pack-objects: perform name-hash traversal for unpacked objects
pack-objects: declare 'rev_info' for '--stdin-packs' earlier
pack-objects: factor out handling '--stdin-packs'
pack-objects: limit scope in 'add_object_entry_from_pack()'
pack-objects: use standard option incompatibility functions
The `git-for-each-ref(1)` command is used to iterate over references
present in a repository. In large repositories with millions of
references, it would be optimal to paginate this output such that we
can start iteration from a given reference. This would avoid having to
iterate over all references from the beginning each time when paginating
through results.
The previous commit added 'seek' functionality to the reference
backends. Utilize this and expose a '--start-after' option in
'git-for-each-ref(1)'. When used, the reference iteration seeks to the
lexicographically next reference and iterates from there onward.
This enables efficient pagination workflows, where the calling script
can remember the last provided reference and use that as the starting
point for the next set of references:
git for-each-ref --count=100
git for-each-ref --count=100 --start-after=refs/heads/branch-100
git for-each-ref --count=100 --start-after=refs/heads/branch-200
Since the reference iterators only allow seeking to a specified marker
via the `ref_iterator_seek()`, we introduce a helper function
`start_ref_iterator_after()`, which seeks to next reference by simply
adding (char) 1 to the marker.
We must note that pagination always continues from the provided marker,
as such any concurrent reference updates lexicographically behind the
marker will not be output. Document the same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 'ref-filter.c', there is an 'else' clause within `do_filter_refs()`.
This is unnecessary since the 'if' clause calls `die()`, which would
exit the program. So let's remove the unnecessary 'else' clause. This
improves readability since the indentation is also reduced and flow is
simpler.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref iterator exposes a `ref_iterator_seek()` function. The name
suggests that this would seek the iterator to a specific reference in
some ways similar to how `fseek()` works for the filesystem.
However, the function actually sets the prefix for refs iteration. So
further iteration would only yield references which match the particular
prefix. This is a bit confusing.
Let's add a 'flags' field to the function, which when set with the
'REF_ITERATOR_SEEK_SET_PREFIX' flag, will set the prefix for the
iteration in-line with the existing behavior. Otherwise, the reference
backends will simply seek to the specified reference and clears any
previously set prefix. This allows users to start iteration from a
specific reference.
In the packed and reftable backend, since references are available in a
sorted list, the changes are simply setting the prefix if needed. The
changes on the files-backend are a little more involved, since the files
backend uses the 'ref-cache' mechanism. We move out the existing logic
within `cache_ref_iterator_seek()` to `cache_ref_iterator_set_prefix()`
which is called when the 'REF_ITERATOR_SEEK_SET_PREFIX' flag is set. We
then parse the provided seek string and set the required levels and
their indexes to ensure that seeking is possible.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'find_ref_entry' function is no longer used, so remove it.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `ref_iterator` is an internal structure to the 'refs/'
sub-directory, which allows iteration over refs. All reference iteration
is built on top of these iterators.
External clients of the 'refs' subsystem use the various
'refs_for_each...()' functions to iterate over refs. However since these
are wrapper functions, each combination of functionality requires a new
wrapper function. This is not feasible as the functions pile up with the
increase in requirements. Expose the internal reference iterator, so
advanced users can mix and match options as needed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
External tools such as Jujutsu may add many references that are of no
interest to the user. This preference allows hiding them.
Signed-off-by: Ori Avtalion <ori@avtalion.name>
To enable optimize multiple pathspec items in revision traversal,
return 0 if all pathspec item is literal in forbid_bloom_filters().
Add for loops to initialize and check each pathspec item's bloom_keyvec
when optimization is possible.
Add new test cases in t/t4216-log-bloom.sh to ensure
- consistent results between the optimization for multiple pathspec
items using bloom filter and the case without bloom filter
optimization.
- does not use bloom filter if any pathspec item is not literal.
With these optimizations, we get some improvements for multi-pathspec runs
of 'git log'. First, in the Git repository we see these modest results:
Benchmark 1: old
Time (mean ± σ): 73.1 ms ± 2.9 ms
Range (min … max): 69.9 ms … 84.5 ms 42 runs
Benchmark 2: new
Time (mean ± σ): 55.1 ms ± 2.9 ms
Range (min … max): 51.1 ms … 61.2 ms 52 runs
Summary
'new' ran
1.33 ± 0.09 times faster than 'old'
But in a larger repo, such as the LLVM project repo below, we get even
better results:
Benchmark 1: old
Time (mean ± σ): 1.974 s ± 0.006 s
Range (min … max): 1.960 s … 1.983 s 10 runs
Benchmark 2: new
Time (mean ± σ): 262.9 ms ± 2.4 ms
Range (min … max): 257.7 ms … 266.2 ms 11 runs
Summary
'new' ran
7.51 ± 0.07 times faster than 'old'
Signed-off-by: Derrick Stolee <stolee@gmail.com>
[ly: rename convert_pathspec_to_filter() to convert_pathspec_to_bloom_keyvec()]
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply -N" should start from the current index and register
only new files, but it instead started from an empty index, which
has been corrected.
* rp/apply-intent-to-add-fix:
apply docs: clarify wording for --intent-to-add
t4140: test apply --intent-to-add interactions
apply: only write intents to add for new files
apply: read in the index in --intent-to-add mode
Code and test clean-up around string-list API.
* sj/string-list:
u-string-list: move "remove duplicates" test to "u-string-list.c"
u-string-list: move "filter string" test to "u-string-list.c"
u-string-list: move "test_split_in_place" to "u-string-list.c"
u-string-list: move "test_split" into "u-string-list.c"
string-list: enable sign compare warnings check
string-list: return index directly when inserting an existing element
string-list: remove unused "insert_at" parameter from add_entry
string-list: fix sign compare warnings for loop iterator
Tempfile removal fix in the codepath to sign commits with SSH keys.
* re/ssh-sign-buffer-fix:
ssh signing: don't detach the filename strbuf from key_file tempfile
A failure to open the index file for writing due to conflicting
access did not state what went wrong, which has been corrected.
* hy/read-cache-lock-error-fix:
read-cache: report lock error when refreshing index
Update ".clang-format" and ".editorconfig" to match our style guide
a bit better.
* kn/clang-format-updates:
meson: add rule to run 'git clang-format'
clang-format: add 'RemoveBracesLLVM' to the main config
clang-format: set 'ColumnLimit' to 0
"netrc" credential helper has been improved to understand textual
service names (like smtp) in addition to the numeric port numbers
(like 25).
* mc/netrc-service-names:
contrib: better support symbolic port names in git-credential-netrc
contrib: warn for invalid netrc file ports in git-credential-netrc
contrib: use a more portable shebang for git-credential-netrc
"make coccicheck" succeeds even when spatch made suggestions, which
has been updated to fail in such a case.
* jc/coccicheck-fails-make-when-it-fails:
coccicheck: fail "make" when it fails
The reftable ref backend has matured enough; Git 3.0 will make it
the default format in a newly created repositories by default.
* ps/use-reftable-as-default-in-3.0:
setup: use "reftable" format when experimental features are enabled
BreakingChanges: announce switch to "reftable" format
A diff-filter with negative-only specification like "git log
--diff-filter=d" did not trigger correctly, which has been fixed.
* jk/all-negative-diff-filter-fix:
setup_revisions(): turn on diffs for all-negative diff filter
Some code paths in the "git prune" used to ignore passed in
repository object and used the_repository singleton instance
instead, which has been corrected.
* ac/prune-wo-the-repository:
builtin/prune: stop depending on 'the_repository'
repository: move 'repository_format_precious_objects' to repo scope
Drop FreeBSD 4 support and assume we are at least at FreeBSD 6 with
memmem() supported.
* bs/config-mak-freebsd:
build: retire NO_UINTMAX_T
config.mak.uname: set NO_MEMMEM only for functional version
Use of sysctl() system call to learn the total RAM size used on
BSDs has been corrected.
* cb/total-ram-bsd-fix:
builtin/gc: correct total_ram calculation with HAVE_BSD_SYSCTL
This allows using a custom gpg program under the user's home directory
by specifying a path starting with '~'
[gpg]
program = "~/.local/bin/mygpg"
Signed-off-by: Jonas Brandstötter <jonas.brandstoetter@gmx.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing to use bloom filters in a revision walk, Git populates a
boom_keyvec with an array of bloom keys for the components of a path.
Before we create the ability to map multiple pathspecs to multiple
bloom_keyvecs, extract the conversion from a pathspec to a bloom_keyvec
into its own helper method. This simplifies the state that persists in
prepare_to_use_bloom_filter() as well as makes the future change much
simpler.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, we stored bloom keys in a flat array and marked a commit
as NOT TREESAME if any key reported "definitely not changed".
To support multiple pathspec items, we now require that for each
pathspec item, there exists a bloom key reporting "definitely not
changed".
This "for every" condition makes a flat array insufficient, so we
introduce a new structure to group keys by a single pathspec item.
`struct bloom_keyvec` is introduced to replace `struct bloom_key *`
and `bloom_key_nr`. And because we want to support multiple pathspec
items, we added a bloom_keyvec * and a bloom_keyvec_nr field to
`struct rev_info` to represent an array of bloom_keyvecs. This commit
still optimize only one pathspec item, thus bloom_keyvec_nr can only
be 0 or 1.
New bloom_keyvec_* functions are added to create and destroy a keyvec.
bloom_filter_contains_vec() is added to check if all key in keyvec is
contained in a bloom filter.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git code style requires that functions operating on a struct S
should be named in the form S_verb. However, the functions operating
on struct bloom_key do not follow this convention. Therefore,
fill_bloom_key() and clear_bloom_key() are renamed to bloom_key_fill()
and bloom_key_clear(), respectively.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In bloom.h, murmur3_seeded_v2() is exported for the use of test murmur3
hash. To clarify that murmur3_seeded_v2() is exported solely for testing
purposes, a new helper function test_murmur3_seeded() was added instead
of exporting murmur3_seeded_v2() directly.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds a basic support of SHA256 Git repository to Gitk, so
that Gitk can show and operate on both SHA1 and SHA256 repos
gracefully. Since SHA256 has a longer ID length (64 char) than SHA1
(40 char), many field widths are adjusted to fit with it.
A caveat is that the configuration of auto selection length is shared
between SHA1 and SHA256 repos. That is, once when this value is saved
and read, it's applied to both repo types, which may result in shorter
selection than the full SHA256 ID. We may introduce another
individual config for sha256 (actually I did write in the first
version), but for simplicity, the common config is used as of writing
this.
Many lines still refer "sha1" although they may point to both SHA1 and
SHA256. They are left untouched for making the changes simpler.
This patch is based on the early work by Rostislav Krasny:
https://patchwork.kernel.org/project/git/patch/pull.979.git.1623687519832.gitgitgadget@gmail.com
I refreshed, revised and extended to the latest state.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
When commit-msg writes the file using CRLF, the lines in the final
message include trailing spaces.
Postpone stripping until after hooks execute.
This aligns with Git's behavior, which passes the original message
to commit-msg, then strips comments and whitespace.
Signed-off-by: Orgad Shaneh <orgads@gmail.com>
As executing our test suite is notoriously slow on Windows we use matrix
jobs in our CI systems to slice up tests and run them via multiple jobs.
On Meson this is done with a comparatively complex PowerShell invocation
as Meson didn't yet have a native way to slice tests like this.
I have upstreamed a new `--slice` option [1] that addresses this use
case though, which has been merged and released with Meson 1.8. Both
GitLab and GitHub CI have Meson 1.8.2 available by now, so let's update
the jobs to use that new option.
[1]: https://github.com/mesonbuild/meson/pull/14092
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update subproject wrappers to newer versions by executing `meson wrap
update` in the project's root directory
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitglossary documents Git pathspecs. One type of pathspec is the "glob"
pathspec, prefixed with the magic word "glob".
Regarding glob pathspecs, gitglossary says, '"**/foo" matches file or
directory "foo" anywhere, the same as pattern "foo".' That last phrase
('the same as pattern "foo") is incorrect. "**/foo" and "foo" are not
equivalent. "**/foo" matches foo anywhere, but "foo" does not.
This change removes the incorrect phrase from the glob pathspec doc.
Signed-off-by: Russell Hanneken <rhanneken@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace signal() with an equivalent invocation of sigaction(), but
make sure to NOT set SA_RESTART so the original code that expects
to be interrupted when children complete still works as designed.
This change has the added benefit of using BSD signal semantics reliably
and therefore not needing the rearming call in the signal handler.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future change will start using sigaction to setup a SIGCHLD signal
handler.
The current code uses signal(), which returns SIG_ERR (but doesn't
seem to set errno) so instruct sigaction() to do the same.
A new SA flag will be needed, so copy the one from Cygwin; note that
the sigaction() implementation that is provided won't use it, so
its value is otherwise irrelevant.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compiling Git fails on Amazon Linux 2 when using GCC 7.3.1 with the
following compiler error:
In file included from compat/posix.h:449:0,
from git-compat-util.h:26,
from daemon.c:3:
compat/../sane-ctype.h:29:60: error: expected expression before ']' token
#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
^
compat/../sane-ctype.h:29:72: error: expected ')' before '!=' token
#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
^
compat/../sane-ctype.h:29:60: error: expected expression before ']' token
#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
^
... lots of similar lines ...
compat/../sane-ctype.h:45:50: error: expected declaration specifiers or '...' before numeric constant
#define toupper(x) sane_case((unsigned char)(x), 0)
^
/usr/include/ctype.h:142:12: error: expected identifier or '(' before 'int'
extern int isascii (int __c) __THROW;
^
compat/../sane-ctype.h:30:26: error: expected ')' before '&' token
#define isascii(x) (((x) & ~0x7f) == 0)
^
compat/../sane-ctype.h:30:35: error: expected ')' before '==' token
#define isascii(x) (((x) & ~0x7f) == 0)
^
In file included from /usr/include/features.h:423:0,
from /usr/include/unistd.h:25,
from compat/posix.h:90,
from git-compat-util.h:26,
from daemon.c:3:
compat/../sane-ctype.h:44:30: error: expected declaration specifiers or '...' before '(' token
#define tolower(x) sane_case((unsigned char)(x), 0x20)
^
compat/../sane-ctype.h:44:50: error: expected declaration specifiers or '...' before numeric constant
#define tolower(x) sane_case((unsigned char)(x), 0x20)
^
compat/../sane-ctype.h:45:30: error: expected declaration specifiers or '...' before '(' token
#define toupper(x) sane_case((unsigned char)(x), 0)
^
compat/../sane-ctype.h:45:50: error: expected declaration specifiers or '...' before numeric constant
#define toupper(x) sane_case((unsigned char)(x), 0)
^
This error bisect back to 75a044f748 (git-compat-util.h: split out
POSIX-emulating bits, 2025-02-18), where lots of bits got split out of
"git-compat-util.h" into a new "compat/posix.h" header.
The compiler error isn't immediately obvious, doubly so because the
actual errors are ~3x as long as the above snippet. But what happens
here is that we transitively include <ctype.h> after we have included
our own "sane-ctype.h" header. Consequently, the function declarations
that exist in <ctype.h> for isascii(3p) et al will be mangled by our
macros of the same type. The result is of course completely broken.
It's unclear why this issue only happens on Amazon Linux 2. My guess is
that it's either specific to the compiler version or specific to the
glibc version. We don't explicitly include <ctypes.h> anywhere, but it's
being transitively included. So chances are that later versions of the
toolchain reorganized their headers so that <ctypes.h> is not included
transitively anymore.
Fix the issue by explicitly including <ctype.h> in "sane-ctype.h". This
ensures that the header guards will be activated and that any subsequent
include of the same header will become a no-op. With this we can then
safely override the function declarations with our own macros.
Reported-by: Stan Hu <stanhu@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
A recent commit, d9cb0e6ff8 (fast-export, fast-import: add support for
signed-commits, 2025-03-10), added support for signed commits to
fast-export and fast-import.
When a signed commit is processed, fast-export can output either
"gpgsig sha1" or "gpgsig sha256" depending on whether the signed
commit uses the SHA-1 or SHA-256 Git object format.
However, this implementation has a number of limitations:
- the output format was not properly described in the documentation,
- the output format is not very informative as it doesn't even say
if the signature is an OpenPGP, an SSH, or an X509 signature,
- the implementation doesn't support having both one signature on
the SHA-1 object and one on the SHA-256 object.
Let's improve on these limitations by improving fast-export and
fast-import so that:
- all the signatures are exported,
- at most one signature on the SHA-1 object and one on the SHA-256
are imported,
- if there is more than one signature on the SHA-1 object or on
the SHA-256 object, fast-import emits a warning for each
additional signature,
- the output format is "gpgsig <git-hash-algo> <signature-format>",
where <git-hash-algo> is the Git object format as before, and
<signature-format> is the signature type ("openpgp", "x509",
"ssh" or "unknown"),
- the output is properly documented.
About the output format:
- <git-hash-algo> allows to know which representation of the commit
was signed (the SHA-1 or the SHA-256 version) which helps with
both signature verification and interoperability between repos
with different hash functions,
- <signature-format> helps tools that process the fast-export
stream, so they don't have to parse the ASCII armor to identify
the signature type.
It could be even better to be able to import more than one signature
on the SHA-1 object and on the SHA-256 object, but other parts of
Git don't handle that well for now, so this is left for future
improvements.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_COUNTUP. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_BITOP. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Check if "devfal" fits into an integer variable with the given
"precision", but don't check "extra", as its value is only used to clear
bits, so cannot lead to an overflow. Not checking continues to allow
e.g., using -1 to clear all bits even if the value variable has a
narrower type than intptr_t.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_NEGBIT. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_BIT. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) support value variables of different sizes
for OPTION_SET_INT. Do that by requiring their "precision" to be set,
casting their "value" pointer accordingly and checking whether the value
fits.
Factor out the casting code from the part of do_get_value() that handles
OPTION_INTEGER to avoid code duplication. We're going to use it in the
next patches as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build on 09705696f7 (parse-options: introduce precision handling for
`OPTION_INTEGER`, 2025-04-17) to support value variables of different
sizes for PARSE_OPT_CMDMODE options. Do that by requiring their
"precision" to be set and casting their "value" pointer accordingly.
Call the function that does the raw casting do_get_int_value() to
reserve the name get_int_value() for a more friendly wrapper we're
going to introduce in one of the next patches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OPTION_BITOP options don't take arguments. Make sure they are declared
that way using the flag PARSE_OPT_NOARG.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-store:
odb: rename `read_object_with_reference()`
odb: rename `pretend_object_file()`
odb: rename `has_object()`
odb: rename `repo_read_object_file()`
odb: rename `oid_object_info()`
odb: trivial refactorings to get rid of `the_repository`
odb: get rid of `the_repository` when handling submodule sources
odb: get rid of `the_repository` when handling the primary source
odb: get rid of `the_repository` in `for_each()` functions
odb: get rid of `the_repository` when handling alternates
odb: get rid of `the_repository` in `odb_mkstemp()`
odb: get rid of `the_repository` in `assert_oid_type()`
odb: get rid of `the_repository` in `find_odb()`
odb: introduce parent pointers
object-store: rename files to "odb.{c,h}"
object-store: rename `object_directory` to `odb_source`
object-store: rename `raw_object_store` to `object_database`
In 4cba20fbdc6 (meson: prefer shell at "/bin/sh", 2025-04-25) we have
addressed an issue where the shell path embedded into Git was looked up
via PATH, which easily led to unportable shell paths other than the
usual "/bin/sh" location. The fix was to simply add '/bin' to the search
path explicitly, which made us prefer that directory over the PATH-based
lookup.
This fix causes issues on MINGW64 though, which uses Windows-style
paths. "/bin" is not an absolute Windows-style path, but Meson expects
the directories to be absolute. This leads to the following error:
meson.build:248:15: ERROR: Search directory /bin is not an absolute path.
Fix this by instead searching for both '/bin/sh' and 'sh', which also
causes us to prefer '/bin/sh' over a PATH-based lookup. Meson does
accept that path alright on MINGW64, even though it's not an absolute
Windows-style path, either.
Furthermore, this continues to work alright with cross-files, as well,
in case one wants to explicitly override the shell path:
$ meson setup build
...
Runtime executable paths
perl : /nix/store/gy10hw004rl2xfbfq41vnw0yb1w8rvbl-perl-5.40.0/bin/perl
python : /nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/bin/python3
shell : /bin/sh
$ cat >cross.ini <<-EOF
[binaries]
sh = '/nix/store/94lg0shvsfc845zy8gnflvpqxxiyijbz-bash-interactive-5.2p37/bin/bash'
EOF
$ meson setup build --cross-file=cross.ini --wipe
...
Runtime executable paths
perl : /nix/store/gy10hw004rl2xfbfq41vnw0yb1w8rvbl-perl-5.40.0/bin/perl
python : /nix/store/sd81bvmch7njdpwx3lkjslixcbj5mivz-python3-3.13.4/bin/python3
shell : /nix/store/94lg0shvsfc845zy8gnflvpqxxiyijbz-bash-interactive-5.2p37/bin/bash
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `manpage_target` variable isn't used at all, and the `manpage_path`
variable is only used in a single location. Remove the former variable
and inline the latter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The summary of auto-detected features prints a boolean for every option
to tell the user whether or not the feature has been auto-enabled or
not. This summary can be improved though, as in some cases this boolean
is derived from a dependency. So if we pass in the dependency directly,
then Meson knows to both print a boolean and, if the dependency was
found, it also prints a version number.
Adapt the code accordingly and enable `bool_yn` so that actual booleans
are formatted similarly to dependencies. Before this change:
Auto-detected features
benchmarks : true
curl : true
expat : true
gettext : true
gitweb : true
iconv : true
pcre2 : true
perl : true
python : true
And after this change, we now see the version numbers as expected:
Auto-detected features
benchmarks : YES
curl : YES 8.14.1
expat : YES 2.7.1
gettext : YES
gitweb : YES
iconv : YES
pcre2 : YES 10.44
perl : YES
python : YES
Note that this change also enables colorization of the boolean options,
green for "YES" and red for "NO".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The value for the 'https' backend option is printed twice: once via the
summary of auto-detected features and once via our summary of backends.
Drop it from the former summary.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Python features are enabled we search both for a native and
non-native version of Python. This is wrong though: we don't use Python
in our build process, so there is no need to search for it in the first
place.
There is one location where we use the native version of Python, namely
when deciding whether or not we want to wire up git-p4(1). This check is
invalid though, as we shouldn't check for the build host to have Python,
but for the target host.
Fix this invalid check to use the non-native version of Python and stop
searching for a native version of Python altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When two remotes collide in the destinations of their fetch refspecs,
the results can be confusing. For example, in this silly example:
git config remote.one.url [...]
git config remote.one.fetch +refs/heads/*:refs/remotes/collide/*
git config remote.two.url [...]
git config remote.two.fetch +refs/heads/*:refs/remotes/collide/*
git fetch --all
we may try to write to the same ref twice (once for each remote we're
fetching). There's also a more subtle version of this. If you have
remotes "outer/inner" and "outer", then the ref "inner/branch" on the
second remote will conflict with just "branch" on the former (they both
want to write to "refs/remotes/outer/inner/branch").
We probably don't want to forbid this kind of overlap completely. While
the results can be confusing, there are legitimate reasons to have
multiple refs write into the same namespace (e.g., if one is a "backup"
of the other that is rarely fetched from).
But it may be worth limiting the porcelain "git remote" command to avoid
this confusion. The example above cannot be done with "git remote",
because it always[1] matches the refspecs to the remote name, and you
can only have one instance of each remote name. But you can still
trigger the more subtle variant like this:
git remote add outer [...]
git remote add outer/inner [...]
So let's detect that kind of name collision (in both directions) and
forbid it. You can still do whatever you like by manipulating the config
directly, but this should prevent the most obvious foot-gun.
[1] Almost always. With the --mirror option, the resulting refspec will
just write into "refs/*"; the remote name does not appear in the ref
namespace at all.
Our new "names must not overlap" rule is not necessary for that
case, but it seems reasonable to enforce it consistently. We already
require all remote names to be valid in the ref namespace, even
though we won't ever use them in that context for --mirror remotes.
Likewise, our new rule doesn't help with overlap here. Any two
mirror remotes will always overlap (in fact, any mirror remote along
with any other single one, since refs/remotes/ is a subset of the
mirrored refs). I'm not sure this is worth worrying about, but if it
is, we'd want an additional rule like "mirror remotes must be the
only remote".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push" and "git fetch" are taught to update refs in batches to
gain performance.
* kn/fetch-push-bulk-ref-update:
receive-pack: handle reference deletions separately
refs/files: skip updates with errors in batched updates
receive-pack: use batched reference updates
send-pack: fix memory leak around duplicate refs
fetch: use batched reference updates
refs: add function to translate errors to strings
This turns into a no-op merge, since more recent versions of Git
newer than 2.46 track do support the newer "git config" syntax.
* maint-2.45:
t: avoid git config syntax from newer releases
In a recent security release, 05e9cd64ee (config: quote values
containing CR character, 2025-05-19) added calls to `git config get`,
`git config set`, and `git config unset` which are not present on the
maint-2.43 branch.
These subcommands were added in the following commits, released in
git-2.46.0:
4e51389000 (builtin/config: introduce "get" subcommand, 2024-05-06),
00bbdde141 (builtin/config: introduce "set" subcommand, 2024-05-06),
95ea69c67b (builtin/config: introduce "unset" subcommand, 2024-05-06)
Revert to the previous `git config` syntax for older maintenance
branches.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running t1006 via Meson we receive an error about invalid TAP
format:
$ meson test t1006-cat-file
1/1 t1006-cat-file OK 3.86s 420 subtests passed
stdout: 147: UNKNOWN: c308ae01840d8e620ad554ee5d77fe114dc2d912:path with spaces
stdout: 159: UNKNOWN: 3625298bf5e7c464a7d0e38ea80c2a5b5904d9a3e5b2b025b67f360e09b68dc7:path with spaces
ERROR: Unknown TAP output lines for a supported TAP version.
This is probably a bug in the test; if they are not TAP syntax, prefix them with a #
Ok: 1
Fail: 0
While Meson copes with it alright, it's still annoying to see these
errors on every test run.
The root cause of the broken format is a call to grep(1) that gets
executed outside of a test case, which has been added recently via
9fd38038b9c (t1006: update 'run_tests' to test generic object
specifiers, 2025-06-02). This call is done to determine whether a
subsequent test case is expected to succeed or fail, so it makes sense
to have it execute outside of a test case. But whenever we do that, we
must be extra careful to not generate any output that breaks the TAP
format.
Fix the issue by adding '-q' to the command so that it doesn't print
any matching lines.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new reference in the "files" backend we first create the
directory hierarchy for that reference, then create the lockfile for
that reference, and finally rename the lockfile into place. When the
transaction gets aborted we prune the lockfile, but we don't clean up
the directory hierarchy that we may have created for the lockfile.
In some egde cases this can lead to lots of empty directories being
cluttered in the ".git/refs" directory that really serve no purpose at
all. We know to prune such empty directories when packing refs, but that
only patches over the issue.
Improve this by removing empty parents when cleaning up still-locked
references in `files_transaction_cleanup()`. This function is also
called when preparing or committing the transaction, so this change also
helps when not explicitly aborting the transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git pack-refs --auto` flag asks the ref backend to decide for
itself whether or not references need to be repacked. This is done to
ensure that we don't repack in cases where the backend is already in a
good-enough state, which is typically the case for the "reftable"
backend that performs auto-compaction on writes.
As such, we initially only had heuristics in place for the "reftable"
backend. The "files" backend didn't have any heuristics, so we'd repack
loose references every time `git pack-refs --auto` was executed. This
caused excessive repacking with that backend though, which is why we
eventually implemented a heuristic via c3459ae9ef2 (refs/files: use
heuristic to decide whether to repack with `--auto`, 2024-09-04).
The documentation for the `--auto` flag hasn't been updated accordingly
and still claims that we don't have any metrics for the "files" backend.
Update it to reflect the new reality.
Reported-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing the latest round of security fixes, we wrote release
notes in v2.43.7, and then successively merged those up through to the
various 'maint' branches.
However, the 2.49 release series is the first to have commit 1f010d6bdf
(doc: use .adoc extension for AsciiDoc files, 2025-01-20). This means
that we should have renamed the new-but-historical release notes from
*.txt to *.adoc during the merge into the 'maint-2.49' branch, but
neglected to do so.
Rename them accordingly to match the convention introduced by
1f010d6bdf. Since the release materials in question here were prepared
before v2.50.0 was tagged, the 'maint' track for that release series is
OK as is.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This addresses CVE-2025-46835, Git GUI can create and overwrite a
user's files:
When a user clones an untrusted repository and is tricked into editing
a file located in a maliciously named directory in the repository, then
Git GUI can create and overwrite files for which the user has write
permission.
* js/fix-open-exec-git:
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: use git_read in githook_read
git-gui: break out a separate function git_read_nice
git-gui: remove option --stderr from git_read
git-gui: sanitize 'exec' arguments: background
git-gui: sanitize 'exec' arguments: simple cases
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This addresses CVE-2025-46334, Git GUI malicious command injection on
Windows.
A malicious repository can ship versions of sh.exe or typical textconv
filter programs such as astextplain. Due to the unfortunate design of
Tcl on Windows, the search path when looking for an executable always
includes the current directory. The mentioned programs are invoked when
the user selects "Git Bash" or "Browse Files" from the menu.
* ml/replace-auto-execok:
git-gui: override exec and open only on Windows
git-gui: sanitize $PATH on all platforms
git-gui: assure PATH has only absolute elements.
git-gui: cleanup git-bash menu item
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: avoid auto_execok for git-bash menu item
git-gui: remove unused proc is_shellscript
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: use only the configured shell
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This addresses CVE-2025-27613, Gitk can create and truncate a user's
files:
When a user clones an untrusted repository and runs gitk without
additional command arguments, files for which the user has write
permission can be created and truncated. The option "Support per-file
encoding" must have been enabled before in Gitk's Preferences. This
option is disabled by default.
The same happens when "Show origin of this line" is used in the main
window (regardless of whether "Support per-file encoding" is enabled or
not).
* js/fix-open-exec:
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This addresses CVE-2025-27614, Arbitrary command execution with Gitk:
A Git repository can be crafted in such a way that with some social
engineering a user who has cloned the repository can be tricked into
running any script (e.g., Bourne shell, Perl, Python, ...) supplied by
the attacker by invoking `gitk filename`, where `filename` has a
particular structure. The script is run with the privileges of the user.
* ah/fix-open-with-stdin:
gitk: encode arguments correctly with "open"
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
When "git daemon" sees a signal while attempting to accept() a new
client, instead of retrying, it skipped it by mistake, which has
been corrected.
* cb/daemon-retry-interrupted-accept:
daemon: correctly handle soft accept() errors in service_loop
Updating submodules from the upstream did not work well when
submodule's HEAD is detached, which has been improved.
* jk/submodule-remote-lookup-cleanup:
submodule: look up remotes by URL first
submodule: move get_default_remote_submodule()
submodule--helper: improve logic for fallback remote name
remote: remove the_repository from some functions
dir: move starts_with_dot(_dot)_slash to dir.h
remote: fix tear down of struct remote
remote: remove branch->merge_name and fix branch_release()
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
- Explain possible options in description list instead of in a paragraph.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
- In description lists, put each option on its own line, to make them more
searchable and enable automatic translation of the options.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
For all the formats in the form of %(foo), the formatting needs to be
heavier because we not want the parentheses to be rendered as syntax
elements,but as keywords, i.e. we need to circumvent the syntax highlighting
of synopsis. In this particular case, this requires the heavy escaping of
the parts that contain parentheses with ++.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Fix some malformed synopis of options
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
- Add the '%' sign to the characters of keywords.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `backticks` for commit ranges. The new rendering engine will apply
synopsis rules to these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
We also transform inline descriptions of possible values of option
--decorate into a list, which is more readable and extensible.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Detaching the filename string from the tempfile structure used to cause
delete_tempfile() to fail and the temporary file was not cleaned up.
While it's possible to get rid of the allocation and copy from
xstrdup(), it keeps the code symetric with the other branch since
interpolate_path() also allocates and ssh_signing_key_file is freed
in both cases.
The exisiting test was updated to check if the temporary files are
properly deleted. To prevent TMPDIR from leaking into the other tests, a
new subshell is created, however this prevents test_config from working.
The cleanup of the config changed in the subshell is done by
test_unconfig in a call to test_when_finished outside of it.
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: redoste <redoste@redoste.xyz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calls to sysctl() assume a 64-bit memory size for the variable
holding the value, but the actual size depends on the key name and
platform, at least for HW_PHYSMEM.
Detect any mismatched reads, and retry with a shorter variable
when needed.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 6aec8d38fdd (t: refactor tests depending on Perl to print data,
2025-04-03) we have changed some of the tests in t4150 to use sed(1)
instead of Perl. One of the conversions is broken though:
sed: -e expression #1, char 41: unterminated `s' command
Curiously enough, the test itself still passes. This is caused by a
sequence of failures:
1. The output of sed(1) is piped into git-update-ref(1), and because
sed(1) is the upstream command we don't notice that it fails.
2. git-update-ref(1) does not receive any input and thus won't create
any references.
3. We then repack the repository with the configured pseudo merges
pattern, but as we didn't create any references the pattern doesn't
match anything.
4. We use `test_pseudo_merges()` to compute the list of pseudo-merges
and write it into a file. This file is empty as there are none.
5. The loop over the pseudo-merges becomes a no-op.
6. The final test succeeds as well because the number of lines in an
empty file is obviously the same as the number of unique lines,
namely zero.
Fix the issue by adding the terminating '|' to the sed(1) command.
Furthermore, make the test a tiny bit more robust by not using it as
part of a pipe.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 6aec8d38fdd (t: refactor tests depending on Perl to print data,
2025-04-03) we have changed one of the tests in t4150 to use awk(1)
instead of Perl. The test works, but at least gawk(1) prints a warning
now:
awk: cmd. line:3: warning: escape sequence `\@' treated as plain `@'
Fix this by removing the backslash.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation for git-merge incorrectly notes that
tip of the current branch on ascii diagram is C,
while it is actually G (current branch is master,
HEAD on diagram is G).
Additionally diagrams on the page are adjusted
to use spaces instead of tabs, so that they align
regardless of tab size. This is in line with
diagrams on other git documentation pages.
Signed-off-by: Timur Sultanaev <str.write@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 50dec7c566 ("config.mak.uname: add sysinfo() configuration for
cygwin", 2025-04-17) and later commit 187ce0222f ("configure.ac: upgrade
to a compilation check for sysinfo", 2025-05-19) added a 'sysinfo()'
check to the autoconf build.
The FreeBSD system has an optional sysinfo compatibility library, used
to assist in porting software, which causes the build to fail when it
is installed. The reason for the failure is the lack of '-lsysinfo'
during the linking step.
Several solutions were considered:
- add a 'linking' check to configure.ac in order to determine the
need to link a separate library (-lsysinfo). (This would require
a similar change to meson.build).
- change the order of the preprocessor conditionals in the total_ram()
function in 'builtin/gc.c', so that the *BSD sysctl() function
(in the HAVE_BSD_SYSCTL block) takes priority over the sysinfo()
function (in the HAVE_SYSINFO block).
- suppress the setting of HAVE_SYSINFO when HAVE_BSD_SYSCTL has been
defined (in both configure.ac and meson.build).
The first solution above, while simple, adds unnecessary code (the
sysinfo compat function is likely implemented using sysctl() anyway)
when git is happy to use sysctl() on *BSD systems.
The second solution would only be required by the autoconf and meson
build systems, the Makefile already sets the build variables to the
required values (since they are not 'auto-detected').
Here we opt for the final solution above, since it only requires that
we prioritise the 'auto-detected' build variables in the autoconf and
meson builds.
In order to fix the FreeBSD build, move the sysinfo() check after the
determination of the HAVE_BSD_SYSCTL build variable, suppressing the
setting of HAVE_SYSINFO if HAVE_BSD_SYSCTL is defined. Apply this logic
to both the configure.ac and meson.build file.
[Thanks go to Renato Botelho <garga@FreeBSD.org> for testing this patch
on FreeBSD.]
Tested-by: Renato Botelho <garga@FreeBSD.org>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor builtin/prune.c to remove the dependency on the global
'the_repository'. Replace all the occurrences of 'the_repository' with
repo and thus remove the definition '#define
USE_THE_REPOSITORY_VARIABLE'. Also, add a test to make sure that 'git
prune -h' can be called when the repository is `NULL`.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'extensions.preciousObjects' setting when set true, prevents
operations that might drop objects from the object storage. This setting
is populated in the global variable
'repository_format_precious_objects'.
Move this global variable to repo scope by adding it to 'struct
repository and also refactor all the occurences accordingly.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use "test-tool string-list remove_duplicates" to test the
"string_list_remove_duplicates" function. As we have introduced the unit
test, we'd better remove the logic from shell script to C program to
improve test speed and readability.
As all the tests in shell script are removed, let's just delete the
"t0063-string-list.sh" and update the "meson.build" file to align with
this change.
Also we could simply remove "DISABLE_SIGN_COMPARE_WARNINGS" due to we
have already deleted related code.
Unfortunately, we cannot totally remove "test-string-list.c" due to that
we would test the performance of sorting about string list by executing
"test-tool string-list sort" in "p0071-sort.sh".
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use "test-tool string-list filter" to test the "filter_string_list"
function. As we have introduced the unit test, we'd better remove the
logic from shell script to C program to improve test speed and
readability.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use "test-tool string-list split_in_place" to test the
"string_list_split_in_place" function. As we have introduced the unit
test, we'd better remove the logic from shell script to C program to
improve test speed and readability.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We rely on "test-tool string-list" command to test the functionality of
the "string-list". However, as we have introduced clar test framework,
we'd better move the shell script into C program to improve speed and
readability.
Create a new file "u-string-list.c" under "t/unit-tests", then update
the Makefile and "meson.build" to build the file. And let's first move
"test_split" into unit test and gradually convert the shell script into
C program.
In order to create `string_list` easily by simply specifying strings in
the function call, create "t_vcreate_string_list_dup" function to do
this.
Then port the shell script tests to C program and remove unused
"test-tool" code and tests.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "add_entry", we call "get_entry_index" function to get the inserted
position. However, as the return type of "get_entry_index" function is
`int`, there is a sign compare warning when comparing the `index` with
the `list-nr` of unsigned type.
"get_entry_index" would always return unsigned index. However, the
current binary search algorithm initializes "left" to be "-1", which
necessitates the use of signed `int` return type.
The reason why we need to assign "left" to be "-1" is that in the
`while` loop, we increment "left" by 1 to determine whether the loop
should end. This design choice, while functional, forces us to use
signed arithmetic throughout the function.
To resolve this sign comparison issue, let's modify the binary search
algorithm with the following approach:
1. Initialize "left" to 0 instead of -1
2. Use `left < right` as the loop termination condition instead of
`left + 1 < right`
3. When searching the right part, set `left = middle + 1` instead of
`middle`
Then, we could delete "#define DISABLE_SIGN_COMPARE_WARNING" to enable
sign warnings check for "string-list".
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When inserting an existing element, "add_entry" would convert "index"
value to "-1-index" to indicate the caller that this element is in the
list already. However, in "string_list_insert", we would simply convert
this to the original positive index without any further action.
In 8fd2cb4069 (Extract helper bits from c-merge-recursive work,
2006-07-25), we create "path-list.c" and then introduce above code path.
Let's directly return the index as we don't care about whether the
element is in the list by using "add_entry". In the future, if we want
to let "add_entry" tell the caller, we may add "int *exact_match"
parameter to "add_entry" instead of converting the index to negative to
indicate.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "add_entry", we accept "insert_at" parameter which must be either -1
(auto) or between 0 and `list->nr` inclusive. Any other value is
invalid. When caller specify any invalid "insert_at" value, we won't
check the range and move the element, which would definitely cause the
trouble.
However, we only use "add_entry" in "string_list_insert" function and we
always pass the "-1" for "insert_at" parameter. So, we never use this
parameter to insert element in a user specified position.
And we should know why there is such code path in the first place. We
used to have another function "string_list_insert_at_index()", which
uses the extra "insert_at" parameter. And in f8c4ab611a (string_list:
remove string_list_insert_at_index() from its API, 2014-11-24), we
remove this function but we don't clean all the code path.
Let's simply delete this parameter as we'd better use "strmap" for such
functionality.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of "-Wsign-compare" warnings in "string-list.c". Fix
trivial ones that result from a mismatched loop iterator type.
There is a single warning left after these fixes. This warning needs
a bit more care and is thus handled in subsequent commits.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the repo_refresh_and_write_index of read-cache.c, we return -1 to
indicate that writing the index to disk failed.
However, callers do not use this information. Commands such as stash print
"could not write index"
and then exit, which does not help to discover the exact problem.
We can let repo_hold_locked_index print the error message if the locking
failed.
Signed-off-by: Han Young <hanyang.tony@bytedance.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid using a double negative, and keep in mind that --index and
--cached are distinct modes of operation.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test that applying a new file creation patch with --intent-to-add to
an existing index does not modify the index outside adding the correct
intents-to-add, and that applying a patch with both modifications
and new file creations with --intent-to-add correctly only adds
intents-to-add to the index.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the "apply only to files" mode (i.e., neither --index nor --cached
mode), the index should not be touched except to record intents to
add when --intent-to-add is on. Because having --intent-to-add on sets
update_index, to indicate that we may touch the index, we can't rely
only on that flag in create_file() (which is called to write both new
files and updated files) to decide whether to write an index entry;
if we did, we would write an index entry for every file being patched
(which would moreover be an intent-to-add entry despite not being a
new file, because we are going to turn on the CE_INTENT_TO_ADD flag
in add_index_entry() if we enter it here and ita_only is true).
To decide whether to touch the index, we need to check the
specific reason the index would be updated, rather than merely
their aggregate in the update_index flag. Because we have already
entered write_out_results() and are performing writes, we know that
state->apply is true. If state->check_index is additionally true, we
are in --index or --cached mode, which updates the index and should
always write, whereas if we are merely in ita_only mode we must only
write if the patch is a new file creation patch.
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three main modes of operation for apply: applying only to the
worktree, applying to the worktree and index (--index), and applying
only to the index (--cached).
The --intent-to-add flag modifies the first of these modes, applying
only to the worktree, in a way which touches the index, because intents
to add are special index entries. However, since its introduction
in cff5dc09ed (apply: add --intent-to-add, 2018-05-26), it has not
worked correctly in any but the most trivial (empty repository)
cases, because the index is never read in (in apply, this is done in
read_apply_cache()) before writing to it.
This causes the operation to clobber the old, correct index with a
new empty-tree index before writing intent-to-add entries to this
empty index; the final result is that the index now records every
existing file in the repository as deleted, which is incorrect.
This error can be corrected by first reading the index. The
update_index flag is correctly set if ita_only is true, because this
flag causes the index to be updated. However, if we merely gate the
call to read_apply_cache() behind update_index, then it will not be
read when state->apply is false, even if it must be checked due to
being in --index or --cached mode. Therefore, we instead read the
index if it will be either checked or updated, because reading the
index is a prerequisite to either.
Reported-by: Ryan Hodges <rhodges@cisco.com>
Original-patch-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Raymond E. Pasco <ray@ameretat.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user gives us a diff filter like --diff-filter=D, we need to do
a tree diff even if we're not planning to show the diff result itself,
in order to decide whether to show the commit at all. So there's an
explicit check of revs->diffopt.filter in setup_revisions(), and we set
revs->diff if any bits are set.
Originally that "filter" field covered both positive capital-letter
filters (like "D") and also negative lowercase filters (like "d"), so it
was sufficient for both cases. But later, 75408ca949 (diff-filter: be
more careful when looking for negative bits, 2022-01-28) split the
negative bits out into a "filter_not" field.
We eventually fold those into "filter", but not until diff_setup_done()
is called, which happens after our explicit check. As a result, a purely
negative filter like:
git log --diff-filter=d
failed to turn on diffs at all. But rather than fail to filter by diff,
because the filter variable is eventually set, we mistakenly show no
commits at all, thinking that the empty diffs were cases where nothing
passed through the filter.
The smallest fix here is to just have our check look for any bits in
either "filter" or "filter_not". I suspect it would also be OK to
reorder the function a bit to call diff_setup_done() earlier, but that
risks violating some other subtle ordering dependency. So I went with
the simple and safe solution here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit we have announced the switch to the "reftable"
format in Git 3.0 for newly created repositories. The format is being
battle tested by GitLab and a couple of other developers, and except for
a small handful of issues exposed early after it has been merged it has
been rock solid. Regardless of that though the test user base is still
comparatively small, which increases the risk that we miss critical
bugs.
Address this by enabling the reftable format when experimental features
are enabled. This should increase the test user base by some margin and
thus give us more input before making the format the default.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "reftable" format has come a long way and has matured nicely since
it has been merged into git via 57db2a094d5 (refs: introduce reftable
backend, 2024-02-07). It fixes longstanding issues that cannot be fixed
with the "files" format in a backwards-compatible way and performs
significantly better in many use cases.
Announce that we will switch to the "reftable" format in Git 3.0 for
newly created repositories and wire up the change, hidden behind the
WITH_BREAKING_CHANGES preprocessor define.
This switch is dependent on support in the larger Git ecosystem. Most
importantly, libraries like JGit, libgit2 and Gitoxide should support
the reftable backend so that we don't break all applications and tools
built on top of those libraries.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git imap-send" has been broken for a long time, which has been
resurrected and then taught to talk OAuth2.0 etc.
* ag/imap-send-resurrection:
imap-send: fix minor mistakes in the logs
imap-send: display the destination mailbox when sending a message
imap-send: display port alongwith host when git credential is invoked
imap-send: add ability to list the available folders
imap-send: enable specifying the folder using the command line
imap-send: add PLAIN authentication method to OpenSSL
imap-send: add support for OAuth2.0 authentication
imap-send: gracefully fail if CRAM-MD5 authentication is requested without OpenSSL
imap-send: fix memory leak in case auth_cram_md5 fails
imap-send: fix bug causing cfg->folder being set to NULL
A previous commit removed the last user of it, and it is no
longer useful with the codebase moving towards C99, which
specifies its definition.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
FreeBSD 6 introduced memmem(), but the implementation diverged
from what was standard everywhere else (including our "compat"
fallback).
FreeBSD 10.4 (went EOL in 2018) corrected the functionality bugs
but kept a suboptimal implementation until FreeBSD 11.4 (the last
version of FreeBSD 11, that went EOL in September 2021).
Let's draw the line to require FreeBSD 12 or newer, which allows us
to drop the special casing of FreeBSD 4.x and rely on the platform
implementation of memmem() unconditionally for all versions that are
still being supported.
Suggested-by: Brad Smith <brad@comstyle.com>
Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Makefile has a 'style' rule to run 'git clang-format'. While Meson
intrinsically supports a 'clang-format' target, which can be run when
using the ninja backend by running 'ninja clang-format', this runs the
formatting on all existing files.
Our Meson build doesn't yet support a way to run 'git clang-format',
which runs the formatter between the working directory and commit
provided. Add a new 'style' target to Meson to mimic the target in the
Makefile.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1b8f306612 (ci/style-check: add `RemoveBracesLLVM` in CI job,
2024-07-23) we added 'RemoveBracesLLVM' to the CI job of running the
clang formatter.
This rule checks and warns against using braces on simple
single-statement bodies of statements. Since we haven't had any issues
regarding this rule, we can now move it into the main clang-format
config and remove it from being CI exclusive.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When clang-format was introduced to the Git project in
6134de6ac1 (clang-format: outline the git project's coding style,
2017-08-14), the 'ColumnLimit' was set to 80. This is inline with our
recommendation in 'Documentation/CodingGuidelines', which states:
We try to keep to at most 80 characters per line.
However while this is recommended limit, this is not the enforced
limit. In some cases in we do overflow this limit to prioritize
readability. Setting the 'ColumnLimit' also means that shorter lines are
concatenated to simply as the result would still be below 80 characters,
which is undesirable.
In the past, we tried to adjust the penalties around line wrapping, once
in 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29)
and another time in 5e9fa0f9fa (clang-format: re-adjust line break
penalties, 2024-10-18). While these settings help tweak the line break
penalties to be more in-line with the requirements of the Git project,
using 'clang-format' still produces a lot of false positives.
So to make 'clang-format' more usable, set the 'ColumnLimit' to 0. This
means that line-wrapping is no-longer a concern of the formatter and
something that the user needs to take care of. The previous commit also
added a more flexible guideline to the '.editorconfig' setting a
'max_line_length' of 120 characters. This should provide some guidance
to users.
In the future, it would be nice to re-instate this limit with adequate
penalties which would follow our guidelines, but currently, it makes
more sense to have a working formatter which we can rely on and which
doesn't create too many false positives.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refs_warn_dangling_symrefs interface is a bit fragile as it passes
in printf-formatting strings with expectations about the number of
arguments. This patch series made it worse by adding a 2nd positional
argument. But there are only two call sites, and they both use almost
identical display options.
Make this safer by moving the format strings into the function that uses
them to make it easier to see when the arguments don't match. Pass a
prefix string and a dry_run flag so the decision logic can be handled
where needed.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dangling warning function that takes a single ref to search for
is no longer used. Remove it.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pruning during `git fetch` we check each pruned ref against the
ref_store one at a time to decide whether to report it as dangling.
This causes every local ref to be scanned for each ref being pruned.
If there are N refs in the repo and M refs being pruned, this code is
O(M*N). However, `git remote prune` uses a very similar function that
is only O(N*log(M)).
Remove the wasteful ref scanning for each pruned ref and use the faster
version already available in refs_warn_dangling_symrefs. Change the
message to include the original refname since the message is no longer
printed immediately after the line that did just print the refname.
In a repo with 126,000 refs, where I was pruning 28,000 refs, this
code made about 3.6 billion calls to strcmp and consumed 410 seconds
of CPU. (Invariably in that time, my remote would timeout and the
fetch would fail anyway.)
After this change, the same operation completes in under a second.
Signed-off-by: Phil Hord <phil.hord@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our document on breaking changes indicates that we intend to default to
SHA-256 in Git 3.0. Since most people choose the default option, this
is an important security upgrade to our defaults.
To allow people to test this case, when WITH_BREAKING_CHANGES is set in
the configuration, build Git with SHA-256 as the default hash. Update
the testsuite to use the build options information to automatically
choose the right value.
Note that if the command substitution for GIT_TEST_BUILTIN_HASH fails,
so does the testsuite—and quite spectacularly at that. Thus, the case
where the Git binary is somehow subtly broken will not go undetected.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd like users to be able to determine the hash algorithm that is the
builtin default in their version of Git. This is useful for
troubleshooting, especially when we decide to change the default. Add
an entry for the default hash in the output of git version
--build-options so that users can easily access that information and
include it in bug reports.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, the built-in default hash is always SHA-1, but that will
change in a future commit. Instead of assuming that operating outside
of a repository will always use SHA-1, look up the default hash
algorithm for operating outside of a repository using an appropriate
environment variable, which will always be correct.
Additionally, for operations outside of a repository, use the
DEFAULT_HASH_ALGORITHM prerequisite rather than SHA1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, the built-in default hash is always SHA-1, but that will
change in a future commit. Instead of assuming that operating outside
of a repository will always use SHA-1, provide constants for both
algorithms and then simply ask test_oid for the built-in hash instead,
which will always be correct.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, the built-in default hash is always SHA-1, but that will
change in a future commit. Instead of assuming that operating outside
of a repository will always use SHA-1, simply ask test_oid for the
built-in hash instead, which will always be correct.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, the default compile-time hash is SHA-1. However, in the
future, this might change and it would be helpful to gracefully handle
this case in our testsuite.
To avoid making these assumptions, let's introduce a variable that
contains the built-in default hash and use it in our setup code as the
fallback value if no hash was explicitly set. For now, this is always
SHA-1, but in a future commit, we'll allow adjusting this and the
variable will be more useful.
To allow us to make our tests more robust, allow test_oid to take the
--hash=builtin option to specify this hash, whatever it is.
Additionally, add a DEFAULT_HASH_ALGORITHM prerequisite to check for the
compile-time hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we define a new repository format with REPOSITORY_FORMAT_INIT, we
always use GIT_HASH_SHA1, and this value ends up getting used as the
default value to initialize a repository if none of the command line,
environment, or config tell us to do otherwise.
Because we might not always want to use SHA-1 as the default, let's
instead specify the default hash algorithm constant so that we will use
whatever the specified default is.
However, we also need to continue to read older repositories. If we're
in a v0 repository or extensions.objectformat is not set, then we must
continue to default to the original hash algorithm: SHA-1. If an
algorithm is set explicitly, however, it will override the hash_algo
member of the repository_format struct and we'll get the right value.
Similarly, if the repository was initialized before Git 0.99.3, then it
may lack a core.repositoryformatversion key, and some repositories lack
a config file altogether. In both cases, format->version is -1 and we
need to assume that SHA-1 is in use.
Because clear_repository_format reinitializes the struct
repository_format and therefore sets the hash_algo member to the default
(which could in the future not be SHA-1), we need to reset this member
explicitly. We know, however, that at the point we call
read_repository_format, we are actually reading an existing repository
and not initializing a new one or operating outside of a repository, so
we are not changing the default behavior back to SHA-1 if the default
algorithm is different.
It is potentially questionable that we ignore all repository
configuration if there is a config file but it doesn't have
core.repositoryformatversion set, in which case we reset all of the
configuration to the default. However, it is unclear what the right
thing to do instead with such an old repository is and a simple git init
will add the missing entry, so for now, we simply honor what the
existing code does and reset the value to the default, simply adding our
initialization to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a large variety of data formats and protocols where no hash
algorithm was defined and the default was assumed to always be SHA-1.
Instead of explicitly stating SHA-1, let's use the constant to represent
the legacy hash algorithm (which is still SHA-1) so that it's clear
for documentary purposes that it's a legacy fallback option and not an
intentional choice to use SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have some commands that can operate inside or outside a repository.
If we're operating outside a repository, we clearly cannot use the
repository's hash algorithm as a default since it doesn't exist, so
instead, let's pick the default instead of specifically SHA-1. Right
now this results in no functional change since the default is SHA-1, but
that may change in the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a a variety of uses of GIT_HASH_SHA1 littered throughout our
code. Some of these really mean to represent specifically SHA-1, but
some actually represent the original hash algorithm used in Git which is
implied by older, legacy formats and protocols which do not contain hash
information. For instance, the bundle v1 and v2 formats do not contain
hash algorithm information, and thus SHA-1 is implied by the use of
these formats.
Add a constant for documentary purposes which indicates this value. It
will always be the same as SHA-1, since this is an essential part of
these formats, but its use indicates this particular reason and not any
other reason why SHA-1 might be used.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, SHA-1 is the default hash algorithm in Git. However, this
may change in the future.
We have many places in our code that use the SHA-1 constant to indicate
the default hash if none is specified, but it will end up being more
practical to specify this explicitly and clearly using a constant for
whatever the default hash algorithm is. Then, if we decide to change it
in the future, we can simply replace the constant representing the
default with a new value.
For these reasons, introduce GIT_HASH_DEFAULT to represent the default
hash algorithm.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `read_object_with_reference()` to `odb_read_object_peeled()` to
match other functions related to the object database and our modern
coding guidelines. Furthermore though, the old name didn't really
describe very well what this function actually does, which is to walk
down any commit and tag objects until an object of the required type has
been found. This is generally referred to as "peeling", so the new name
should be way more descriptive.
No compatibility wrapper is introduced as the function is not used a lot
throughout our codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `pretend_object_file()` to `odb_pretend_object()` to match other
functions related to the object database and our modern coding
guidelines.
No compatibility wrapper is introduced as the function is not used a lot
throughout our codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `has_object()` to `odb_has_object()` to match other functions
related to the object database and our modern coding guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `repo_read_object_file()` to `odb_read_object()` to match other
functions related to the object database and our modern coding
guidelines.
Introduce a compatibility wrapper so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.
Introduce compatibility wrappers so that any in-flight topics will
continue to compile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the external functions provided by the object database subsystem
don't depend on `the_repository` anymore, but some internal functions
still do. Refactor those cases by plumbing through the repository that
owns the object database.
This change allows us to get rid of the `USE_THE_REPOSITORY_VARIABLE`
preprocessor define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--recursive" flag for git-grep(1) allows users to grep for a string
across submodule boundaries. To make this work we add each submodule's
object sources to our own object database so that the objects can be
accessed directly.
The infrastructure for this depends on a global string list of submodule
paths. The caller is expected to call `add_submodule_odb_by_path()` for
each source and the object database will then eventually register all
submodule sources via `do_oid_object_info_extended()` in case it isn't
able to look up a specific object.
This reliance on global state is of course suboptimal with regards to
our libification efforts.
Refactor the logic so that the list of submodule sources is instead
tracked in the object database itself. This allows us to lose the
condition of `r == the_repository` before registering submodule sources
as we only ever add submodule sources to `the_repository` anyway. As
such, behaviour before and after this refactoring should always be the
same.
Rename the functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `set_temporary_primary_odb()` and `restore_primary_odb()`
are responsible for managing a temporary primary source for the
database. Both of these functions implicitly rely on `the_repository`.
Refactor them to instead take an explicit object database parameter as
argument and adjust callers. Rename the functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of iterator-style functions that execute a callback
for each instance of a given set, all of which currently depend on
`the_repository`. Refactor them to instead take an object database as
parameter so that we can get rid of this dependency.
Rename the functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions to manage alternates all depend on `the_repository`.
Refactor them to accept an object database as a parameter and adjust all
callers. The functions are renamed accordingly.
Note that right now the situation is still somewhat weird because we end
up using the object store path provided by the object store's repository
anyway. Consequently, we could have instead passed in a pointer to the
repository instead of passing in the pointer to the object store. This
will be addressed in subsequent commits though, where we will start to
use the path owned by the object store itself.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `odb_mkstemp()` by
passing in the object database as a parameter and adjusting all callers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `assert_oid_type()` by
passing in the object database as a parameter and adjusting all callers.
Rename the function to `odb_assert_oid_type()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Get rid of our dependency on `the_repository` in `find_odb()` by passing
in the object database in which we want to search for the source and
adjusting all callers.
Rename the function to `odb_find_source()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent commits we'll get rid of our use of `the_repository` in
"odb.c" in favor of explicitly passing in a `struct object_database` or
a `struct odb_source`. In some cases though we'll need access to the
repository, for example to read a config value from it, but we don't
have a way to access the repository owning a specific object database.
Introduce parent pointers for `struct object_database` to its owning
repository as well as for `struct odb_source` to its owning object
database, which will allow us to adapt those use cases.
Note that this change requires us to pass through the object database to
`link_alt_odb_entry()` so that we can set up the parent pointers for any
source there. The callchain is adapted to pass through the object
database accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.
While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.
As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.
Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:
- `odb_backend` was discarded because it led to the association that
one object database has a single backend, but the model is that one
alternate has one backend. Furthermore, "backend" is more about the
actual backing implementation and less about the high-level concept.
- `odb_alternate` was discarded because it is a bit of a stretch to
also call the main object directory an "alternate".
Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.
In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `raw_object_store` structure is the central entry point for reading
and writing objects in a repository. The main purpose of this structure
is to manage object directories and provide an interface to access and
write objects in those object directories.
Right now, many of the functions associated with the raw object store
implicitly rely on `the_repository` to get access to its `objects`
pointer, which is the `raw_object_store`. As we want to generally get
rid of using `the_repository` across our codebase we will have to
convert this implicit dependency on this global variable into an
explicit parameter.
This conversion can be done by simply passing in an explicit pointer to
a repository and then using its `->objects` pointer. But there is a
second effort underway, which is to make the object subsystem more
selfcontained so that we can eventually have pluggable object backends.
As such, passing in a repository wouldn't make a ton of sense, and the
goal is to convert the object store interfaces such that we always pass
in a reference to the `raw_object_store` instead.
This will expose the `raw_object_store` type to a lot more callers
though, which surfaces that this type is named somewhat awkwardly. The
"raw_" prefix makes readers wonder whether there is a non-raw variant of
the object store, but there isn't. Furthermore, we nowadays want to name
functions in a way that they can be clearly attributed to a specific
subsystem, but calling them e.g. `raw_object_store_has_object()` is just
too unwieldy, even when dropping the "raw_" prefix.
Instead, rename the structure to `object_database`. This term is already
used a lot throughout our codebase, and it cannot easily be mistaken for
"object directories", either. Furthermore, its acronym ODB is already
well-known and works well as part of a function's name, like for example
`odb_has_object()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t5310 lacks a test to ensure git works correctly when commit bitmap
data is corrupted. So this patch add test helper in pack-bitmap.c to
list each commit bitmap position in bitmap file and `load corrupt bitmap`
test case in t/t5310 to corrupt a commit bitmap before loading it.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment in pack-bitmap.c:test_bitmap_commits(), suggests that
we can avoid reading the commit table altogether. However, this
comment is misleading. The reason we load bitmap entries here is
because test_bitmap_commits() needs to print the commit IDs from the
bitmap, and we must read the bitmap entries to obtain those commit IDs.
So reword this comment.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After going through the "failed" label, load_bitmap() will return -1,
and its caller (either prepare_bitmap_walk() or prepare_bitmap_git())
will then call free_bitmap_index().
That function would have done:
struct stored_bitmap *sb;
kh_foreach_value(b->bitmaps, sb {
ewah_pool_free(sb->root);
free(sb);
});
, but won't since load_bitmap() already called kh_destroy_oid_map() and
NULL'd the "bitmaps" pointer from within its "failed" label. Thus if you
got part of the way through loading bitmap entries and then failed, you
would leak all of the previous entries that you were able to load
successfully.
The solution is to remove the error handling code in load_bitmap(), because
its caller will always call free_bitmap_index() in case of an error.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous commit has plugged one leak in the normal code path, but
there is an early exit that leaves without releasing any resources
acquired in the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
4e513890008 (builtin/config: introduce "get" subcommand, 2024-05-06)
introduced `get` and `--url` but didn’t add `--url` to the synopsis.
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option was introduced in a series of commits from fe3ccc7aab (Merge
branch 'ps/config-subcommands', 2024-05-15) and deprecated
`value-pattern`. But `value-pattern` is still used throughout the doc.
The deprecated modes have been quarantined in the “Deprecated Modes”
section. So let’s only use `--value=<pattern>` in the rest of the doc.
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These options were introduced in a series of commits from
fe3ccc7aab (Merge branch 'ps/config-subcommands', 2024-05-15).[1]
But they were not documented here.
Document this option and the negated form according to the current
convention.[2]
[1]: `--value` is a replacement for the `value-pattern`
positional argument
[2]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option was introduced in a series of commits from fe3ccc7aab (Merge
branch 'ps/config-subcommands', 2024-05-15). But two styles were used
for the value provided to the option:
1. Synopsis: `--value=<value>`
2. Deprecated Modes: `--value=<pattern>`
(2) is also used in the synopsis on the command.
Use (2) consistently throughout since it’s a pattern in the general
case (`value` sounds more generic).
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These options were introduced in 4e513890008 (builtin/config:
introduce "get" subcommand, 2024-05-06) but not documented here.
Use the description from the source code.
Document this option and the negated form according to the current
convention.[1]
`--show-names` is also the default when `--get-regexp` is given. But
don’t mention it here since all the deprecated modes are quarantined in
the “Deprecated Modes” section.
[1]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
FreeBSD 13.4 is no longer supported, and 13.5 will be the last
release from that series, so jump instead to 14.3 which should
be supported for another 10 months and will be at that point
the oldest supported release with the interim release of 15.
While at it, move some variables to the environment and make
sure to skip a git grep test that assumes glibc regex.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test helper "test_seq" function learned the "-f <fmt>" option,
which allowed us to simplify a lot of test scripts.
* jk/test-seq-format:
test-lib: teach test_seq the -f option
t7422: replace confusing printf with echo
"git merge/pull" has been taught the "--compact-summary" option to
use the compact-summary format, intead of diffstat, when showing
the summary of the incoming changes.
* jc/merge-compact-summary:
merge/pull: extend merge.stat configuration variable to cover --compact-summary
merge/pull: add the "--compact-summary" option
An interchange format for stash entries is defined, and subcommand
of "git stash" to import/export has been added.
* bc/stash-export-import:
builtin/stash: provide a way to import stashes from a ref
builtin/stash: provide a way to export stashes to a ref
builtin/stash: factor out revision parsing into a function
object-name: make get_oid quietly return an error
Avoid regexp_constraint and instead use comparison_constraint when
listing functions to exclude from application of coccinelle rules,
as spatch can be built with different regexp engine X-<.
* jc/cocci-avoid-regexp-constraint:
cocci: matching (multiple) identifiers
Proton Mail is an privacy-focused email service gaining popularity.
Unfortunately, it does not provide an SMTP server to send emails.
Proton Mail Bridge is an official solution for paid users, and for free
users, a client named git-protonmail is available. Mention the same in the
docs.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`sendmailCmd` is a configuration option in `git-send-email` that allows
users to send emails using an external application that supports
sendmail-like commands. This ability has been very useful to support
proprietary email APIs without modifying the `git-send-email` codebase.
It is also useful for users who prefer to use another SMTP client
instead of the SMTP perl library used by `git-send-email`.
This commit adds a paragraph to the documentation explaining this
option.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Yahoo and AOL, both advertise that they support app passwords for third-party
applications. But generating app passwords for them is broken and unreliable
for quite some time now. Yahoo already had an OAuth2.0 credential helper
added in the documentation, so I thought it would be a good idea to add one
for AOL accounts as well, which is more reliable and secure.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for command line option `--outlook-id-fix` is there in
the sendemail documentation, but the config option `sendemail.outlookidfix`
was missing. Add the same to the documentation.
White at it, also enclose the values `true` and `false` in backticks in
the documentation for `sendemail.mailmap`.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of `--smtp-ssl-cert-path` in the git-send-email documentation
mentions consulting OpenSSL's verify(1) manual page for details about the
`-CAfile` and `-CApath` options. However, the way it was written was quite
confusing, and it didn't mention that OpenSSL's verify(1) is the manual page
to refer to.
Fix this by slightly rewording the description and also add a link to the
OpenSSL verify(1) manual page.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'branch' section of the git-config documentation was missing
inline code formatting and emphasis for the <name> placeholder.
Both changes improve readability, especially when viewed online.
Signed-off-by: Jakub Ječmínek <kuba@kubajecminek.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since df076bdbcc ([PATCH] GIT: Listen on IPv6 as well, if available.,
2005-07-23), the original error checking was included in an inner loop
unchanged, where its effect was different.
Instead of retrying, after a EINTR during accept() in the listening
socket, it will advance to the next one and try with that instead,
leaving the client waiting for another round.
Make sure to retry with the same listener socket that failed originally.
To avoid an unlikely busy loop, fallback to the old behaviour after a
couple of attempts.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit c8009635785e ("fetch-pack, send-pack: clean up shallow oid
array", 2024-09-25) cleaned up the shallow oid array in cmd_send_pack,
but didn't clean up extra_have, which is still leaked at program exit.
I suspect the particular tests in t5539 don't trigger any additions to
the extra_have array, which explains why the tests can pass leak free
despite this gap.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since df076bdbcc ([PATCH] GIT: Listen on IPv6 as well, if available.,
2005-07-23), any file descriptor assigned to a listening socket was
validated to be within the range to be used in an FDSET later.
6573faff34 (NO_IPV6 support for git daemon, 2005-09-28), moves to
use poll() instead of select(), that doesn't have that restriction,
so remove the original check.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent code added a direct access to the d_type member in "struct
dirent", but some platforms lack it, which has been corrected.
* jc/diff-no-index-with-pathspec-fix:
diff-no-index: do not reference .d_type member of struct dirent
"git maintenance" lacked the care "git gc" had to avoid holding
onto the repository lock for too long during packing refs, which
has been remedied.
* ps/maintenance-ref-lock:
builtin/maintenance: fix locking race when handling "gc" task
builtin/gc: avoid global state in `gc_before_repack()`
usage: allow dying without writing an error message
builtin/maintenance: fix locking race with refs and reflogs tasks
builtin/maintenance: split into foreground and background tasks
builtin/maintenance: fix typedef for function pointers
builtin/maintenance: extract function to run tasks
builtin/maintenance: stop modifying global array of tasks
builtin/maintenance: mark "--task=" and "--schedule=" as incompatible
builtin/maintenance: centralize configuration of explicit tasks
builtin/gc: drop redundant local variable
builtin/gc: use designated field initializers for maintenance tasks
"git whatchanged" that is longer to type than "git log --raw"
which is its modern rough equivalent has outlived its usefulness
more than 10 years ago. Plan to deprecate and remove it.
* jc/you-still-use-whatchanged:
whatschanged: list it in BreakingChanges document
whatchanged: remove when built with WITH_BREAKING_CHANGES
whatchanged: require --i-still-use-this
tests: prepare for a world without whatchanged
doc: prepare for a world without whatchanged
you-still-use-that??: help deprecating commands for removal
To improve support for symbolic port names in netrc files, this
changes does the following:
- Treat symbolic port names as ports, not protocols in git-credential-netrc
- Validate the SMTP server port provided to send-email
- Convert the above symbolic port names to their numerical values.
Before this change, it was not possible to have a SMTP server port set
to "smtps" in a netrc file (e.g. Emacs' ~/.authinfo.gpg), as it would
be registered as a protocol and break the match for a "smtp" protocol
host, as queried for by git-send-email.
Signed-off-by: Maxim Cournoyer <maxim@guixotic.coop>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Invalid ports were previously silently dropped; now a warning message
is produced.
Signed-off-by: Maxim Cournoyer <maxim@guixotic.coop>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the installed scripts have their Perl shebang set to PERL_PATH,
it is nevertheless useful to be able to run the uninstalled script for
manual tests while developing. This change makes the shebang more
portable by having the perl command looked from PATH instead of from a
fixed location.
Signed-off-by: Maxim Cournoyer <maxim@guixotic.coop>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 9d2962a7c4 (receive-pack: use batched reference updates, 2025-05-19)
we updated the 'git-receive-pack(1)' command to use batched reference
updates. One edge case which was missed during this implementation was
when a user pushes multiple branches such as:
delete refs/heads/branch/conflict
create refs/heads/branch
Before using batched updates, the references would be applied
sequentially and hence no conflicts would arise. With batched updates,
while the first update applies, the second fails due to D/F conflict. A
similar issue was present in 'git-fetch(1)' and was fixed by separating
out reference pruning into a separate transaction in the commit 'fetch:
use batched reference updates'. Apply a similar mechanism for
'git-receive-pack(1)' and separate out reference deletions into its own
batch.
This means 'git-receive-pack(1)' will now use up to two transactions,
whereas before using batched updates it would use _at least_ two
transactions. So using batched updates is still the better option.
Add a test to validate this behavior.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08) introduced support for batched reference updates. This
allows users to batch updates together, while allowing some of the
updates to fail.
Under the hood, batched updates use the reference transaction mechanism.
Each update which fails is marked as such. Any failed updates must be
skipped over in the rest of the code, as they wouldn't apply any more.
In two of the loops within 'files_transaction_finish()' of the files
backend, the failed updates aren't skipped over. This can cause a
SEGFAULT otherwise. Add the missing skips and a test to validate the
same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asking to apply mailmap to both author and committer field
while showing a commit object, the field that appears later was not
correctly parsed and replaced, which has been corrected.
* sa/multi-mailmap-fix:
cat-file: fix mailmap application for different author and committer
Code clean-up.
* ac/preload-index-wo-the-repository:
preload-index: stop depending on 'the_repository'
environment: remove the global variable 'core_preload_index'
"git stash" recorded a wrong branch name when submodules are
present in the current checkout, which has been corrected.
* kj/stash-onbranch-submodule-fix:
stash: fix incorrect branch name in stash message
"git send-email" incremented its internal message counter when a
message was edited, which made logic that treats the first message
specially misbehave, which has been corrected.
* ag/send-email-edit-threading-fix:
send-email: show the new message id assigned by outlook in the logs
send-email: fix bug resulting in broken threads if a message is edited
"git subtree" (in contrib/) learns to grok GPG signing its commits.
* pw/subtree-gpg-sign:
contrib/subtree: add -S/--gpg-sign
contrib/subtree: parse using --stuck-long
The "seq" tool has a "-f" option to produce printf-style formatted
lines. Let's teach our test_seq helper the same trick. This lets us get
rid of some shell loops in test snippets (which are particularly verbose
in our test suite because we have to "|| return 1" to keep the &&-chain
going).
This converts a few call-sites I found by grepping around the test
suite. A few notes on these:
- In "seq", the format specifier is a "%g" float. Since test_seq only
supports integers, I've kept the more natural "%d" (which is what
these call sites were using already).
- Like "seq", test_seq automatically adds a newline to the specified
format. This is what all callers are doing already except for t0021,
but there we do not care about the exact format. We are just trying
to printf a large number of bytes to a file. It's not worth
complicating other callers or adding an option to avoid the newline
in that caller.
- Most conversions are just replacing a shell loop (which does get rid
of an extra fork, since $() requires a subshell). In t0612 we can
replace an awk invocation, which I think makes the end result more
readable, as there's less quoting.
- In t7422 we can replace one loop, but sadly we have to leave the
loop directly above it. This is because that earlier loop wants to
include the seq value twice in the output, which test_seq does not
support (nor does regular seq). If you run:
test_seq -f "foo-%d %d" 10
the second "%d" will always be the empty string. You might naively
think that test_seq could add some extra arguments, like:
# 3 ought to be enough for anyone...
printf "$fmt\n" "$i "$i" $i"
but that just triggers printf to format multiple lines, one per
extra set of arguments.
So we'd have to actually parse the format string, figure out how
many "%" placeholders are there, and then feed it that many
instances of the sequence number. The complexity isn't worth it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_default_remote_submodule() function performs a lookup to find
the appropriate remote to use within a submodule. The function first
checks to see if it can find the remote for the current branch. If this
fails, it then checks to see if there is exactly one remote. It will use
this, before finally falling back to "origin" as the default.
If a user happens to rename their default remote from origin, either
manually or by setting something like clone.defaultRemoteName, this
fallback will not work.
In such cases, the submodule logic will try to use a non-existent
remote. This usually manifests as a failure to trigger the submodule
update.
The parent project already knows and stores the submodule URL in either
.gitmodules or its .git/config.
Add a new repo_remote_from_url() helper which will iterate over all the
remotes in a repository and return the first remote which has a matching
URL.
Refactor get_default_remote_submodule to find the submodule and get its
URL. If a valid URL exists, first try to obtain a remote using the new
repo_remote_from_url(). Fall back to the repo_default_remote()
otherwise.
The fallback logic is kept in case for some reason the user has manually
changed the URL within the submodule. Additionally, we still try to use
a remote rather than directly passing the URL in the
fetch_in_submodule() logic. This ensures that an update will properly
update the remote refs within the submodule as expected, rather than
just fetching into FETCH_HEAD.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future refactor got get_default_remote_submodule() is going to depend on
resolve_relative_url(). That function depends on get_default_remote().
Move get_default_remote_submodule() after resolve_relative_url() first
to make the additional functionality easier to review.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repo_get_default_remote() function in submodule--helper currently
tries to figure out the proper remote name to use for a submodule based
on a few factors.
First, it tries to find the remote for the currently checked out branch.
This works if the submodule is configured to checkout to a branch
instead of a detached HEAD state.
In the detached HEAD state, the code calls back to using "origin", on
the assumption that this is the default remote name. Some users may
change this, such as by setting clone.defaultRemoteName, or by changing
the remote name manually within the submodule repository.
As a first step to improving this situation, refactor to reuse the logic
from remotes_remote_for_branch(). This function uses the remote from the
branch if it has one. If it doesn't then it checks to see if there is
exactly one remote. It uses this remote first before attempting to fall
back to "origin".
To allow using this helper function, introduce a repo_default_remote()
helper to remote.c which takes a repository structure. This helper will
load the remote configuration and get the "HEAD" branch. Then it will
call remotes_remote_for_branch to find the default remote.
Replace calls of repo_get_default_remote() with the calls to this new
function. To maintain consistency with the existing callers, continue
copying the returned string with xstrdup.
This isn't a perfect solution for users who change remote names, but it
should help in cases where the remote name is changed but users haven't
added any additional remotes.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remotes_remote_get_1 (and its caller, remotes_remote_get, have an
implicit dependency on the_repository due to calling
read_branches_file() and read_remotes_file(), both of which use
the_repository. The branch_get() function calls set_merge() which has an
implicit dependency on the_repository as well.
Because of this use of the_repository, the helper functions cannot be
used in code paths which operate on other repositories. A future
refactor of the submodule--helper will want to make use of some of these
functions.
Refactor to break the dependency by passing struct repository *repo
instead of struct remote_state *remote_state in a few places.
The public callers and many other helper functions still depend on
the_repository. A repo-aware function will be exposed in a following
change for git submodule--helper.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both submodule--helper.c and submodule-config.c have an implementation
of starts_with_dot_slash and starts_with_dot_dot_slash. The dir.h header
has starts_with_dot(_dot)_slash_native, which sets PATH_MATCH_NATIVE.
Move the helpers to dir.h as static inlines. I thought about renaming
them to postfix with _platform but that felt too long and ugly. On the
other hand it might be slightly confusing with _native.
This simplifies a submodule refactor which wants to use the helpers
earlier in the submodule--helper.c file.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remote_clear() function failed to free the remote->push and
remote->fetch refspec fields.
This should be caught by the leak sanitizer. However, for callers which
use ``the_repository``, the values never go out of scope and the
sanitizer doesn't complain.
A future change is going to add a caller of read_config() for a
submodule repository structure, which would result in the leak sanitizer
complaining.
Fix remote_clear(), updating it to properly call refspec_clear() for
both the push and fetch members.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The branch structure has both branch->merge_name and branch->merge for
tracking the merge information. The former is allocated by add_merge()
and stores the names read from the configuration file. The latter is
allocated by set_merge() which is called by branch_get() when an
external caller requests a branch.
This leads to the confusing situation where branch->merge_nr tracks both
the size of branch->merge (once its allocated) and branch->merge_name.
The branch_release() function incorrectly assumes that branch->merge is
always set when branch->merge_nr is non-zero, and can potentially crash
if read_config() is called without branch_get() being called on every
branch.
In addition, branch_release() fails to free some of the memory
associated with the structure including:
* Failure to free the refspec_item containers in branch->merge[i]
* Failure to free the strings in branch->merge_name[i]
* Failure to free the branch->merge_name parent array.
The set_merge() function sets branch->merge_nr to 0 when there is no
valid remote_name, to avoid external callers seeing a non-zero merge_nr
but a NULL merge array. This results in failure to release most of the
merge data as well.
These issues could be fixed directly, and indeed I initially proposed
such a change at [1] in the past. While this works, there was some
confusion during review because of the inconsistencies.
Instead, its time to clean up the situation properly. Remove
branch->merge_name entirely. Instead, allocate branch->merge earlier
within add_merge() instead of within set_merge(). Instead of having
set_merge() copy from merge_name[i] to merge[i]->src, just have
add_merge() directly initialize merge[i]->src.
Modify the add_merge() to call xstrdup() itself, instead of having
the caller of add_merge() do so. This makes it more obvious which code
owns the memory.
Update all callers which use branch->merge_name[i] to use
branch->merge[i]->src instead.
Add a merge_clear() function which properly releases all of the
merge-related memory, and which sets branch->merge_nr to zero. Use this
both in branch_release() and in set_merge(), fixing the leak when
set_merge() finds no valid remote_name.
Add a set_merge variable to the branch structure, which indicates
whether set_merge() has been called. This replaces the previous use of a
NULL check against the branch->merge array.
With these changes, the merge array is always allocated when merge_nr is
non-zero.
This use of refspec_item to store the names should be safe. External
callers should be using branch_get() to obtain a pointer to the branch,
which will call set_merge(), and the callers internal to remote.c
already handle the partially initialized refpsec_item structure safely.
This end result is cleaner, and avoids duplicating the merge names
twice.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Link: [1] https://lore.kernel.org/git/20250617-jk-submodule-helper-use-url-v2-1-04cbb003177d@gmail.com/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during
geometric repack, 2022-05-20), repack began adding cruft pack(s) to the
MIDX with '--write-midx' to ensure that the resulting MIDX was always
closed under reachability in order to generate reachability bitmaps.
While the previous patch added the '--stdin-packs=follow' option to
pack-objects, it is not yet on by default. Given that, suppose you have
a once-unreachable object packed in a cruft pack, which later becomes
reachable from one or more objects in a geometrically repacked pack.
That once-unreachable object *won't* appear in the new pack, since the
cruft pack was not specified as included or excluded when the
geometrically repacked pack was created with 'pack-objects
--stdin-packs' (*not* '--stdin-packs=follow', which is not on). If that
new pack is included in a MIDX without the cruft pack, then trying to
generate bitmaps for that MIDX may fail. This happens when the bitmap
selection process picks one or more commits which reach the
once-unreachable objects.
To mitigate this failure mode, commit ddee3703b3 ensures that the MIDX
will be closed under reachability by including cruft pack(s). If cruft
pack(s) were not included, we would fail to generate a MIDX bitmap. But
ddee3703b3 alludes to the fact that this is sub-optimal by saying
[...] it's desirable to avoid including cruft packs in the MIDX
because it causes the MIDX to store a bunch of objects which are
likely to get thrown away.
, which is true, but hides an even larger problem. If repositories
rarely prune their unreachable objects and/or have many of them, the
MIDX must keep track of a large number of objects which bloats the MIDX
and slows down object lookup.
This is doubly unfortunate because the vast majority of objects in cruft
pack(s) are unlikely to be read. But any object lookups that go through
the MIDX must binary search over them anyway, slowing down object
lookups using the MIDX.
This patch causes geometrically-repacked packs to contain a copy of any
once-unreachable object(s) with 'git pack-objects --stdin-packs=follow',
allowing us to avoid including any cruft packs in the MIDX. This is
because a sequence of geometrically-repacked packs that were all
generated with '--stdin-packs=follow' are guaranteed to have their union
be closed under reachability.
Note that you cannot guarantee that a collection of packs is closed
under reachability if not all of them were generated with "following" as
above. One tell-tale sign that not all geometrically-repacked packs in
the MIDX were generated with "following" is to see if there is a pack in
the existing MIDX that is not going to be somehow represented (either
verbatim or as part of a geometric rollup) in the new MIDX.
If there is, then starting to generate packs with "following" during
geometric repacking won't work, since it's open to the same race as
described above.
But if you're starting from scratch (e.g., building the first MIDX after
an all-into-one '--cruft' repack), then you can guarantee that the union
of subsequently generated packs from geometric repacking *is* closed
under reachability.
(One exception here is when "starting from scratch" results in a noop
repack, e.g., because the non-cruft pack(s) in a repository already form
a geometric progression. Since we can't tell whether or not those were
generated with '--stdin-packs=follow', they may depend on
once-unreachable objects, so we have to include the cruft pack in the
MIDX in this case.)
Detect when this is the case and avoid including cruft packs in the MIDX
where possible. The existing behavior remains the default, and the new
behavior is available with the config 'repack.midxMustIncludeCruft' set
to 'false'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When invoked with '--stdin-packs', pack-objects will generate a pack
which contains the objects found in the "included" packs, less any
objects from "excluded" packs.
Packs that exist in the repository but weren't specified as either
included or excluded are in practice treated like the latter, at least
in the sense that pack-objects won't include objects from those packs.
This behavior forces us to include any cruft pack(s) in a repository's
multi-pack index for the reasons described in ddee3703b3
(builtin/repack.c: add cruft packs to MIDX during geometric repack,
2022-05-20).
The full details are in ddee3703b3, but the gist is if you
have a once-unreachable object in a cruft pack which later becomes
reachable via one or more commits in a pack generated with
'--stdin-packs', you *have* to include that object in the MIDX via the
copy in the cruft pack, otherwise we cannot generate reachability
bitmaps for any commits which reach that object.
Note that the traversal here is best-effort, similar to the existing
traversal which provides name-hash hints. This means that the object
traversal may hand us back a blob that does not actually exist. We
*won't* see missing trees/commits with 'ignore_missing_links' because:
- missing commit parents are discarded at the commit traversal stage by
revision.c::process_parents()
- missing tag objects are discarded by revision.c::handle_commit()
- missing tree objects are discarded by the list-objects code in
list-objects.c::process_tree()
But we have to handle potentially-missing blobs specially by making a
separate check to ensure they exist in the repository. Failing to do so
would mean that we'd add an object to the packing list which doesn't
actually exist, rendering us unable to write out the pack.
This prepares us for new repacking behavior which will "resurrect"
objects found in cruft or otherwise unspecified packs when generating
new packs. In the context of geometric repacking, this may be used to
maintain a sequence of geometrically-repacked packs, the union of which
is closed under reachability, even in the case described earlier.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show_commit_pack_hint() has heretofore been a noop, so its position
within its compilation unit only needs to appear before its first use.
But the following commit will sometimes have `show_commit_pack_hint()`
call `show_object_pack_hint()`, so reorder the former to appear after
the latter to minimize the code movement in that patch.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With '--unpacked', pack-objects adds loose objects (which don't appear
in any of the excluded packs from '--stdin-packs') to the output pack
without considering them as reachability tips for the name-hash
traversal.
This was an oversight in the original implementation of '--stdin-packs',
since the code which enumerates and adds loose objects to the output
pack (`add_unreachable_loose_objects()`) did not have access to the
'rev_info' struct found in `read_packs_list_from_stdin()`.
Excluding unpacked objects from that traversal doesn't affect the
correctness of the resulting pack, but it does make it harder to
discover good deltas for loose objects.
Now that the 'rev_info' struct is declared outside of
`read_packs_list_from_stdin()`, we can pass it to
`add_objects_in_unpacked_packs()` and add any loose objects as tips to
the above-mentioned traversal, in theory producing slightly tighter
packs as a result.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once 'read_packs_list_from_stdin()' has called for_each_object_in_pack()
on each of the input packs, we do a reachability traversal to discover
names for any objects we picked up so we can generate name hash values
and hopefully get higher quality deltas as a result.
A future commit will change the purpose of this reachability traversal
to find and pack objects which are reachable from commits in the input
packs, but are packed in an unknown (not included nor excluded) pack.
Extract the code which initializes and performs the reachability
traversal to take place in the caller, not the callee, which prepares us
to share this code for the '--unpacked' case (see the function
add_unreachable_loose_objects() for more details).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the bottom of cmd_pack_objects() we check which mode the command is
running in (e.g., generating a cruft pack, handling '--stdin-packs',
using the internal rev-list, etc.) and handle the mode appropriately.
The '--stdin-packs' case is handled inline (dating back to its
introduction in 339bce27f4 (builtin/pack-objects.c: add '--stdin-packs'
option, 2021-02-22)) since it is relatively short. Extract the body of
"if (stdin_packs)" into its own function to prepare for the
implementation to become lengthier in a following commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In add_object_entry_from_pack() we declare 'revs' (given to us through
the miscellaneous context argument) earlier in the "if (p)" conditional
than is necessary. Move it down as far as it can go to reduce its
scope.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects has a handful of explicit checks for pairs of command-line
options which are mutually incompatible. Many of these pre-date
a699367bb8 (i18n: factorize more 'incompatible options' messages,
2022-01-31).
Convert the explicit checks into die_for_incompatible_opt2() calls,
which simplifies the implementation and standardizes pack-objects'
output when given incompatible options (e.g., --stdin-packs with
--filter gives different output than --keep-unreachable with
--unpack-unreachable).
There is one minor piece of test fallout in t5331 that expects the old
format, which has been corrected.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While looping over a counter "i", we do:
printf "[submodule \"sm-$i\"]\npath = recursive-submodule-path-$i\n" "$i"
So we are passing "$i" as an argument to be filled in, but there is no
"%" placeholder in the format string, which is a bit confusing to read.
We could switch both instances of "$i" to "%d" (and pass $i twice). But
that makes the line even longer. Let's just keep interpolating the value
in the string, and drop the confusing extra "$i" argument.
And since we are not using any printf specifiers at all, it becomes
clear that we can swap it out for echo. We do use a "\n" in the middle
of the string, but breaking this into two separate echo statements
actually makes it easier to read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With "make coccicheck", we generate contrib/coccinelle/*.cocci.patch
files that contain changes suggested by semantic patches, but "make"
succeeds. Admittedly, not many developers may run "make coccicheck"
in the first place, but it makes it harder to notice when they do
run it after they introduced an iffy piece of code.
Check that the resulting cocci.patch files are all empty.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"make coccicheck" seems to work OK at GitHub CI using
$ spatch --version
spatch version 1.1.1 compiled with OCaml version 4.13.1
OCaml scripting support: yes
Python scripting support: yes
Syntax of regular expressions: PCRE
but not with
$ spatch --version
spatch version 1.3 compiled with OCaml version 5.3.0
OCaml scripting support: yes
Python scripting support: yes
Syntax of regular expressions: Str
Judging from https://ocaml.org/manual/5.3/api/Str.html, I suspect
that this probably is caused by the distinction between BRE vs PCRE.
As there is no reasonably clean way to write the multiple choice
matches portably between these two pattern languages, let's stop
using regexp_constraint and use compare_constraint instead when
listing the function names to exclude.
There are other uses of "!~" but they all want to match a single
simple token, that should work fine either with BRE or PCRE.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace basic error messages with more helpful ones that guide users
on how to resolve configuration issues. When imap.host or imap.folder
are missing, provide the exact git config commands needed to fix the
problem, along with examples of typical values.
Use the advise() API to display hints in a multi-line format with
proper "hint:" prefixes for each line.
Signed-off-by: Jörg Thalheim <joerg@thalheim.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message 'no imap store specified' is misleading because
it refers to 'store' when the actual missing configuration is
'imap.folder'. Update the message to use the correct terminology
that matches the configuration variable name.
This reduces confusion for users who might otherwise look for
non-existent 'imap.store' configuration when they see this error.
Signed-off-by: Jörg Thalheim <joerg@thalheim.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ag/imap-send-resurrection:
imap-send: fix minor mistakes in the logs
imap-send: display the destination mailbox when sending a message
imap-send: display port alongwith host when git credential is invoked
imap-send: add ability to list the available folders
imap-send: enable specifying the folder using the command line
imap-send: add PLAIN authentication method to OpenSSL
imap-send: add support for OAuth2.0 authentication
imap-send: gracefully fail if CRAM-MD5 authentication is requested without OpenSSL
imap-send: fix memory leak in case auth_cram_md5 fails
imap-send: fix bug causing cfg->folder being set to NULL
Some minor mistakes have been found in the logs. Most of them include
error messages starting with a capital letter, and ending with a period.
Abbreviations like "IMAP" and "OK" should also be in uppercase. Another
mistake was that the error message showing unknown authentication
mechanism used was displaying the host rather than the mechanism in the
logs. Fix them.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we sent a message using the `imap-send` command, it would
display a log showing the number of messages which are to be sent.
For example:
sending 1 message
100% (1/1) done
This had been made more informative by adding the name of the destination
folder as well:
Sending 1 message to Drafts folder...
100% (1/1) done
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When requesting for passsword, git credential helper used to display
only the host name. For example:
Password for 'imaps://gargaditya08%40live.com@outlook.office365.com':
Now, it will display the port along with the host name:
Password for 'imaps://gargaditya08%40live.com@outlook.office365.com:993':
This has been done to make credential helpers more specific for ports.
Also, this behaviour will also mimic git send-email, which displays
the port along with the host name when requesting for a password.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various IMAP servers have different ways to name common folders.
For example, the folder where all deleted messages are stored is often
named "[Gmail]/Trash" on Gmail servers, and "Deleted" on Outlook.
Similarly, the Drafts folder is simply named "Drafts" on Outlook, but
on Gmail it is named "[Gmail]/Drafts".
This commit adds a `--list` command to the `imap-send` tool that lists
the available folders on the IMAP server, allowing users to see
which folders are available and how they are named. A sample output
looks like this when run against a Gmail server:
Fetching the list of available folders...
* LIST (\HasNoChildren) "/" "INBOX"
* LIST (\HasChildren \Noselect) "/" "[Gmail]"
* LIST (\All \HasNoChildren) "/" "[Gmail]/All Mail"
* LIST (\Drafts \HasNoChildren) "/" "[Gmail]/Drafts"
* LIST (\HasNoChildren \Important) "/" "[Gmail]/Important"
* LIST (\HasNoChildren \Sent) "/" "[Gmail]/Sent Mail"
* LIST (\HasNoChildren \Junk) "/" "[Gmail]/Spam"
* LIST (\Flagged \HasNoChildren) "/" "[Gmail]/Starred"
* LIST (\HasNoChildren \Trash) "/" "[Gmail]/Trash"
For OpenSSL, this is achived by running the 'IMAP LIST' command and
parsing the response. This command is specified in RFC6154:
https://datatracker.ietf.org/doc/html/rfc6154#section-5.1
For libcurl, the example code published in the libcurl documentation
is used to implement this functionality:
https://curl.se/libcurl/c/imap-list.html
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users may very often want to imap-send messages to a folder
other than the default set in the config. Add a command line
argument for the same.
While at it, fix minor mark-up inconsistencies in the existing
documentation text.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current implementation for PLAIN in imap-send works just fine
if using curl, but if attempted to use for OpenSSL, it is treated
as an invalid mechanism. The default implementation for OpenSSL is
IMAP LOGIN command rather than AUTH PLAIN. Since AUTH PLAIN is
still used today by many email providers in form of app passwords,
lets add an implementation that can use AUTH PLAIN if specified.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike PLAIN, XOAUTH2 and OAUTHBEARER, CRAM-MD5 authentication is not
supported by libcurl and requires OpenSSL. If the user tries to use
CRAM-MD5 authentication without OpenSSL, the previous behaviour was to
attempt to authenticate and fail with a die(error). Handle this in a
better way by first checking if OpenSSL is available and then attempting
to authenticate. If OpenSSL is not available, print an error message and
exit gracefully.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes a memory leak by running free(response) in case
auth_cram_md5 fails.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
6d1f198f34 (imap-send: fix leaking memory in `imap_server_conf`, 2024-06-07)
resulted a change in static int git_imap_config which resulted in cfg->folder
being incorrectly set to NULL in case imap.user, imap.pass, imap.tunnel and
imap.authmethod were defined. Because of this, since Git 2.46.0,
git-imap-send is not usable at all. The bug seems to have been unnoticed for
a long time, likely due to better options like git-send-email.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier test update incorrectly lost three prerequisites on
macOS, which has been corrected.
* rj/meson-tap-parse-fixup:
test-lib: add missing prerequisites for Darwin
A memory leak on an error code path has been plugged.
* ly/submodule-update-failure-leakfix:
builtin/submodule--helper: fix leak when remote_submodule_branch() failed
Some leftover references to documentation source files that no
longer exist, due to recent ".txt" -> ".adoc" renaming, have been
corrected.
* jw/doc-txt-to-adoc-refs:
doc: update references to renamed AsciiDoc files
Document that related `git config` variables should be placed
one-per-line instead of separated by commas.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some platforms like AIX lack .d_type member in "struct dirent"; use
the DTYPE(e) macro instead of a direct reference to e->d_type and
when it yields DT_UNKNOWN, find the real type with get_dtype().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson-based build/test framework now understands TAP output
generated by our tests.
* ps/meson-tap-parse:
meson: parse TAP output generated by our tests
meson: introduce kwargs variable for tests
test-lib: fail on unexpectedly passing tests
t7815: fix unexpectedly passing test on macOS
t/test-lib: fix TAP format for BASH_XTRACEFD warning
t/test-lib: don't print shell traces to stdout
t983*: use prereq to check for Python-specific git-p4(1) support
t9822: use prereq to check for ISO-8859-1 support
t: silence output from `test_create_repo()`
t: stop announcing prereqs
"git diff --no-index dirA dirB" can limit the comparison with
pathspec at the end of the command line, just like normal "git
diff".
* jk/diff-no-index-with-pathspec:
diff --no-index: support limiting by pathspec
pathspec: add flag to indicate operation without repository
pathspec: add match_leading_pathspec variant
"git cat-file --batch" learns to understand %(objectmode) atom to
allow the caller to tell missing objects (due to repository
corruption) and submodules (whose commit objects are OK to be
missing) apart.
* vd/cat-file-objectmode-update:
cat-file.c: add batch handling for submodules
cat-file: add %(objectmode) atom
t1006: update 'run_tests' to test generic object specifiers
Documentation for "git send-email" has been updated with a bit more
credential helper and OAuth information.
* ag/send-email-docs:
docs: make the purpose of using app password for Gmail more clear in send-email
docs: remove credential helper links for emails from gitcredentials
docs: improve formatting in git-send-email documentation
docs: add credential helper for yahoo and link Google's sendgmail tool
"git pack-objects" learns to find delta bases from blobs at the
same path, using the --path-walk API.
* ds/path-walk-2:
pack-objects: allow --shallow and --path-walk
path-walk: add new 'edge_aggressive' option
pack-objects: thread the path-based compression
pack-objects: refactor path-walk delta phase
scalar: enable path-walk during push via config
pack-objects: enable --path-walk via config
repack: add --path-walk option
t5538: add tests to confirm deltas in shallow pushes
pack-objects: introduce GIT_TEST_PACK_PATH_WALK
p5313: add performance tests for --path-walk
pack-objects: update usage to match docs
pack-objects: add --path-walk option
pack-objects: extract should_attempt_deltas()
Doc update to the more recent world order.
* lo/my-first-ow-doc-update:
MyFirstContribution: add walken.c to meson.build
MyFirstContribution: use struct repository in examples
'test_path_is_file', 'test_path_is_dir' and 'test_file_is_missing'
are test helpers used in Git's development, that emit useful
diagnostic information when they detect a failing condition, while
test -[efd] does not.
Replace the basic shell commands 'test -f', 'test -d' and 'test -e',
with these test helpers.
Co-authored-by: Isabella Caselli <icaselli@usp.br>
Signed-off-by: Isabella Caselli <icaselli@usp.br>
Signed-off-by: Rodrigo Michelassi <rodmichelassi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
run_builtin() takes a repo parameter, so the use of the_repository
is no longer necessary. Removed the usage of the_repository.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function handle_content_type allocates memory for boundary
using xmalloc(sizeof(struct strbuf)). If (++mi->content_top >=
&mi->content[MAX_BOUNDARIES]) is true, the function returns
without freeing boundary.
Signed-off-by: Jinyao Guo <guo846@purdue.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes for GitHub Actions Coverity job.
* js/github-ci-win-coverity-fix:
ci(coverity): output the build log upon error
ci(coverity): fix building on Windows
Revert a botched bswap.h change that broke ntohll() functions on
big-endian systems with __builtin_bswap32/64().
* ss/revert-builtin-bswap-stuff:
Revert "bswap.h: add support for built-in bswap functions"
Existing `merge.stat` configuration variable is a Boolean that
defaults to `true` to control `git merge --[no-]stat` behaviour.
Extend it to be "Boolean or text", that takes false, true, or
"compact", with the last one triggering the --compact-summary option
introduced earlier. Any other values are taken as the same as true,
instead of signaling an error---it is not a grave enough offence to
stop their merge.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge" and "git pull" shows "git diff --stat --summary @{1}"
when they finish to indicate the extent of the changes brought into
the history by default. While it gives a good overview, it becomes
annoying when there are very many created or deleted paths.
Introduce "--compact-summary" option to these two commands that
tells it to instead show "git diff --compact-summary @{1}", which
gives the same information in a lot more compact form in such a
situation.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git cat-file command with --mailmap option fails to apply mailmap
transformations to the committer field when the author and committer
identities are different. This occurs due to a missing newline handling
in apply_mailmap_to_header() after processing each identity line.
When rewrite_ident_line() processes an identity, it stops at the end
of the identity data (e.g., "Author Name <email> timestamp"), but
doesn't account for the trailing newline. The current code adds the
identity length to buf_offset but fails to advance past the newline
character. This causes the next iteration to start parsing from the
newline instead of the beginning of the next header line, making it
impossible to match subsequent headers like "committer".
Additionally, rewrite_ident_line() may reallocate the buffer during
its operation. Any code using pointers into the old buffer would be
using invalid memory after such a reallocation.
This bug was introduced in e9c1b0e3 (revision: improve
commit_rewrite_person(), 2022-07-19) when the much simpler version of
commit_rewrite_person() that worked on one "person header" at a time
was rewritten to use the current apply_mailmap_to_header() function.
The original implementation processed author and committer separately,
but the rewrite introduced this loop-based approach that failed to
properly handle the transition between identity lines.
Let's fix this by addressing both issues:
1. After processing an identity line, we now check if we're at a
newline and advance past it, ensuring the next header line is
parsed correctly.
2. We recompute the buffer position after rewrite_ident_line() to
handle potential buffer reallocation.
This ensures that all identity headers in commit and tag objects are
consistently processed regardless of whether the author and committer
are the same person.
Reported-by: Vasilii Iakliushin <viakliushin@gitlab.com>
Reviewed-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.48:
Git 2.48.2
Git 2.47.3
Git 2.46.4
Git 2.45.4
Git 2.44.4
Git 2.43.7
wincred: avoid buffer overflow in wcsncat()
bundle-uri: fix arbitrary file writes via parameter injection
config: quote values containing CR character
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently generating the version-def.h file and the config-list.h
file have been updated, which broke versions of "sed" that do not
want to be fed a file that ends with an incomplete line, and/or that
do not understand the more recent "-E" option to use extended
regular expression.
Fix them in response to a build-failure reported on Solaris boxes.
cf. https://lore.kernel.org/git/09f954b8-d9c3-418f-ad4b-9cb9b063f4ae@comstyle.com/
Reported-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have a way to export stashes to a ref, let's provide a way
to import them from such a ref back to the stash. This works much the
way the export code does, except that we strip off the first parent
chain commit and then store each resulting commit back to the stash.
We don't clear the stash first and instead add the specified stashes to
the top of the stash. This is because users may want to export just a
few stashes, such as to share a small amount of work in progress with a
colleague, and it would be undesirable for the receiving user to lose
all of their data. For users who do want to replace the stash, it's
easy to do to: simply run "git stash clear" first.
We specifically rely on the fact that we'll produce identical stash
commits on both sides in our tests. This provides a cheap,
straightforward check for our tests and also makes it easy for users to
see if they already have the same data in both repositories.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common user problem is how to sync in-progress work to another
machine. Users currently must use some sort of transfer of the working
tree, which poses security risks and also necessarily causes the index
to become dirty. The experience is suboptimal and frustrating for
users.
A reasonable idea is to use the stash for this purpose, but the stash is
stored in the reflog, not in a ref, and as such it cannot be pushed or
pulled. This also means that it cannot be saved into a bundle or
preserved elsewhere, which is a problem when using throwaway development
environments.
In addition, users often want to replicate stashes across machines, such
as when they must use multiple machines or when they use throwaway dev
environments, such as those based on the Devcontainer spec, where they
might otherwise lose various in-progress work.
Let's solve this problem by allowing the user to export the stash to a
ref (or, to just write it into the repository and print the hash, à la
git commit-tree). Introduce git stash export, which writes a chain of
commits where the first parent is always a chain to the previous stash,
or to a single, empty commit (for the final item) and the second is the
stash commit normally written to the reflog.
Iterate over each stash from top to bottom, looking up the data for each
one, and then create the chain from the single empty commit back up in
reverse order. Generate a predictable empty commit so our behavior is
reproducible. Create a useful commit message, preserving the author and
committer information, to help users identify stash commits when viewing
them as normal commits.
If the user has specified specific stashes they'd like to export
instead, use those instead of iterating over all of the stashes.
As part of this, specifically request quiet behavior when looking up the
OID for a revision because we will eventually hit a revision that
doesn't exist and we don't want to die when that occurs.
When exporting stashes, be sure to verify that they look like valid
stashes and don't contain invalid data. This will help avoid failures
on import or problems due to attempting to export invalid refs that are
not stashes.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allow several special forms of stash names in this code. In the
future, we'll want to allow these same forms without parsing a stash
commit, so let's refactor this code out into a function for reuse.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A reasonable person looking at the signature and usage of get_oid and
friends might conclude that in the event of an error, it always returns
-1. However, this is not the case. Instead, get_oid_basic dies if we
go too far back into the history of a reflog (or, when quiet, simply
exits).
This is not especially useful, since in many cases, we might want to
handle this error differently. Let's add a flag here to make it just
return -1 like elsewhere in these code paths.
Note that we cannot make this behavior the default, since we have many
other codepaths that rely on the existing behavior, including in tests.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 6547d1c9 (bswap.h: add support for built-in bswap
functions, 2025-04-23) tweaked the way the bswap32/64 macros are
defined, on platforms with __builtin_bswap32/64 supported, the
bswap32/64 macros are defined even on big endian platforms.
However the rest of this file assumes that bswap32/64() are defined
ONLY on little endian machines and uses that assumption to redefine
ntohl/ntohll macros. The said commit broke t4014-format-patch.sh test,
among many others on s390x.
Revert the commit.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have mentioned this in various reviews, but I didn't see it
mentioned in the CodingGuildelines document. Let's add it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• Replace with phrases that are more standard (“all-or-nothing”
instead of “-none”)
• Add coordinating words that make it less likely for you to trip
over the sentence (“*that* "gc" can do”)
• Use “SMTP” instead of both SMTP and smtp
• Don’t mention `git fsck --reference` since the previous release
was not affected by this minor bug. Also say “errored out” since
the git-refs(1) bug was there in v2.48.0 as well
• Use the more widespread “linked” instead of “secondary worktree”
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is quite helpful to know what Coverity said, exactly, in case it
fails to analyze the code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When I added the Coverity workflow in a56b6230d0b1 (ci: add a GitHub
workflow to submit Coverity scans, 2023-09-25), I merely converted an
Azure Pipeline definition that had been running successfully for ages.
In the meantime, the current Coverity documentation describes a very
different way to install the analysis tool, recommending to add the
`bin/` directory to the _end_ of `PATH` (when originally, IIRC, it was
recommended to add it to the _beginning_ of the `PATH`).
This is crucial! The reason is that the current incarnation of the
Windows variant of Coverity's analysis tools come with a _lot_ of DLL
files in their `bin/` directory, some of them interferring rather badly
with the `gcc.exe` in Git for Windows' SDK that we use to run the
Coverity build. The symptom is a cryptic error message:
make: *** [Makefile:2960: headless-git.o] Error 1
make: *** Waiting for unfinished jobs....
D:\git-sdk-64-minimal\mingw64\bin\windres.exe: preprocessing failed.
make: *** [Makefile:2679: git.res] Error 1
make: *** [Makefile:2893: git.o] Error 1
make: *** [Makefile:2893: builtin/add.o] Error 1
Attempting to detect unconfigured compilers in build
|0----------25-----------50----------75---------100|
****************************************************
Warning: Build command make.exe exited with code 2. Please verify that the build completed successfully.
Warning: Emitted 0 C/C++ compilation units (0%) successfully
0 C/C++ compilation units (0%) are ready for analysis
For more details, please look at:
D:/a/git/git/cov-int/build-log.txt
The log (which the workflow is currently not configured to reveal) then
points out that the `windows.h` header cannot be found, which is _still_
not very helpful. The underlying root cause is that the `gcc.exe` in Git
for Windows' SDK determines the location of the header files via the
location of certain DLL files, and finding the "wrong" ones first on the
`PATH` misleads that logic.
Let's fix this problem by following Coverity's current recommendation
and append the `bin/` directory in which `cov-int` can be found to the
_end_ of `PATH`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a stash, Git uses the current branch name
of the superproject to construct the stash commit message.
However, in repositories with submodules,
the message may mistakenly display the submodule branch name instead.
This is because `refs_resolve_ref_unsafe()` returns a pointer to a static buffer.
Subsequent calls to the same function overwrite the buffer,
corrupting the originally fetched `branch_name` used for the stash message.
Use `xstrdup()` to duplicate the branch name immediately after resolving it,
so that later buffer overwrites do not affect the stash message.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor "preload-index.c" to remove the dependency on the global
'the_repository'. Replace the occurrences of 'the_repository' with
'index->repo' and thus remove the definition '#define
USE_THE_REPOSITORY_VARIABLE'.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The global variable 'core_preload_index' is used in a single function
named 'preload_index()' in "preload-index.c". Move its declaration inside
that function, removing unnecessary global state.
This change is part of an ongoing effort to eliminate global variables,
improve modularity and help libify the codebase.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com>
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In revision.c:prepare_show_merge(), we allocated an array in prune
but forget to free it. Since parse_pathspec is not responsible to
free prune, we should add `free(prune)` in the end of prepare_show_merge().
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If rebase.instructionFormat is invalid the repository is left in a
strange state when the interactive rebase fails. `git status` outputs
boths the same as it would in the normal case *and* something related to
interactive rebase:
$ git -c rebase.instructionFormat=blah rebase -i
fatal: invalid --pretty format: blah
$ git status
On branch master
Your branch is ahead of 'upstream/master' by 1 commit.
(use "git push" to publish your local commits)
git-rebase-todo is missing.
No commands done.
No commands remaining.
You are currently editing a commit while rebasing branch 'master' on '8db3019401'.
(use "git commit --amend" to amend the current commit)
(use "git rebase --continue" once you are satisfied with your changes)
By attempting to write the rebase script before initializing the state
this potential scenario is avoided.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit d3d8c601fd ("t7815: fix unexpectedly passing test on macOS",
2025-06-02) added a MACOS prerequisite by adding a 'Darwin' case
label to the 'OS-specific' case statement. However, this commit
forgot to set several prerequisites which appear in the 'default'
case label, in addition to the new MACOS prerequisite. This causes
several tests, which macOS should pass, being skipped.
In order to run all applicable tests on macOS, add the missing
prerequisites to the 'Darwin' case.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In pack-bitmap.c:find_boundary_objects(), the roots_bitmap is only freed
if cascade_pseudo_merges_1() fails. However, cascade_pseudo_merges_1()
uses roots_bitmap as a mutable reference without taking ownership of it.
As a result, if cascade_pseudo_merges_1() succeeds, roots_bitmap is leaked.
And this leak currently lacks a dedicated test to detect it.
To fix this leak, remove if cascade_pseudo_merges_1() succeed check and
always calling bitmap_free(roots_bitmap);
To trigger this leak, we need roots_bitmap that contains at least one
pseudo merge. So that we can use pseudo merge bitmap when we compute roots
reachable bitmap. Here we create two commits: first A then B. Add A
to the pseudo-merge and perform a traversal over the range A..B.
In this scenario, the "haves" set will be {A}, and cascade_pseudo_merges_1
will succeed, thereby exposing the leak due to the missing roots_bitmap
cleanup.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests that compare $HOME and $(pwd), which should be the same
directory unless the tests chdir's around, would fail when the user
enters the test directory via symbolic links, which has been
corrected.
* mm/test-in-absolute-home:
t: run tests from a normalized working directory
In builtin/submodule--helper.c:update_submodule(), the variable
remote_name is allocated in get_default_remote_submodule() but
may be leaked if remote_submodule_branch() fails. Although it is
unlikely that remote_submodule_branch() would fail after successfully
obtaining a remote ref name from get_default_remote_submodule(),
it is still possible. To prevent a potential memory leak, add a
call to free(remote_name) at the early exit point.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solaris 10 and newer has strtoumax().
Solaris 11 and newer has mkdtemp(), memmem(), and strcasestr().
Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust to newer version of libcURL.
* jk/curl-easy-setopt-typefix:
curl: fix symbolic constant typechecks with curl_easy_setopt()
curl: fix integer variable typechecks with curl_easy_setopt()
curl: fix integer constant typechecks with curl_easy_setopt()
The support for assuming "push" when "-p" is given introduced in
9e140909f61 (stash: allow pathspecs in the no verb form, 2017-02-28) is
very narrow, neither "git stash -m <message> -p <pathspec>" nor "git
stash --patch <pathspec>" imply "push" and die instead. Relax this by
passing PARSE_OPT_STOP_AT_NON_OPTION when push is being assumed and then
setting "force_assume" if "--patch" was present. This means "git stash
<pathspec> -p" still dies so that it does not assume the user meant
"push" if they mistype a subcommand name but "git stash -m <message> -p
<pathspec>" will now succeed. The test added in the last commit is
adjusted to check that push is still assumed when "--patch" comes after
other options on the command-line.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically "git stash [<options>]" was assumed to mean "git stash save
[<options>]". Since 1ada5020b38 (stash: use stash_push for no verb form,
2017-02-28) it is assumed to mean "git stash push [<options>]". As the
push subcommand supports pathspecs, 9e140909f61 (stash: allow pathspecs
in the no verb form, 2017-02-28) allowed "git stash -p <pathspec>" to
mean "git stash push -p <pathspec>". This was broken in 8c3713cede7
(stash: eliminate crude option parsing, 2020-02-17) which failed to
account for "push" being added to the start of argv in cmd_stash()
before it calls push_stash() and kept looking in argv[0] for "-p" after
moving the code to push_stash().
Fix this by regression by checking argv[1] instead of argv[0] and add a
couple of tests to prevent future regressions.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .txt extensions were changed to .adoc in 1f010d6 (doc: use .adoc
extension for AsciiDoc files, 2025-01-20). References to the renamed
files were not updated yet.
Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0b080a70ab (doc: git-diff: apply format changes to
diff-generate-patch, 2024-11-18) wrapped the ".." in
mode <mode>,<mode>..<mode>
in backticks. Note how the line before is quite similar,
index <hash>,<hash>..<hash>
but did not get any backticks. Remove the backticks, since they confuse
Asciidoctor.
The exact failure mode changed with c87b2b3a6f (doc: fix asciidoctor
synopsis processing of triple-dots, 2025-04-12), and arguably to the
better. But Asciidoctor (2.0.18) still ends up confused by these
backticks and leaves the manpage rendering as
index <hash>,<hash>..<hash>
mode <mode>,<mode>`..__<mode>__
{empty}`new file mode <mode>
Drop the backticks. This is a no-op with asciidoc (10.2.0).
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of Homebrew's update to cURL v8.14.0, there are new compile errors to
be observed in the `osx-gcc` job of Git's CI builds:
In file included from http.h:8,
from imap-send.c:36:
In function 'setup_curl',
inlined from 'curl_append_msgs_to_imap' at imap-send.c:1460:9,
inlined from 'cmd_main' at imap-send.c:1581:9:
/usr/local/Cellar/curl/8.14.0/include/curl/typecheck-gcc.h:50:15: error: call to '_curl_easy_setopt_err_long' declared with attribute warning: curl_easy_setopt expects a long argument [-Werror=attribute-warning]
50 | _curl_easy_setopt_err_long(); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/Cellar/curl/8.14.0/include/curl/curl.h:54:7: note: in definition of macro 'CURL_IGNORE_DEPRECATION'
54 | statements \
| ^~~~~~~~~~
imap-send.c:1423:9: note: in expansion of macro 'curl_easy_setopt'
1423 | curl_easy_setopt(curl, CURLOPT_PORT, srvc->port);
| ^~~~~~~~~~~~~~~~
[... many more instances of nearly identical warnings...]
See for example this CI workflow run:
https://github.com/git/git/actions/runs/15454602308/job/43504278284#step:4:307
The most likely explanation is the entry "typecheck-gcc.h: fix the
typechecks" in cURL's release notes (https://curl.se/ch/8.14.0.html).
Nearly identical compile errors afflicted recently-updated Debian
setups, which have been addressed by `jk/curl-easy-setopt-typefix`.
However, on macOS Git is built with different build options, which
uncovered more instances of `int` values that need to be cast to
constants, which were not covered by 6f11c42e8edc (curl: fix integer
constant typechecks with curl_easy_setopt(), 2025-06-04). Let's
explicitly convert even those remaining `int` constants in
`curl_easy_setopt()` calls to `long` parameters.
In addition to looking at the compile errors of the `osx-gcc` job, I
verified that there are no other instances of the same issue that need
to be handled in this manner (and that might not be caught by our CI
builds because of yet other build options that might skip those code
parts), I ran the following command and inspected all 23 results
manually to ensure that the fix is now actually complete:
git grep -n curl_easy_setopt |
grep -ve ',.*, *[A-Za-z_"&]' \
-e ',.*, *[-0-9]*L)' \
-e ',.*,.* (long)'
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 2cc5b0facfa4 (git-gui: extract script to generate "tclIndex",
2025-03-11) converted commands in a Makefile rule to a shell script.
In this process, the Makefile variable $@ had to be replaced by the
file name that it represents, 'lib/tclIndex'. However, the occurrence
in `rm -f $@` was missed. In a shell script, $@ expands to all
command line arguments, which happen to be the source files lib/*.tcl
in this case. Needless to say that we do not want to remove source
files during a build. Replace $@ by the intended 'lib/tclIndex'.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
In the GitHub workflow used in Git's CI builds, the `vs test` jobs use a
subset of a specific revision of Git for Windows' SDK to run Git's test
suite. This revision is validated by another CI workflow to ensure that
said revision _can_ run Git's test suite successfully, skipping buggy
updates in Git for Windows' SDK.
The `win+Meson test` jobs do things differently, quite differently. They
use the Bash of the Git for Windows version that is installed on the
runners to run Git's test suite.
This difference has consequences.
When 68cb0b5253a0 (builtin/receive-pack: add option to skip connectivity
check, 2025-05-20) introduced a test case that uses `tee <file> | git
receive-pack` as `--receive-pack` parameter (imitating an existing
pattern in the same test script), it hit just the sweet spot to trigger
a bug in the MSYS2 runtime shipped in Git for Windows v2.49.0. This
version is the one currently installed on GitHub's runners.
The problem is that the `git receive-pack` process finishes while the
`tee` process does not need to write anything anymore and therefore does
not receive an EOF. Instead, it should receive a SIGPIPE, but the bug in
the MSYS2 runtime prevents that from working as intended. As a
consequence, the `tee` process waits for more input from the `git.exe
send-pack` process but none is coming, and the test script patiently
waits until the 6h timeout hits.
Only every once in a while, the `git receive-pack` process manages to
send an EOF to the `tee` process and no hang occurs. Therefore, the
problem can be worked around by cancelling the clearly-hanging job after
twenty or so minutes and re-running it, repeating the process about half
a dozen times, until the hang was successfully avoided.
This bug in the MSYS2 runtime has been fixed in the meantime, which is
the reason why the same test case causes no problems in the `win test`
and the `vs test` jobs.
This will continue to be the case until the Git for Windows version on
the GitHub runners is upgraded to a version that distributes a newer
MSYS2 runtime version. However, as of time of writing, this _is_ the
latest Git for Windows version, and will be for another 1.5 weeks, until
Git v2.50.0 is scheduled to appear (and shortly thereafter Git for
Windows v2.50.0). Traditionally it takes a while before the runners pick
up the new version.
We could just wait it out, six hours at a time.
Here, I opt for an alternative: Detect the buggy MSYS2 runtime and
simply skip the test case. It's not like the `receive-pack` test cases
are specific to Windows, and even then, to my chagrin the CI runs in
git-for-windows/git spend around ten hours of compute time each and
every time to run the entire test suite on all the platforms, even the
tests that cover cross-platform code, and for Windows alone we do that
three times: with GCC, with MSVC, and with MSVC via Meson. Therefore, I
deem it more than acceptable to skip this test case in one of those
matrices.
For good luck, also the preceding test case is skipped in that scenario,
as it uses the same `--receive-pack=tee <file> | git receive-pack`
pattern, even though I never observed that test case to hang in
practice.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pretty.c:repo_logmsg_reencode() allocated memory should be freed with
repo_unuse_commit_buffer(). Callers sometimes forgot free it at exit
point. Add `repo_unuse_commit_buffer()` in insert_records_from_trailers
at builtin/shortlog.c and create_commit at builtin/replay.c
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the previous two commits, we should be passing long integers,
not regular ones, to curl_easy_setopt(), and compiling against curl 8.14
loudly complains if we don't.
This patch catches the remaining cases, which are ones where we pass
curl's own symbolic constants. We'll cast them to long manually in each
call.
It seems kind of weird to me that curl doesn't define these constants as
longs, since the point of them is to pass to curl_easy_setopt(). But in
the curl documentation and examples, they clearly show casting them as
part of the setopt calls. It may be that there is some reason not to
push the type into the macro, like backwards compatibility. I didn't
dig, as it doesn't really matter: we have to follow what existing curl
versions ask for anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As discussed in the previous commit, we should be passing long integers,
not regular ones, to curl_easy_setopt(), and compiling against curl 8.14
loudly complains if we don't.
That patch fixed integer constants by adding an "L". This one deals with
actual variables.
Arguably these variables could just be declared as "long" in the first
place. But it's actually kind of awkward due to other code which uses
them:
- port is conceptually a short, and we even call htons() on it (though
weirdly it is defined as a regular int).
- ssl_verify is conceptually a bool, and we assign to it from
git_config_bool().
So I think we could probably switch these out for longs without hurting
anything, but it just feels a bit weird. Doubly so because if you don't
set USE_CURL_FOR_IMAP_SEND set, then the current types are fine!
So let's just cast these to longs in the curl calls, which makes what's
going on obvious. There aren't that many spots to modify (and as you can
see from the context, we already have some similar casts).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The curl documentation specifies that curl_easy_setopt() takes either:
...a long, a function pointer, an object pointer or a curl_off_t,
depending on what the specific option expects.
But when we pass an integer constant like "0", it will by default be a
regular non-long int. This has always been wrong, but seemed to work in
practice (I didn't dig into curl's implementation to see whether this
might actually be triggering undefined behavior, but it seems likely and
regardless we should do what the docs say).
This is especially important since curl has a type-checking macro that
causes building against curl 8.14 to produce many warnings. The specific
commit is due to their 79b4e56b3 (typecheck-gcc.h: fix the typechecks,
2025-04-22). Curiously, it does only seem to trigger when compiled with
-O2 for me.
We can fix it by just marking the constants with a long "L".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d796cedb (bundle-uri: unit test "key=value" parsing, 2022-10-12)
introduced the print_bundle_list() function, which takes a "FILE
*fp" to write the output to. Later with c93c3d2f (bundle-uri:
parse bundle.heuristic=creationToken, 2023-01-31) the function
started showing additional information, which is always written
to the standard output stream.
It does not look like a deliberate decision to do so, and it
does not hurt, as all callers of the function passes stdout to
it.
We could change the function not to take fp and always write to
the standard output to simplify, but let's use the FILE *fp
provided by the caller consistently to write out output.
Signed-off-by: Jan Mazur <mzr@meta.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows optionally signing the commits that git subtree creates. This can
be necessary when working in a repository that requires gpg signed
commits.
Signed-off-by: Patrik Weiskircher <patrik@pspdfkit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optional parameter handling only works unambiguous with git rev-parse
--parseopt when using the --stuck-long option. To prepare for future commits
which add flags with optional parameters, parse with --stuck-long.
Signed-off-by: Patrik Weiskircher <patrik@pspdfkit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever an email is sent, send-email shows a log at last, which
contains all the headers of the email that were received by the
receipients.
In case outlook changes the Message-ID, a log for the same is shown to
the user, but that change is not reflected when the log containing all
the headers is displayed. Here is an example of the log that is shown
when outlook changes the Message-ID:
Outlook reassigned Message-ID to: <PN3PR01MB95973E5ACD7CCFADCB4E298CB865A@PN3PR01MB9597.INDPRD01.PROD.OUTLOOK.COM>
OK. Log says:
Server: smtp.office365.com
MAIL FROM:<gargaditya08@live.com>
RCPT TO:<negahe7142@nomrista.com>
From: Aditya Garg <gargaditya08@live.com>
To: negahe7142@nomrista.com
Subject: [PATCH] send-email: show the new message id assigned by outlook in the logs
Date: Mon, 26 May 2025 20:28:36 +0530
Message-ID: <20250526145836.4825-1-gargaditya08@live.com>
X-Mailer: git-send-email @GIT_VERSION@
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Result: 250
Fix this by updating the $header variable, which has the message ID we
internally assigned on the "Message-ID:" header, with the message ID the
Outlook server assigned. It should look like this after this patch:
OK. Log says:
Server: smtp.office365.com
MAIL FROM:<gargaditya08@live.com>
RCPT TO:<negahe7142@nomrista.com>
From: Aditya Garg <gargaditya08@live.com>
To: negahe7142@nomrista.com
Subject: [PATCH] send-email: show the new message id assigned by outlook in the logs
Date: Mon, 26 May 2025 20:29:22 +0530
Message-ID: <PN3PR01MB95977486061BD2542BD09B67B865A@PN3PR01MB9597.INDPRD01.PROD.OUTLOOK.COM>
X-Mailer: git-send-email @GIT_VERSION@
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Result: 250
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we send a thread of emails using send-email, a message number
is internally assigned to each email. This number is used to track the
order of the emails in the thread. Whenever a new message is processed
in a thread, the current script logic increments the message number by
one, which is intended.
But, if a message is edited and then resent, its message number again
gets incremented. This is because the script uses the same logic to
process the edited message, which it uses to send the next message.
This minor bug is usually harmless, unless a special situations arises.
That situation is when the first message in a thread is edited and
resent, and an `--in-reply-to` argument is also passed to send-email.
In this case, if the user has chosen shallow threading, the threading
does not work as expected, and all messages become replies to the
Message-ID specified in the `--in-reply-to` argument.
The reason for this bug is hidden in the code for threading itself.
if ($thread) {
if ($message_was_sent &&
($chain_reply_to || !defined $in_reply_to || length($in_reply_to) == 0 ||
$message_num == 1)) {
$in_reply_to = $message_id;
if (length $references > 0) {
$references .= "\n $message_id";
} else {
$references = "$message_id";
}
}
}
Here `$message_num` is the current message number, and `$in_reply_to` is
the Message-ID of the message to which the current message is a reply.
In case `--in-reply-to` is specified, the `$in_reply_to` variable
is set to the value of the `--in-reply-to` argument.
Whenever this whole set of conditions is true, the script sets the
`$in_reply_to` variable to the current message's ID. This is done to
ensure that the next message in the thread is a reply to this message.
In case we specify an `--in-reply-to` argument, and have shallow
threading, the only condition that can make this true is
`$message_num == 1`, which is true for the first message in a thread.
Thus, the `$in_reply_to` variable gets set to the first message's ID.
For subsequent messages, the `$message_num` variable is always
greater than 1, and the whole set of conditions is false. Therefore, the
`$in_reply_to` variable remains as the first message's ID. This is what
we expect in shallow threading. But if the user edits the first message
and resends it, the `$message_num` variable gets incremented by 1, and
thus the condition `$message_num == 1` becomes false. This means that
the `$in_reply_to` variable is not set to the first message's ID. As a
result the next message in the thread is not a reply to the first
message, but to the `--in-reply-to` argument, effectively breaking the
threading.
In case the user does not specify an `--in-reply-to` argument, the
`!defined $in_reply_to` condition is true, and thus the `$in_reply_to`
variable is set to the first message's ID, and the threading works
as expected, regardless of the message number.
To fix this bug, we need to ensure that the `$message_num` variable is
not incremented by 1 when a message is edited and resent. We do this by
decreasing the `$message_num` variable by 1 whenever the request to edit
a message is received. This way, the next message in the thread will
have the same message number as the edited message. Therefore the
threading will work as expected.
The same logic has also been applied in case the user drops a single
message from the thread by choosing the "[n]o" option during
confirmation. By doing this, the next message in the thread is assigned
the message number of the dropped message, and thus the threading
works as expected.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit-graph.c:graph_write(), if read_one_commit() failed,
progress allocated in start_delayed_progress() will leak. Add
stop_progress() before goto cleanup.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/fetch-pack.c:cmd_fetch_pack(), if finish_connect() failed,
it returns error code without cleanup which cause memory leak. Add
cleanup label before frees in the end of cmd_fetch_pack(), and add
`goto cleanup` if finish_connect() failed.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object specification is passed to 'cat-file --batch[-check]'
referring to a submodule (e.g. 'HEAD:path/to/my/submodule'), the current
behavior of the command is to print the "missing" error message. However, it
is often valuable for callers to distinguish between paths that are actually
missing and "the submodule tree entry exists, but the object does not exist
in the repository".
To disambiguate without needing to invoke a separate Git process (e.g.
'ls-tree'), print the message "<oid> submodule" for such objects instead of
"<object> missing". In addition to the change from "missing" to "submodule",
the new message differs from the old in that it always prints the resolved
tree entry's OID, rather than the input object specification.
Note that this implementation maintains a distinction between submodules
where the commit OID is not present in the repo, and submodules where the
commit OID *is* present; the former will now print "<object> submodule", but
the latter will still print the full object content.
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a formatting atom, used with the --batch-check/--batch-command options,
that prints the octal representation of the object mode if a given revision
includes that information, e.g. one that follows the format
<tree-ish>:<path>. If the mode information does not exist, an empty string
is printed instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the 'run_tests' test wrapper so that the first argument may refer to
any specifier that uniquely identifies an object (e.g. a ref name,
'<OID>:<path>', '<OID>^{<type>}', etc.), rather than only a full object ID.
Also add tests that use non-OID identifiers, ensuring appropriate parsing in
'cat-file'. The identifiers used in some of the added tests include a space,
which is incompatible with the '%(rest)' atom. To accommodate that without
removing the test case, use 'test_expect_failure' when 'object_name'
includes a space.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git verify-refs" (and hence "git fsck --reference") started
erroring out in a repository in which secondary worktrees were
prepared with Git 2.43 or lower.
* sj/ref-contents-check-fix:
fsck: ignore missing "refs" directory for linked worktrees
BUG() is not end-user facing but programmer facing, and we do not
use _("...") in them. Replace all `BUG(_("..."))` with `BUG("...")`
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In sequencer.c, caller only pass TODO_SQUASH or TODO_FIXUP to
update_squash_messages(), any other command passed in should be
considered as BUG. Replace `return error('unknown command')`
with `BUG('not a FIXUP or SQUASH')`.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "gc" task has a similar locking race as the one that we have fixed
for the "pack-refs" and "reflog-expire" tasks in preceding commits. Fix
this by splitting up the logic of the "gc" task:
- We execute `gc_before_repack()` in the foreground, which contains
the logic that git-gc(1) itself would execute in the foreground, as
well.
- We spawn git-gc(1) after detaching, but with a new hidden flag that
suppresses calling `gc_before_repack()`.
Like this we have roughly the same logic as git-gc(1) itself and know to
repack refs and reflogs before detaching, thus fixing the race.
Note that `gc_before_repack()` is renamed to `gc_foreground_tasks()` to
better reflect what this function does.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `gc_before_repack()` should only ever run once in git-gc(1), but we
may end up calling it twice when the "--detach" flag is passed. The
duplicated call is avoided though via a static flag in this function.
This pattern is somewhat unintuitive though. Refactor it to drop the
static flag and instead guard the second call of `gc_before_repack()`
via `opts.detach`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes code wants to die in a situation where it already has written
an error message. To use the same error code as `die()` we have to use
`exit(128)`, which is easy to get wrong and leaves magic numbers all
over our codebase.
Teach `die_message_builtin()` to not print any error when passed a
`NULL` pointer as error string. Like this, such users can now call
`die(NULL)` to achieve the same result without any hardcoded error
codes.
Adapt a couple of builtins to use this new pattern to demonstrate that
there is a need for such a helper.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in the preceding commit, git-gc(1) knows to detach only
after it has already packed references and expired reflogs. This is done
to avoid racing around their respective lockfiles.
Adapt git-maintenance(1) accordingly and run the "pack-refs" and
"reflog-expire" tasks in the foreground. Note that the "gc" task has the
same issue, but the fix is a bit more involved there and will thus be
done in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both git-gc(1) and git-maintenance(1) have logic to daemonize so that
the maintenance tasks are performed in the background. git-gc(1) has
some special logic though to not perform _all_ housekeeping tasks in the
background: both references and reflogs are still handled synchronously
in the foreground.
This split exists because otherwise it may easily happen that git-gc(1)
keeps the "packed-refs" file locked for an extended amount of time,
where the next Git command that wants to modify any reference could now
fail. This was especially important in the past, where git-gc(1) was
still executed directly as part of our automatic maintenance: git-gc(1)
was invoked via `git gc --auto --detach`, so we knew to handle most of
the maintenance tasks in the background while doing those parts that may
cause locking issues in the foreground.
We have since moved to git-maintenance(1), which is a more flexible
replacement for git-gc(1). By default this command runs git-gc(1), only,
but it can be configured to run different tasks, as well. This command
does not know about the split between maintenance tasks that should run
before and after detach though, and this has led to several bug reports
about spurious locking errors for the "packed-refs" file.
Prepare for a fix by introducing this split for maintenance tasks. Note
that this commit does not yet change any of the tasks, so there should
not (yet) be a change in behaviour.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The typedefs for `maintenance_task_fn` and `maintenance_auto_fn` are
somewhat confusingly not true function pointers. As such, any user of
those typedefs needs to manually add the pointer to make use of them.
Fix this by making these true function pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the function to run maintenance tasks. This function will be
reused in a subsequent commit where we introduce a split between
maintenance tasks that run before and after daemonizing the process.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When configuring maintenance tasks run by git-maintenance(1) we do so by
modifying the global array of tasks directly. This is already quite bad
on its own, as global state makes for logic that is hard to follow.
Even more importantly though we use multiple different fields to track
whether or not a task should be run:
- "enabled" tracks the "maintenance.*.enabled" config key. This field
disables execution of a task, unless the user has explicitly asked
for the task.
- "selected_order" tracks the order in which jobs have been asked for
by the user via the "--task=" command line option. It overrides
everything else, but only has an effect if at least one job has been
selected.
- "schedule" tracks the schedule priority for a job, that is how often
it should run. This field only plays a role when the user has passed
the "--schedule=" command line option.
All of this makes it non-trivial to figure out which job really should
be running right now. The logic to configure these fields and the logic
that interprets them is distributed across multiple functions, making it
even harder to follow it.
Refactor the logic so that we stop modifying global state. Instead, we
now compute which jobs should be run in `initialize_task_config()`,
represented as an array of jobs to run that is stored in the options
structure. Like this, all logic becomes self-contained and any users of
this array only need to iterate through the tasks and execute them one
by one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--task=" option explicitly allows the user to say which maintenance
tasks should be run, whereas "--schedule=" only respects the maintenance
strategy configured for a specific repository. As such, it is not
sensible to accept both options at the same time.
Mark them as incompatible with one another. While at it, also convert
the existing logic that marks "--auto" and "--schedule=" as incompatible
to use `die_for_incompatible_opt2()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users of git-maintenance(1) can explicitly ask it to run specific tasks
by passing the `--task=` command line option. This option can be passed
multiple times, which causes us to execute tasks in the same order as
the tasks have been provided by the user.
The order in which tasks are run is computed in `task_option_parse()`:
every time we parse such a command line argument, we modify the global
array of tasks by seting the selected index for that specific task.
This has two downsides:
- We modify global state, which makes it hard to follow the logic.
- The configuration of tasks is split across multiple different
functions, so it is not easy to figure out the different factors
that play a role in selecting tasks.
Refactor the logic so that `task_option_parse()` does not modify global
state anymore. Instead, this function now only collects the list of
configured tasks. The logic to configure ordering of the respective
tasks is then deferred to `initialize_task_config()`.
This refactoring solves the second problem, that the configuration of
tasks is spread across multiple different locations. The first problem,
that we modify global state, will be fixed in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two different variables that track the quietness for git-gc(1):
- The local variable `quiet`, which we wire up.
- The `quiet` field of `struct maintenance_run_opts`.
This leads to confusion which of these variables should be used and what
the respective effect is.
Simplify this logic by dropping the local variable in favor of the
options field.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the array of maintenance tasks to use designated field
initializers. This makes it easier to add more fields to the struct
without having to modify all tasks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle OpenBSD and NetBSD as FreeBSD / DragonFly are. OpenBSD would
need _XOPEN_SOURCE to be set to 700. Its simpler to just not set
_XOPEN_SOURCE.
CC strbuf.o
strbuf.c:645:6: warning: call to undeclared function 'getdelim'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
r = getdelim(&sb->buf, &sb->alloc, term, fp);
^
1 warning generated.
Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instruct in the documentation to also add an entry in meson.build for
builtin/walken.c, as currently both Meson and Make are supported.
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the parameter `struct repository *repo` to the cmd_walken function.
Since commit 9b1cb5070f (builtin: add a repository parameter for
builtin functions, 2024-09-13), all the cmd_* have the `repo` parameter
and new commands must follow this convention, so the documentation
should also be changed.
Change the `git_config` calls to `repo_config`, also passing the `repo`
parameter, as since 036876a106 (config: hide functions using
`the_repository` by default, 2024-08-13) the non-repo config functions
are no longer recommended as they use the global `repository` variable.
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The OpenBSD 'sed' command does not support '\n' to represent newlines in
sed expressions. This leads to the follow compiler error:
In file included from builtin/help.c:15:
./config-list.h:282:18: error: use of undeclared identifier 'n'
"gitcvs.dbUser",n "gitcvs.dbPass",
^
1 error generated.
gmake: *** [Makefile:2821: builtin/help.o] Error 1
We can fix this by documenting related configuration variables
one-per-line instead of listing them separated by commas. This allows us
to remove the unportable part of the sed expression in
generate-configlist.sh.
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git refs verify" doesn't work if there are worktrees created on Git
v2.43.0 or older versions. These versions don't automatically create the
"refs" directory, causing the error:
error: cannot open directory .git/worktrees/<worktree name>/refs:
No such file or directory
Since 8f4c00de95 (builtin/worktree: create refdb via ref backend,
2024-01-08), we automatically create the "refs" directory for new
worktrees. And in 7c78d819e6 (ref: support multiple worktrees check for
refs, 2024-11-20), we assume that all linked worktrees have this
directory and would wrongly report an error to the user, thus
introducing compatibility issue.
Check for ENOENT errno before reporting directory access errors for
linked worktrees to maintain backward compatibility.
Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark a new feature added during this cycle as experimental and fix
its default so that existing users of the fast-export command is
not broken.
* jc/signed-fast-export-is-experimental:
fast-export: --signed-commits is experimental
Doc mark-up fixes.
* ja/doc-synopsis-style:
doc: convert git-switch manpage to new synopsis style
doc: convert git-mergetool options to new synopsis style
doc: convert git-mergetool manpage to new synopsis style
doc: switch merge config description to new synopsis format
doc: convert merge strategies to synopsis format
doc: merge-options.adoc remove a misleading double negation
doc: convert merge options to new synopsis format
doc: convert git-merge manpage to new style
doc: convert git-checkout manpage to new style
By default, Meson only knows to pay respect to the exit code of tests to
judge whether or not it ran successfully. This can be changed though by
specifying the "protocol" parameter. Next to the default "exitcode"
protocol, Meson also supports the "tap" output that our tests already
know to generate.
Unfortunately, the "tap" protocol was incompatible with `meson test
--interactive` and caused a hang. We have upstreamed a fix [1] though,
so with the recent release of Meson 1.8 that fix is finally out and we
can start using the "tap" protocol when running with a recent-enough
version of this build tool.
With this change in place, Meson now properly detects how many subtests
ran and whether test suites have been skipped:
```
$ meson test t002*
ninja: Entering directory `/home/pks/Development/git/build'
1/10 t0024-crlf-archive OK 0.17s 2 subtests passed
2/10 t0022-crlf-rename OK 0.18s 2 subtests passed
3/10 t0029-core-unsetenvvars SKIP 0.15s
4/10 t0023-crlf-am OK 0.18s 2 subtests passed
5/10 t0025-crlf-renormalize OK 0.21s 3 subtests passed
6/10 t0026-eol-config OK 0.25s 5 subtests passed
7/10 t0020-crlf OK 0.81s 36 subtests passed
8/10 t0028-working-tree-encoding OK 0.85s 22 subtests passed
9/10 t0021-conversion OK 3.45s 38 subtests passed
10/10 t0027-auto-crlf OK 26.35s 2600 subtests passed
Ok: 9
Fail: 0
Skipped: 1
```
Note that when running `meson test --interactive` the test results will
now be marked as "ignored". This is because in interactive mode the file
descriptors will remain connected to the user's terminal, and it is
expected that the user interacts with the tests (e.g., spawn a debugger
or use `test_pause`). As such, the TAP output cannot be parsed reliably
by Meson in that case, so the tests are marked as ignored accordingly.
[1]: https://github.com/mesonbuild/meson/pull/13980
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson has the ability to create a kwargs dictionary that can then be
passed to any function call with the `kwargs:` positional argument. This
allows one to deduplicate common parameters that one wishes to pass to
several different function invocations.
Our tests already have one common parameter that we use everywhere,
"timeout", and we're about to add a second common parameter in the next
commit. Let's prepare for this by introducing `test_kwargs` so that we
can deduplicate these common arguments.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When tests are executed via `test_expect_failure` we rather obviously
expect the test itself to fail. If it unexpectedly does not fail then we
count the test as a "fixed" test and announce that a known breakage has
vanished:
ok 1 - setup
ok 2 - create refs/heads/main # TODO known breakage vanished
ok 3 - create refs/heads/main with oldvalue verification
...
ok 299 - update-ref should also create reflog for HEAD
# 1 known breakage(s) vanished; please update test(s)
# passed all remaining 298 test(s)
1..299
While we announce that tests should be updated, the overall test suite
still passes. This makes it quite hard to detect when a test that has
previously failed succeeds now as the developer needs to pay close
attention to the exact output. Even more importantly, tests that only
succeed on _some_ systems are even easier to miss now, as one would have
to explicitly take a look at respective CI jobs to notice that those do
pass now.
Furthermore, we are about to introduce support for parsing TAP output in
Meson. In contrast to prove(1), which treats unexpected passes as a
successful test run, Meson treats those as failure. Neither of these
tools is wrong in doing so. Quoting the TAP specification [1]:
Should a todo test point begin succeeding, the harness may report it
in some way that indicates that whatever was supposed to be done has
been, and it should be promoted to a normal Test Point.
So it is essentially implementation-defined how exactly the unexpected
pass is reported, and whether it should cause the overall test suite to
fail or not. It is unarguably a bad thing for us though if these tools
interpret these differently, as it would mean that test results now
depend on whether the developer uses prove(1) or Meson.
Unify the behaviour by causing a test suite to fail when there are any
unexpected passes. As prove(1) does not consider an unexpected pass to
be an error this leads to somewhat funky output:
t1400-update-ref.sh ................................ Dubious, test returned 1 (wstat 256, 0x100)
All 299 subtests passed
(1 TODO test unexpectedly succeeded)
...
Test Summary Report
-------------------
t1400-update-ref.sh (Wstat: 256 (exited 1) Tests: 299 Failed: 0)
TODO passed: 2
Non-zero exit status: 1
But as we directly announce that the root cause is an unexpected TODO
that has succeeded it's not all that bad.
[1]: https://testanything.org/tap-version-14-specification.html
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t7815, we have the following test:
test_expect_failure !CYGWIN 'git grep .fi a' '
git grep .fi a
'
The test passes if '.' matches a NUL byte, which we expect to only
happen on Cygwin. The upcoming changes to support parsing TAP output in
Meson surface that this test, surprisingly, passes on macOS as well.
It is unclear how long the test has been passing on macOS already.
064eed36c7f (config.mak.uname: only set NO_REGEX on cygwin for v1.7,
2025-04-17) mentions that the test started to pass for Cygwin. This was
attributed to a new implementation of regcomp(3p) and friends, which was
inherited from FreeBSD. Given the BSD lineage of macOS it is feasible
that it also inherited similar code eventually that made the test pass
now.
It is somewhat dubious what the test actually brings to the table given
that it is quite platform specific. Ideally, we would fix this mess by
having a configure-time check whether regcomp(3p) works as expected,
including NUL bytes, and use our bundled version of the regex library in
case it doesn't. Like this, we could ensure that all platforms work the
same in this edge case and mark the new behaviour as expected.
This change is outside of the scope of this patch series, which only
introduces support for TAP. So instead of fixing the bigger issue,
ignore the test on Darwin like we already do for Cygwin.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the Bash version is too old to support BASH_XTRACEFD we print a
warning to stderr. This warning is not prefixed with "#", which causes
TAP parsers to (wrongly) interpret the warning as part of the protocol.
Fix this issue by prefixing the warning with a "#" so that it is treated
as comment.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several flags like "--verbose", "--verbose-only" or "-x" that
cause us to generate shell traces. The generated tracing output is split
up in these cases so that the test's stdout is printed to file
descriptor 3 whereas its stderr is printed to file descriptor 4.
Depending on which options have been given, we then end up either:
- Redirecting both file descriptors to a file.
- Redirecting them to stdout and stderr, respectively.
- Closing them in case we're running in none-verbose mode.
The second case causes problems though when passing output to a TAP
parser. We print the test's stdout to the console's stdout, and that
results in broken TAP output.
Fix the issue by instead redirecting the test's stdout to the shell's
stderr. This makes it impossible to discern stdout from stderr, but
going by my own experience I never came across a usecase where I would
have needed this distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests in t9835 and t9836 verify that git-p4(1) works with both
Python 2 and 3, respectively. To determine whether we have those Python
versions in the first place we create a wrapper script that directly
executes the git-p4(1) script with `python2` or `python3` binaries. We
then condition the execution of tests on whether that wrapper script can
be executed successfully.
The logic that does all of this is not contained in a prerequisite block
though, so the output it generates causes us to break the TAP format.
Refactor the logic to use `test_lazy_prereq()` to fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests in t9822 depend on filesystem support for ISO-8859-1 encoding. We
thus have a block of code that acts as a prerequisite -- if we fail to
write a file with an ISO-8859-1-encoded file name to disk then we skip
all tests.
When the prerequisite fails though we end up printing an error message
to stderr, which breaks the TAP format. Fix this by converting the code
to a proper prerequisite, which handles output redirection for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple users of `test_create_repo()` that use this function
outside of any test case. This function is nowadays only a thin wrapper
around `git init`, which by default prints a message to stdout that the
repository has been initialized. The resulting output may thus confuse
TAP parsers.
Refactor these users to instead create the repository in a "setup" test
case so that we don't explicitly have to silence them. There's one
exception in t1007: we use `push_repo()` and its `pop_repo()` equivalent
multiple times, so to reduce the noise introduced by this patch we
instead silence this invocation.
While at it, convert callsites to use git-init(1) directly as the
`test_create_repo()` function has been deprecated in f0d4d398e28
(test-lib: split up and deprecate test_create_repo(), 2021-05-10).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of cases where our tests end up announcing that a
certain prerequisite is or isn't fulfilled. While this is supposed to
help the developer it has the downside that it breaks the TAP format.
We could convert these cases to just have a "#" prefix, but it feels
rather unlikely that these are generally useful in the first place. We
already do announce why a specific test is being skipped, so we should
try to use this mechanism to the best extent possible.
Stop announcing these prereqs to fix the TAP format. Where possible,
convert the tests to rely on the prerequisites themselves to announce
why a test ran or didn't ran.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OpenBSD requires DIR_HAS_BSD_GROUP_SEMANTICS.
OpenBSD has never had the BSD sysctl KERN_PROC_PATHNAME nor
does it support or use the /proc filesystem.
OpenBSD has had strcasestr() since 3.8. OpenBSD has had memmem()
since 5.4.
Signed-off-by: Brad Smith <brad@comstyle.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OpenBSD / NetBSD use HW_PHYSMEM64 to detect the amount of physical
memory in a system. HW_PHYSMEM will not provide the correct amount
on a system with >=4GB of memory.
Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
227c4f33a03 (doc: add a blank line around block delimiters,
2025-03-09) added blank lines around block delimiters as a
defensive measure. For each block you had to mind the con-
text (like the commit says):
• Top-level: just add blank lines
• Block: use list continuation (+)
But list continuation was used here at the top level, which
results in literal `+` in the output formats.
Acked-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OpenBSD / NetBSD use HW_NCPUONLINE to detect the online CPU
count. OpenBSD ships with SMT disabled on X86 systems so
HW_NCPU would provide double the number of CPUs as opposed
to the proper online count.
Signed-off-by: Brad Smith <brad@comstyle.com>
Reviewed-by: Collin Funk <collin.funk1@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests make git perform actions that produce observable pathnames,
and have expectations on those paths. Tests run with $HOME set to a
$TRASH_DIRECTORY, and with their working directory the same
$TRASH_DIRECTORY, although these paths are logically identical, they do
not observe the same pathname canonicalization rules and thus might not
be represented by strings that compare equal. In particular, no pathname
normalization is applied to $TRASH_DIRECTORY or $HOME, while tests
change their working directory with `cd -P`, which normalizes the
working directory's path by fully resolving symbolic links.
t7900's macOS maintenance tests (which are not limited to running on
macOS) have an expectation on a path that `git maintenance` forms by
using abspath.c strbuf_realpath() to resolve a canonical absolute path
based on $HOME. When t7900 runs from a working directory that contains
symbolic links in its pathname, $HOME will also contain symbolic links,
which `git maintenance` resolves but the test's expectation does not,
causing a test failure.
Align $TRASH_DIRECTORY and $HOME with the normalized path as used for
the working directory by resetting them to match the working directory
after it's established by `cd -P`. With all paths in agreement and
symbolic links resolved, pathname expectations can be set and met based
on string comparison without regard to external environmental factors
such as the presence of symbolic links in a path.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a stale .midx file refers to .pack files that no longer exist,
we ended up checking for these non-existent files repeatedly, which
has been optimized by memoizing the non-existence.
* ps/midx-negative-packfile-cache:
midx: stop repeatedly looking up nonexistent packfiles
packfile: explain ordering of how we look up auxiliary pack files
"git notes --help" documentation updates.
* kh/notes-doc-fixes:
doc: notes: use stuck form throughout
doc: notes: treat --stdin equally between copy/remove
doc: notes: point out copy --stdin use with argv
doc: notes: clearly state that --stripspace is the default
doc: notes: remove stripspace discussion from other options
doc: notes: rework --[no-]stripspace
doc: notes: split out options with negated forms
doc: config: mention core.commentChar on commit.cleanup
doc: stripspace: mention where the default comes from
"git apply --index/--cached" when applying a deletion patch in
reverse failed to give the mode bits of the path "removed" by the
patch to the file it creates, which has been corrected.
* mm/apply-reverse-mode-of-deleted-path:
apply: set file mode when --reverse creates a deleted file
t4129: test that git apply warns for unexpected mode changes
Recent versions of Perl started warning against "! A =~ /pattern/"
which does not negate the result of the matching. As it turns out
that the problematic function is not even called, it was removed.
* op/cvsserver-perl-warning:
cvsserver: remove unused escapeRefName function
Avoid adding directory path to a sparse-index tree entries to the
name-hash, since they would bloat the hashtable without anybody
querying for them. This was done already for a single threaded
part of the code, but now the multi-threaded code also does the
same.
* am/sparse-index-name-hash-fix:
name-hash: don't add sparse directories in threaded lazy init
Integer overflow fix around code paths for "git multi-pack-index repack"..
* pw/midx-repack-overflow-fix:
midx docs: clarify tie breaking
midx: avoid negative array index
midx repack: avoid potential integer overflow on 64 bit systems
midx repack: avoid integer overflow on 32 bit systems
The current example for Gmail suggests using app passwords for
send-email if user has multi-factor authentication set up for their
account. However, it does not clarify that the user cannot use their
normal password in case they do not have multi-factor authentication
enabled. Most likely the example was written in the days when Google
allowed using normal passwords without multi-factor authentication.
Clarify that regular passwords do not work for Gmail and app-passwords
are the only way for basic authentication. Also encourage users to use
OAuth2.0 as a more secure alternative.
While at it, also prefer using the word "mechanism" over "method" for
`OAUTHBEARER` and `XOAUTH2` since that is what official docs use.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a recent attempt to add links of email helpers to git-scm.com [1], I
came to a conclusion that the links in the gitcredentials page are meant
for people needing credential helpers for cloning, fetching and pushing
repositories to remote hosts, and not sending emails. gitcredentials
docs don't even talk about send emails, thus confirming this view.
So, lets remove these links from the gitcredentials page. The links are
still available in the git-send-email documentation, which is the right
place for them.
[1]: https://github.com/git/git-scm.com/pull/2005
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation for git-send-email had an inconsistent use of
"", ``, and '' for quoting. This commit improves the formatting by
using the same style throughout the documentation. Missing full stops
have also been added at some places.
Finally, the cpan links of necessary perl modules have been added to
make their installation easier.
While at it, the unecessary use of $ with <num> and <int> placeholders
has also been removed.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit links `git-credential-yahoo` as a credential helper for
Yahoo accounts. Also, Google's `sendgmail` tool has been linked as an
alternative method for sending emails through Gmail.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix this inline list to use a single style, namely numeric, instead of
`(1)` followed by `(b)`.
Signed-off-by: Wonuk Kim <kimww0306@gmail.com>
Acked-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add userdiff patterns to support R programming language.
Also, add three userdiff tests for R programming language
files. These files define simple function and nested function,
with and without indentation.
Signed-off-by: Rodrigo Carvalho <rodrigorsdc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since f93b2a0424 (reftable/basics: introduce `REFTABLE_UNUSED`
annotation, 2025-02-18), the reftable library was migrated to
use an internal version of `UNUSED`, which unconditionally sets
a GNU __attribute__ to avoid warnings function parameters that
are not being used.
Make the definition conditional to prevent breaking the build
with non GNU compilers.
Reported-by: "Randall S. Becker" <rsbecker@nexbridge.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/git-gui:
git-gui: wire up support for the Meson build system
git-gui: stop including GIT-VERSION-FILE file
git-gui: extract script to generate macOS app
git-gui: extract script to generate macOS wrapper
git-gui: extract script to generate "tclIndex"
git-gui: extract script to generate "git-gui"
git-gui: drop no-op GITGUI_SCRIPT replacement
git-gui: make output of GIT-VERSION-GEN source'able
git-gui: prepare GIT-VERSION-GEN for out-of-tree builds
git-gui: replace GIT-GUI-VARS with GIT-GUI-BUILD-OPTIONS
* 'master' of https://github.com/j6t/gitk:
gitk: do not hard-code color of search results in commit list
gitk: place file name arguments after options in msgfmt call
gitk: Legacy widgets doesn't have combobox
- Added complete Irish translation (ga.po).
- Added entry for Irish in po/TEAMS.
- Corrected email format and removed trailing whitespace.
- Translated new strings from Git 2.50.0-rc0
Signed-off-by: Aindriú Mac Giolla Eoin <aindriu80@gmail.com>
* 'pks-meson-support' of github.com:pks-t/git-gui:
git-gui: wire up support for the Meson build system
git-gui: stop including GIT-VERSION-FILE file
git-gui: extract script to generate macOS app
git-gui: extract script to generate macOS wrapper
git-gui: extract script to generate "tclIndex"
git-gui: extract script to generate "git-gui"
git-gui: drop no-op GITGUI_SCRIPT replacement
git-gui: make output of GIT-VERSION-GEN source'able
git-gui: prepare GIT-VERSION-GEN for out-of-tree builds
git-gui: replace GIT-GUI-VARS with GIT-GUI-BUILD-OPTIONS
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* maint-2.47:
Git 2.47.3
Git 2.46.4
Git 2.45.4
Git 2.44.4
Git 2.43.7
wincred: avoid buffer overflow in wcsncat()
bundle-uri: fix arbitrary file writes via parameter injection
config: quote values containing CR character
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
* maint-2.46:
Git 2.46.4
Git 2.45.4
Git 2.44.4
Git 2.43.7
wincred: avoid buffer overflow in wcsncat()
bundle-uri: fix arbitrary file writes via parameter injection
config: quote values containing CR character
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
Signed-off-by: Taylor Blau <me@ttaylorr.com>
* maint-2.45:
Git 2.45.4
Git 2.44.4
Git 2.43.7
wincred: avoid buffer overflow in wcsncat()
bundle-uri: fix arbitrary file writes via parameter injection
config: quote values containing CR character
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
Signed-off-by: Taylor Blau <me@ttaylorr.com>
* maint-2.44:
Git 2.44.4
Git 2.43.7
wincred: avoid buffer overflow in wcsncat()
bundle-uri: fix arbitrary file writes via parameter injection
config: quote values containing CR character
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
Signed-off-by: Taylor Blau <me@ttaylorr.com>
* maint-2.43:
Git 2.43.7
wincred: avoid buffer overflow in wcsncat()
bundle-uri: fix arbitrary file writes via parameter injection
config: quote values containing CR character
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This merges in the fix for CVE-2025-48386.
* tb/wincred-buffer-overflow:
wincred: avoid buffer overflow in wcsncat()
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As the design of signature handling is still being discussed, it is
likely that the data stream produced by the code in Git 2.50 would
have to be changed in such a way that is not backward compatible.
Mark the feature as experimental and discourge its use for now.
Also flip the default on the generation side to "strip"; users of
existing versions would not have passed --signed-commits=strip and
will be broken by this change if the default is made to abort, and
will be encouraged by the error message to produce data stream with
future breakage guarantees by passing --signed-commits option.
As we tone down the default behaviour, we no longer need the
FAST_EXPORT_SIGNED_COMMITS_NOABORT environment variable, which was
not discoverable enough.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The wincred credential helper uses a static buffer ("target") as a
unique key for storing and comparing against internal storage. It does
this by building up a string is supposed to look like:
git:$PROTOCOL://$USERNAME@$HOST/@PATH
However, the static "target" buffer is declared as a wide string with no
more than 1,024 wide characters. The first call to wcsncat() is almost
correct (it copies no more than ARRAY_SIZE(target) wchar_t's), but does
not account for the trailing NUL, introducing an off-by-one error.
But subsequent calls to wcsncat() have an additional problem on top of
the off-by-one. They do not account for the length of the existing
wide string being built up in 'target'. So the following:
$ perl -e '
my $x = "x" x 1_000;
print "protocol=$x\nhost=$x\nusername=$x\npath=$x\n"
' |
C\:/Program\ Files/Git/mingw64/libexec/git-core/git-credential-wincred.exe get
will result in a segmentation fault from over-filling buffer.
This bug is as old as the wincred helper itself, dating back to
a6253da0f3 (contrib: add win32 credential-helper, 2012-07-27). Commit
8b2d219a3d (wincred: improve compatibility with windows versions,
2013-01-10) replaced the use of strncat() with wcsncat(), but retained
the buggy behavior.
Fix this by using a "target_append()" helper which accounts for both the
length of the existing string within the buffer, as well as the trailing
NUL character.
Reported-by: David Leadbeater <dgl@dgl.cx>
Helped-by: David Leadbeater <dgl@dgl.cx>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This merges in the fix for CVE-2025-48384.
* jt/config-quote-cr:
config: quote values containing CR character
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This merges in the fix for CVE-2025-48385.
* ps/bundle-uri-arbitrary-writes:
bundle-uri: fix arbitrary file writes via parameter injection
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This merges in fixes for CVE-2025-27614, CVE-2025-27613, CVE-2025-46334,
and CVE-2025-46835 targeting Gitk and Git GUI.
* js/gitk-git-gui-harden-exec-open: (41 commits)
git-gui: sanitize 'exec' arguments: convert new 'cygpath' calls
git-gui: do not mistake command arguments as redirection operators
git-gui: introduce function git_redir for git calls with redirections
git-gui: pass redirections as separate argument to git_read
git-gui: pass redirections as separate argument to _open_stdout_stderr
git-gui: convert git_read*, git_write to be non-variadic
git-gui: override exec and open only on Windows
gitk: sanitize 'open' arguments: revisit recently updated 'open' calls
git-gui: use git_read in githook_read
git-gui: sanitize $PATH on all platforms
git-gui: break out a separate function git_read_nice
git-gui: assure PATH has only absolute elements.
git-gui: remove option --stderr from git_read
git-gui: cleanup git-bash menu item
git-gui: sanitize 'exec' arguments: background
git-gui: avoid auto_execok in do_windows_shortcut
git-gui: sanitize 'exec' arguments: simple cases
git-gui: avoid auto_execok for git-bash menu item
git-gui: treat file names beginning with "|" as relative paths
git-gui: remove unused proc is_shellscript
git-gui: remove git config --list handling for git < 1.5.3
git-gui: remove special treatment of Windows from open_cmd_pipe
git-gui: remove HEAD detachment implementation for git < 1.5.3
git-gui: use only the configured shell
git-gui: remove Tcl 8.4 workaround on 2>@1 redirection
git-gui: make _shellpath usable on startup
git-gui: use [is_Windows], not bad _shellpath
git-gui: _which, only add .exe suffix if not present
gitk: encode arguments correctly with "open"
gitk: sanitize 'open' arguments: command pipeline
gitk: collect construction of blameargs into a single conditional
gitk: sanitize 'open' arguments: simple commands, readable and writable
gitk: sanitize 'open' arguments: simple commands with redirections
gitk: sanitize 'open' arguments: simple commands
gitk: sanitize 'exec' arguments: redirect to process
gitk: sanitize 'exec' arguments: redirections and background
gitk: sanitize 'exec' arguments: redirections
gitk: sanitize 'exec' arguments: 'eval exec'
gitk: sanitize 'exec' arguments: simple cases
gitk: have callers of diffcmd supply pipe symbol when necessary
gitk: treat file names beginning with "|" as relative paths
...
Signed-off-by: Taylor Blau <me@ttaylorr.com>
"git receive-pack" optionally learns not to care about connectivity
check, which can be useful when the repository arranges to ensure
connectivity by some other means.
* jt/receive-pack-skip-connectivity-check:
builtin/receive-pack: add option to skip connectivity check
t5410: test receive-pack connectivity check
Remove the leftover hints to the test framework to mark tests that
do not pass the leak checker tests, as they should no longer be
needed.
* kn/passing-leak-tests:
t: remove unexpected SANITIZE_LEAK variables
The multi-pack index acts as a cache across a set of packfiles so that
we can quickly look up which of those packfiles contains a given object.
As such, the multi-pack index naturally needs to be updated every time
one of the packfiles goes away, or otherwise the multi-pack index has
grown stale.
A stale multi-pack index should be handled gracefully by Git though, and
in fact it is: if the indexed pack cannot be found we simply ignore it
and eventually we fall back to doing the object lookup by just iterating
through all packs, even if those aren't indexed.
But while this fallback works, it has one significant downside: we don't
cache the fact that a pack has vanished. This leads to us repeatedly
trying to look up the same pack only to realize that it (still) doesn't
exist.
This issue can be easily demonstrated by creating a repository with a
stale multi-pack index and a couple of objects. We do so by creating a
repository with two packfiles, both of which are indexed by the
multi-pack index, and then repack those two packfiles. Note that we have
to move the multi-pack-index before doing the final repack, as Git knows
to delete it otherwise.
$ git init repo
$ cd repo/
$ git config set maintenance.auto false
$ for i in $(seq 1000); do printf "%d-original" $i >file-$i; done
$ git add .
$ git commit -moriginal
$ git repack -dl
$ for i in $(seq 1000); do printf "%d-modified" $i >file-$i; done
$ git commit -a -mmodified
$ git repack -dl
$ git multi-pack-index write
$ mv .git/objects/pack/multi-pack-index .
$ git repack -Adl
$ mv multi-pack-index .git/objects/pack/
Commands that cause a lot of objects lookups will now repeatedly invoke
`add_packed_git()`, which leads to three failed access(3p) calls as well
as one failed stat(3p) call. The following strace for example is done
for `git log --patch` in the above repository:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
74.67 0.024693 1 18038 18031 access
25.33 0.008378 1 6045 6017 newfstatat
------ ----------- ----------- --------- --------- ----------------
100.00 0.033071 1 24083 24048 total
Fix the issue by introducing a negative lookup cache for indexed packs.
This cache works by simply storing an invalid pointer for a missing pack
when `prepare_midx_pack()` fails to look up the pack. Most users of the
`packs` array don't need to be adjusted, either, as they all know to
call `prepare_midx_pack()` before accessing the array.
With this change in place we can now see a significantly reduced number
of syscalls:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
73.58 0.000323 5 60 28 newfstatat
26.42 0.000116 5 23 16 access
------ ----------- ----------- --------- --------- ----------------
100.00 0.000439 5 83 44 total
Furthermore, this change also results in a speedup:
Benchmark 1: git log --patch (revision = HEAD~)
Time (mean ± σ): 50.4 ms ± 2.5 ms [User: 22.0 ms, System: 24.4 ms]
Range (min … max): 45.4 ms … 54.9 ms 53 runs
Benchmark 2: git log --patch (revision = HEAD)
Time (mean ± σ): 12.7 ms ± 0.4 ms [User: 11.1 ms, System: 1.6 ms]
Range (min … max): 12.4 ms … 15.0 ms 191 runs
Summary
git log --patch (revision = HEAD) ran
3.96 ± 0.22 times faster than git log --patch (revision = HEAD~)
In the end, it should in theory never be necessary to have this negative
lookup cache given that we know to update the multi-pack index together
with repacks. But as the change is quite contained and as the speedup
can be significant as demonstrated above, it does feel sensible to have
the negative lookup cache regardless.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding a packfile to an object database we perform four syscalls:
- Three calls to access(3p) are done to check for auxiliary data
structures.
- One call to stat(3p) is done to check for the ".pack" itself.
One curious bit is that we perform the access(3p) calls before checking
for the packfile itself, but if the packfile doesn't exist we discard
all results. The access(3p) calls are thus essentially wasted, so one
may be triggered to reorder those calls so that we can short-circuit the
other syscalls in case the packfile does not exist.
The order in which we look up files is quite important though to help
avoid races:
- When installing a packfile we move auxiliary data structures into
place before we install the ".idx" file.
- When deleting a packfile we first delete the ".idx" and ".pack"
files before deleting auxiliary data structures.
As such, to avoid any races with concurrently created or deleted packs
we need to make sure that we _first_ read auxiliary data structures
before we read the corresponding ".idx" or ".pack" file. Otherwise it
may easily happen that we return a populated but misclassified pack.
Add a comment to `add_packed_git()` to make future readers aware of this
ordering requirement.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitcli(7) recommends the *stuck form*. `--ref` is the only one which
does not use it.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
46538012d94 (notes remove: --stdin reads from the standard input,
2011-05-18) added `--stdin` for the `remove` subcommand, documenting it
in the “Options” section. But `copy --stdin` was added before that, in
160baa0d9cb (notes: implement 'git notes copy --stdin', 2010-03-12).
Treat this option equally between the two subcommands:
• remove: mention `--stdin` on the subcommand as well, like for `copy`
• copy: mention it as well under the option documentation
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike `remove --stdin`, this option cannot be combined with object
names given via the command line.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clearly state when which of the regular and negated form of the
option take effect.[1]
Also mention the subtle behavior that occurs when you mix options like
`-m` and `-C`, including a note that it might be fixed in the future.
The topic was brought up on v8 of the `--separator` series.[2][3]
[1]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/
[2]: https://lore.kernel.org/git/xmqq4jp326oj.fsf@gitster.g/
† 3: v11 was the version that landed
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cleaning up whitespace in metadata is typical porcelain behavior and
this default does not need to be pointed out.[1] Only speak up when
the default `--stripspace` is not used.
Also remove all misleading mentions of comment lines in the process;
see the previous commit.
Also remove the period that trails the parenthetical here.
† 1: See `-F` in git-commit(1) which has nothing to say about whitespace
cleanup. The cleanup discussion is on `--cleanup`.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document this option by copying the bullet list from git-stripspace(1).
A bullet list is cleaner when there are this many points to consider.
We also get a more standardized description of the multiple-blank-lines
behavior. Compare the repeating (git-notes(1)):
empty lines other than a single line between paragraphs
With (git-stripspace(1)):
multiple consecutive empty lines
And:
leading [...] whitespace
With:
empty lines from the beginning
Leading whitespace in the form of spaces (indentation) are not removed.
However, empty lines at the start of the message are removed.
Note that we drop the mentions of comment line handling because they are
wrong; this option does not control how lines which can be recognized as
comment lines are handled. Only interactivity controls that:
• Comment lines are stripped after editing interactively
• Lines which could be recognized as comment lines are left alone when
the message is given non-interactively
So it is misleading to document the comment line behavior on
this option.
Further, the text is wrong:
Lines starting with `#` will be stripped out in non-editor cases
like `-m`, [...]
Comment lines are still indirectly discussed on other options. We will
deal with them in the next commit.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split these out so that they are easier to search for.[1]
[1]: https://lore.kernel.org/git/xmqqcyct1mtq.fsf@gitster.g/
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mention it in parentheses since we are in a configuration context.
Refer to the default as such, not as “the” character.
Also don’t mention `#` again; just say “comment character”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also quote `#` in line with the modern formatting convention.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prefix '#' to the commit title in the "rebase -i" todo file, just
like a merge commit being replayed.
* en/sequencer-comment-messages:
sequencer: make it clearer that commit descriptions are just comments
Assorted fixes for issues found with CodeQL.
* js/misc-fixes:
sequencer: stop pretending that an assignment is a condition
bundle-uri: avoid using undefined output of `sscanf()`
commit-graph: avoid using stale stack addresses
trace2: avoid "futile conditional"
Avoid redundant conditions
fetch: avoid unnecessary work when there is no current branch
has_dir_name(): make code more obvious
upload-pack: rename `enum` to reflect the operation
commit-graph: avoid malloc'ing a local variable
fetch: carefully clear local variable's address after use
commit: simplify code
The code path to access the "packed-refs" file while "fsck" is
taught to mmap the file, instead of reading the whole file in the
memory.
* sj/use-mmap-to-check-packed-refs:
packed-backend: mmap large "packed-refs" file during fsck
packed-backend: extract snapshot allocation in `load_contents`
packed-backend: fsck should warn when "packed-refs" file is empty
"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.
* ds/sparse-apply-add-p:
p2000: add performance test for patch-mode commands
reset: integrate sparse index with --patch
git add: make -p/-i aware of sparse index
apply: integrate with the sparse index
Updates to meson-based build procedure.
* rj/build-tweaks-part2:
configure.ac: upgrade to a compilation check for sysinfo
meson.build: correct setting of GIT_EXEC_PATH
meson: correct path to system config/attribute files
meson: correct install location of YAML.pm
meson.build: quote the GITWEBDIR build configuration
"git merge-tree" learned an option to see if it resolves cleanly
without actually creating a result.
* en/merge-tree-check:
merge-tree: add a new --quiet flag
merge-ort: add a new mergeability_only option
Support to create a loose object file with unknown object type has
been dropped.
* jk/no-funny-object-types:
object-file: drop support for writing objects with unknown types
hash-object: handle --literally with OPT_NEGBIT
hash-object: merge HASH_* and INDEX_* flags
hash-object: stop allowing unknown types
t: add lib-loose.sh
t/helper: add zlib test-tool
oid_object_info(): drop type_name strbuf
fsck: stop using object_info->type_name strbuf
oid_object_info_convert(): stop using string for object type
cat-file: use type enum instead of buffer for -t option
object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
cat-file: make --allow-unknown-type a noop
object-file.h: fix typo in variable declaration
The userdiff pattern for shell scripts has been updated to cope
with more bash-isms.
* md/userdiff-bash-shell-function:
userdiff: extend Bash pattern to cover more shell function forms
Function 'escapeRefName' introduced in 51a7e6dbc9 has never been used.
Despite being dead code, changes in Perl 5.41.4 exposed precedence
warning within its logic, which then caused test failures in t9402 by
logging the warnings to stderr while parsing the code. The affected
tests are t9402.30, t9402.31, t9402.32 and t9402.34.
Remove this unused function to simplify the codebase and stop the
warnings and test failures. Its corresponding unescapeRefName function,
which remains in use, has had its comments updated.
Reported-by: Jitka Plesnikova <jplesnik@redhat.com>
Signed-off-by: Ondřej Pohořelský <opohorel@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Additionally, a list of option possible values has been reformatted as a
standalone definition list.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
In order to avoid breaking the format on '<<<<<<' and '>>>>>' lines
by applying the synopsis rules to these spans, they are formatted using '+'
signs instead of '`' signs.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 01aff0a (apply: correctly reverse patch's pre- and post-image
mode bits, 2023-12-26) revised reverse_patches() to maintain the desired
property that when only one of patch::old_mode and patch::new_mode is
set, the mode will be carried in old_mode. That property is generally
correct, with one notable exception: when creating a file, only new_mode
will be set. Since reversing a deletion results in a creation, new_mode
must be set in that case.
Omitting handling for this case means that reversing a patch that
removes an executable file will not result in the executable permission
being set on the re-created file. Existing test coverage for file modes
focuses only on mode changes of existing files.
Swap old_mode and new_mode in reverse_patches() for what's represented
in the patch as a file deletion, as it is transformed into a file
creation under reversal. This causes git apply --reverse to set the
executable permission properly when re-creating a deleted executable
file.
Add tests ensuring that git apply sets file modes correctly on file
creation, both in the forward and reverse directions.
Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no test covering what commit 01aff0a (apply: correctly reverse
patch's pre- and post-image mode bits, 2023-12-26) addressed. Prior to
that commit, git apply was erroneously unaware of a file's expected mode
while reverse-patching a file whose mode was not changing.
Add the missing test coverage to assure that git apply is aware of the
expected mode of a file being patched when the patch does not indicate
that the file's mode is changing. This is achieved by arranging a file
mode so that it doesn't agree with patch being applied, and checking git
apply's output for the warning it's supposed to raise in this situation.
Test in both reverse and normal (forward) directions.
Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dependency on the_repository variable has been reduced from the
code paths in "git replay".
* en/replay-wo-the-repository:
replay: replace the_repository with repo parameter passed to cmd_replay ()
Teach "git send-email" to also consult `hostname -f` for mail
domain to compute the identity given to SMTP servers.
* ag/send-email-hostname-f:
send-email: try to get fqdn by running hostname -f on Linux and macOS
CI settings at GitLab has been updated to run MSVC based Meson job
automatically (as opposed to be done only upon manual request).
* ps/ci-gitlab-enable-msvc-meson-job:
gitlab-ci: always run MSVC-based Meson job
Two "scalar" subcommands that adds a repository that hasn't been
under "scalar"'s control are taught an option not to enable the
scheduled maintenance on it.
* ds/scalar-no-maintenance:
scalar reconfigure: improve --maintenance docs
scalar reconfigure: add --maintenance=<mode> option
scalar clone: add --no-maintenance option
scalar register: add --no-maintenance option
scalar: customize register_dir()'s behavior
win+Meson CI pipeline, unlike other pipelines for Windows,
used to build artifacts in develper mode, which has been changed to
build them in release mode for consistency.
* js/ci-build-win-in-release-mode:
ci(win+Meson): build in Release mode
We fetch bundle URIs via `download_https_uri_to_file()`. The logic to
fetch those bundles is not handled in-process, but we instead use a
separate git-remote-https(1) process that performs the fetch for us. The
information about which file should be downloaded and where that file
should be put gets communicated via stdin of that process via a "get"
request. This "get" request has the form "get $uri $file\n\n". As may be
obvious to the reader, this will cause git-remote-https(1) to download
the URI "$uri" and put it into "$file".
The fact that we are using plain spaces and newlines as separators for
the request arguments means that we have to be extra careful with the
respective vaules of these arguments:
- If "$uri" contained a space we would interpret this as both URI and
target location.
- If either "$uri" or "$file" contained a newline we would interpret
this as a new command.
But we neither quote the arguments such that any characters with special
meaning would be escaped, nor do we verify that none of these special
characters are contained.
If either the URI or file contains a newline character, we are open to
protocol injection attacks. Likewise, if the URI itself contains a
space, then an attacker-controlled URI can lead to partially-controlled
file writes.
Note that the attacker-controlled URIs do not permit completely
arbitrary file writes, but instead allows an attacker to control the
path in which we will write a temporary (e.g., "tmp_uri_XXXXXX")
file.
The result is twofold:
- By adding a space in "$uri" we can control where exactly a file will
be written to, including out-of-repository writes. The final
location is not completely arbitrary, as the injected string will be
concatenated with the original "$file" path. Furthermore, the name
of the bundle will be "tmp_uri_XXXXXX", further restricting what an
adversary would be able to write.
Also note that is not possible for the URI to contain a newline
because we end up in `credential_from_url_1()` before we try to
issue any requests using that URI. As such, it is not possible to
inject arbitrary commands via the URI.
- By adding a newline to "$file" we can inject arbitrary commands.
This gives us full control over where a specific file will be
written to. Potential attack vectors would be to overwrite hooks,
but if an adversary were to guess where the user's home directory is
located they might also easily write e.g. a "~/.profile" file and
thus cause arbitrary code execution.
This injection can only become possible when the adversary has full
control over the target path where a bundle will be downloaded to.
While this feels unlikely, it is possible to control this path when
users perform a recursive clone with a ".gitmodules" file that is
controlled by the adversary.
Luckily though, the use of bundle URIs is not enabled by default in Git
clients (yet): they have to be enabled by setting the `bundle.heuristic`
config key explicitly. As such, the blast radius of this parameter
injection should overall be quite contained.
Fix the issue by rejecting spaces in the URI and newlines in both the
URI and the file. As explained, it shouldn't be required to also
restrict the use of newlines in the URI, as we would eventually die
anyway in `credential_from_url_1()`. But given that we're only one small
step away from arbitrary code execution, let's rather be safe and
restrict newlines in URIs, as well.
Eventually we should probably refactor the way that Git talks with the
git-remote-https(1) subprocess so that it is less fragile. Until then,
these two restrictions should plug the issue.
Reported-by: David Leadbeater <dgl@dgl.cx>
Based-on-patch-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When reading the config, values that contain a trailing CRLF are
stripped. If the value itself has a trailing CR, the normal LF that
follows results in the CR being unintentionally stripped. This may lead
to unintended behavior due to the config value written being different
when it gets read.
One such issue involves a repository with a submodule path containing a
trailing CR. When the submodule gets initialized, the submodule is
cloned without being checked out and has "core.worktree" set to the
submodule path. The git-checkout(1) that gets spawned later reads the
"core.worktree" config value, but without the trailing CR, and
consequently attempts to checkout to a different path than intended.
If the repository contains a matching path that is a symlink, it is
possible for the submodule repository to be checked out in arbitrary
locations. This is extra bad when the symlink points to the submodule
hooks directory and the submodule repository contains an executable
"post-checkout" hook. Once the submodule repository checkout completes,
the "post-checkout" hook immediately executes.
To prevent mismatched config state due to misinterpreting a trailing CR,
wrap config values containing CR in double quotes when writing the
entry. This ensures a trailing CR is always separated for an LF and thus
prevented from getting stripped.
Note that this problem cannot be addressed by just quoting each CR with
"\r". The reading side of the config interprets only a few backslash
escapes, and "\r" is not among them. This fix is sufficient though
because it only affects the CR at the end of a line and any literal CR
in the interior is already preserved.
Co-authored-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This addresses CVE-2025-46835, Git GUI can create and overwrite a
user's files:
When a user clones an untrusted repository and is tricked into editing
a file located in a maliciously named directory in the repository, then
Git GUI can create and overwrite files for which the user has write
permission.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
The side branch merged in the previous commit introduces new 'exec'
calls. Convert these in the same way we did earlier for existing
'exec' calls.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This addresses CVE-2025-46334, Git GUI malicious command injection on
Windows.
A malicious repository can ship versions of sh.exe or typical textconv
filter programs such as astextplain. Due to the unfortunate design of
Tcl on Windows, the search path when looking for an executable always
includes the current directory. The mentioned programs are invoked when
the user selects "Git Bash" or "Browse Files" from the menu.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This addresses CVE-2025-27613, Gitk can create and truncate a user's
files:
When a user clones an untrusted repository and runs gitk without
additional command arguments, files for which the user has write
permission can be created and truncated. The option "Support per-file
encoding" must have been enabled before in Gitk's Preferences. This
option is disabled by default.
The same happens when "Show origin of this line" is used in the main
window (regardless of whether "Support per-file encoding" is enabled or
not).
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This addresses CVE-2025-27614, Arbitrary command execution with Gitk:
A Git repository can be crafted in such a way that with some social
engineering a user who has cloned the repository can be tricked into
running any script (e.g., Bourne shell, Perl, Python, ...) supplied by
the attacker by invoking `gitk filename`, where `filename` has a
particular structure. The script is run with the privileges of the user.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Tcl 'open' assigns special meaning to its argument when they begin with
redirection, pipe or background operator. There are many calls of the
'open' variant that runs a process which construct arguments that are
taken from the Git repository or are user input. However, when file
names or ref names are taken from the repository, it is possible to
find names that have these special forms. They must not be interpreted
by 'open' lest it redirects input or output, or attempts to build a
pipeline using a command name controlled by the repository.
Use the helper function make_arglist_safe, which identifies such
arguments and prepends "./" to force such a name to be regarded as a
relative file name.
After this change the following 'open' calls that start a process do not
apply the argument processing:
git-gui.sh:4095: || [catch {set spell_fd [open $spell_cmd r+]} spell_err]} {
lib/spellcheck.tcl:47: set pipe_fd [open [list | $s_prog -v] r]
lib/spellcheck.tcl:133: _connect $this [open $spell_cmd r+]
lib/spellcheck.tcl:405: set fd [open [list | aspell dump dicts] r]
In all cases, the command arguments are constant strings (or begin with
a constant string) that are of a form that would not be affected by the
processing anyway.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Proc git invokes git and collects all output, which is it returns.
We are going to treat command arguments and redirections differently to
avoid passing arguments that look like redirections to the command
accidentally. A few invocations also pass redirection operators as
command arguments deliberately. Rewrite these cases to use a new
function git_redir that takes two lists, one for the regular command
arguments and one for the redirection operations.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We are going to treat command arguments and redirections differently to
avoid passing arguments that look like redirections to the command
accidentally. To do so, it will be necessary to know which arguments
are intentional redirections. Rewrite direct call sites of git_read
to pass intentional redirections as a second (optional) argument.
git_read defers to safe_open_command, but we cannot make it safe, yet,
because one of the callers of git_read is proc git, which does not yet
know which of its arguments are redirections. This is the topic of the
next commit.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We are going to treat command arguments and redirections differently to
avoid passing arguments that look like redirections to the command
accidentally. To do so, it will be necessary to know which arguments
are intentional redirections. Rewrite direct callers of
_open_stdout_stderr to pass intentional redirections as a second
(optional) argument.
Passing arbitrary arguments is not safe right now, but we rename it
to safe_open_command anyway to avoid having to touch the call sites
again later when we make it actually safe.
We cannot make the function safe right away because one caller is
git_read, which does not yet know which of its arguments are
redirections. This is the topic of the next commit.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We are going to treat command arguments and redirections differently to
avoid passing arguments that look like redirections to the command
accidentally. To do so, it will be necessary to know which arguments
are intentional redirections. As a preparation, convert git_read,
git_read_nice, and git_write to take just a single argument that is
the command in a list. Adjust all call sites accordingly.
In the future, this argument will be the regular command arguments and
a second argument will be the redirection operations.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Since aae9560a355d (Work around Tcl's default `PATH` lookup,
2022-11-23), git-gui overrides exec and open on all platforms. But,
this was done in response to Tcl adding elements to $PATH on Windows,
while exec, open, and auto_execok honor $PATH as given on all other
platforms.
Let's do the override only on Windows, restoring others to using their
native exec and open. These honor the sanitized $PATH as that is written
out to env(PATH) in a previous commit. auto_execok is also safe on these
platforms, so can be used for _which.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The previous commits bb5cb23daf75 (gitk: prevent overly long command
lines, 2023-01-24) rewrote a set of the 'open' calls substantially.
These were then later updated by 7dd272eca153 (gitk: escape file paths
before piping to git log, 2023-01-24) and d5d1b91e5327 (gitk: encode
arguments correctly with "open", 2025-03-07). In the preceding merge,
the conversions to a safe_open variant were undone to ensure that the
principal operation of the new 'open' calls is not modified by accident.
Since the 'open' calls now pass a redirection from a Tcl string as
stdin, convert the calls to 'safe_open_command_redirect'.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
0730a5a3a5e6 ("git-gui - use git-hook, honor core.hooksPath", 2023-09-17)
rewrote githook_read to use `git hook` to run a hook script. The code
that was replaced discovered the hook script file manually and invoked
it using function _open_stdout_stderr. After the rewrite, this function
is still invoked, but it calls into `git` instead of the hook scripts.
Notice though, that we have function git_read that invokes git and
prepares a pipe for the caller to read from. Replace the implementation
of githook_read to be just a wrapper around git_read. This unifies the
way in which the git executable is invoked. git_read ultimately also
calls into _open_stdout_stderr, but it modifies the path to the git
executable before doing so.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Since 8f23432b38d9 (windows: ignore empty `PATH` elements, 2022-11-23),
git-gui removes empty elements from $PATH, and a prior commit made this
remove all non-absolute elements from $PATH. But, this happens only on
Windows. Unsafe $PATH elements in $PATH are possible on all platforms.
Let's sanitize $PATH on all platforms to have consistent behavior. If a
user really wants the current repository on $PATH, they can add its
absolute name to $PATH.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
There are two callers of git_read that request special treatment using
option --nice. Rewrite them to call a new function git_read_nice that
does the special treatment. Now we can remove all option treatment from
git_read.
git_write has the same capability, but there are no callers that
request --nice. Remove the feature without substitution.
This is a preparation for a later change where we want to make git_read
and friends non-variadic. Then it cannot have optional arguments.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Since 8f23432b38d9 (windows: ignore empty `PATH` elements, 2022-11-23),
git-gui excises all empty paths from $PATH, but still allows '.' or
other relative paths, which can also allow executing code from the
repository. Let's remove anything except absolute elements. While here,
let's remove duplicated elements, which are very common on Windows:
only the first such item can do anything except waste time repeating a
search.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Some callers of git_read want to redirect stderr of the invoked command
to stdout. The function offers option --stderr for this purpose.
However, the option only appends 2>@1 to the commands. The callers can
do that themselves. In lib/console.tcl we even have a caller that
already knew implictly what --stderr does behind the scenes.
This is a preparation for a later change where we want to make git_read
non-variadic. Then it cannot have optional leading arguments.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
git-gui on Git for Windows creates a menu item to start a git-bash
session for the current repository. This menu-item works as desired when
git-gui is installed in the Git for Windows (g4w) distribution, but
not when run from a different location such as normally done in
development. The reason is that git-bash's location is known to be
'/git-bash' in the Unix pathname space known to MSYS, but this is not
known in the Windows pathname space. Instead, git-gui derives a pathname
for git-bash assuming it is at a known relative location.
If git-gui is run from a different directory than assumed in g4w, the
relative location changes, and git-gui resorts to running a generic bash
login session in a Windows console.
But, the MSYS system underlying Git for Windows includes the 'cygpath'
utility to convert between Unix and Windows pathnames. Let's use this so
git-bash's Windows pathname is determined directly from /git-bash.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As in the previous commits, introduce a function that sanitizes
arguments intended for the process, but runs the process in the
background. Convert 'exec' calls to use this new function.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
git-gui on Windows uses auto_execok to locate git-gui.exe,
which performs the same flawed search as does the builtin exec.
Use _which instead, performing a safe PATH lookup.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Tcl 'exec' assigns special meaning to its argument when they begin with
redirection, pipe or background operator. There are a number of
invocations of 'exec' which construct arguments that are taken from the
Git repository or a user input. However, when file names or ref names
are taken from the repository, it is possible to find names that have
these special forms. They must not be interpreted by 'exec' lest it
redirects input or output, or attempts to build a pipeline using a
command name controlled by the repository.
Introduce a helper function that identifies such arguments and prepends
"./" to force such a name to be regarded as a relative file name.
Convert those 'exec' calls where the arguments can simply be packed
into a list.
Note that most commands containing the word 'exec' route through
console::exec or console::chain, which we will treat in another commit.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
On Windows, git-gui offers to open a git-bash session for the current
repository from the menu, but uses [auto_execok start] to get the
command to actually run that shell.
The code for auto_execok, in /usr/share/tcl8.6/tcl.init, has 'start' in
the 'shellBuiltins' list for cmd.exe on Windows: as a result,
auto_execok does not actually search for start, meaning this usage is
technically ok with auto_execok now. However, leaving this use of
auto_execok in place will just induce confusion about why a known unsafe
function is being used on Windows. Instead, let's switch to using our
known safe _which function that looks only in $PATH, excluding the
current working directory.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The Tcl 'open' function has a very wide interface. It can open files as
well as pipes to external processes. The difference is made only by the
first character of the file name: if it is "|", a process is spawned.
We have a number of calls of Tcl 'open' that take a file name from the
environment in which Git GUI is running. Be prepared that insane values
are injected. In particular, when we intend to open a file, do not take
a file name that happens to begin with "|" as a request to run a process.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Commit 7d076d56757c (git-gui: handle shell script text filters when
loading for blame, 2011-12-09) added is_shellscript to test if a file
is executable by the shell, used only when searching for textconv
filters. The previous commit rearranged the tests for finding such
filters, and removed the only user of is_shellscript. Remove this
function.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
git-gui uses `git config --null --list` to parse configuration. Git
versions prior to 1.5.3 do not have --null and need different treatment.
Nobody should be using such an old version anymore. (Moreover, since
0730a5a3a, git-gui requires git v2.36 or later). Keep only the code for
modern Git.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Commit 7d076d56757c (git-gui: handle shell script text filters when
loading for blame, 2011-12-09) added open_cmd_pipe to run text
conversion in support of blame, with special handling for shell
scripts on Windows. To determine whether the command is a shell
script, 'lindex' is used to pick off the first token from the command.
However, cmd is actually a command string taken from .gitconfig
literally and is not necessarily a syntactically correct Tcl list.
Hence, it cannot be processed by 'lindex' and 'lrange' reliably.
Pass the command string to the shell just like on non-Windows
platforms to avoid the potentially incorrect treatment.
A use of 'auto_execok' is removed by this change. This function is
dangerous on Windows, because it searches programs in the current
directory. Delegating the path lookup to the shell is safe, because
/bin/sh and /bin/bash follow POSIX on all platforms, including the
Git for Windows port.
A possible regression is that the old code, given filter command of
'foo', could find 'foo.bat' as a script, and not just bare 'foo', or
'foo.exe'. This rewrite requires explicitly giving the suffix if it is
not .exe.
This part of Git GUI can be exercised using
git gui blame -- some.file
while some.file has a textconv filter configured and has unstaged
modifications.
Helped-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
git-gui provides an implementation to detach HEAD on Git versions prior
to 1.5.3. Nobody should be using such an old version anymore.
(Moreover, since 0730a5a3a, git-gui requires git v2.36 or later).
Keep only the code for modern Git.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
[j6t: message tweaked]
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
git-gui has a few places where a bare "sh" is passed to exec, meaning
that the first instance of "sh" on $PATH will be used rather than the
shell configured. This violates expectations that the configured shell
is being used. Let's use [shellpath] everywhere.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Since b792230 ("git-gui: Show a progress meter for checking out files",
2007-07-08), git-gui includes a workaround for Tcl that does not support
using 2>@1 to redirect stderr to stdout. Tcl added such support in
8.4.7, released in 2004, and this is fully supported in all 8.5
releases.
As git-gui has a hard-coded requirement for Tcl >= 8.5, the workaround
is no longer needed. Delete it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Since commit d5257fb3c1de (git-gui: handle textconv filter on
Windows and in development, 2010-08-07), git-gui will search for a
usable shell if _shellpath is not configured, and on Windows may
resort to using auto_execok to find 'sh'. While this was intended for
development use, checks are insufficient to assure a proper
configuration when deployed where _shellpath is always set, but might
not give a usable shell.
Let's make this more robust by only searching if _shellpath was not
defined, and then using only our restricted search functions.
Furthermore, we should convert to a Windows path on Windows. Always
check for a valid shell on startup, meaning an absolute path to an
executable, aborting if these conditions are not met.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Commit 7d076d56757c (git-gui: handle shell script text filters when
loading for blame, 2011-12-09) added open_cmd_pipe, with special
handling for Windows detected by seeing that _shellpath does not
point to an executable shell. That is bad practice, and is broken by
the next commit that assures _shellpath is valid on all platforms.
Fix this by using [is_Windows] as done for all Windows specific code.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The _which function finds executables on $PATH, and adds .exe on Windows
unless -script was given. However, win32.tcl executes "wscript.exe"
and "cscript.exe", both of which fail as _which adds .exe to both. This
is already fixed in git-gui released by Git for Windows. Do so here.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Branch js/fix-open-exec-2.40.0 converts `open` and `exec` calls to call
wrappers that sanitze the command arguments. This side branch updates
three `open` calls that are in conflict with the fix in the preceding
commit. To keep the intended operation of the 'open' calls, this merge
does not try to merge and resolve the conflicts, but ignores the
conversions that are brought in by the side branch, taking "ours" side
of the code in these three cases.
New fixes are the topic of the next commit.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
While "exec" uses a normal arguments list which is applied as
command + arguments (and redirections, etc), "open" uses a single
argument which is this command+arguments, where the command and
arguments are a list inside this one argument to "open".
Commit bb5cb23 (gitk: prevent overly long command lines 2023-05-08)
changed several values from individual arguments in that list (hashes
and file names), to a single value which is fed to git via redirection
to its stdin using "open" [1].
However, it didn't ensure correctly that this aggregate value in this
string is interpreted as a single element in this command+args list.
It did just enough so that newlines (which is how these elements are
concatenated) don't split this single list element.
A followup commit at the same patchset: 7dd272e (gitk: escape file
paths before piping to git log 2023-05-08) added a bit more, by
escaping backslahes and spaces at the file names, so that at least
it doesn't break when such file names get used there.
But these are not enough. At the very least tab is missing, and more,
and trying to manually escape every possible thing which can affect
how this string is interpreted in a list is a sub-par approach.
The solution is simply to tell tcl "this is a single list element".
which we can do by aggregating this value completely normally (hashes
and files separated by newlines), and then do [list $value].
So this is what this commit does, for all 3 places where bb5cb23
changed individual elements into an aggregate value.
[1]
That was not a fully accurate description. The accurate version
is that this string originally included two lists: hashes and files.
When used with "open" these lists correctly become the individual
elements of these lists, even if they contain spaces etc, so the
arguments which were used at this "git" commands were correct.
Commit bb5cb23 couldn't use these two lists as-is, because it needed
to process the individual elements in them (one element per line of
the aggregate value), and the issue is that ensuring this aggregate
is indeed interpreted as a single list element was sub-par.
Note: all the (double) quotes before/after the modification are not
required and with zero effect, even for \n. But this commit preserves
the original quoting form intentionally. It can be cleaned up later.
Signed-off-by: Avi Halachmi (:avih) <avihpit@yahoo.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As in the earlier commits, introduce a function that constructs a
pipeline of commands after sanitizing the arguments.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The command line to invoke 'git blame' for a single line is constructed
using several if-conditionals, each with the same condition
{$from_index new {}}. Merge all of them into a single conditional.
This requires to duplicate significant parts of the command, but it
helps the next change, where we will have to deal with a nested list
structure.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As in the previous commits, introduce a function that sanitizes
arguments and also keeps the returned file handle writable to pass
data to stdin.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As in the previous commits, introduce a function that sanitizes
arguments intended for the process and in addition allows to pass
redirections, which are passed to Tcl's 'open' verbatim.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Tcl 'open' treats the second argument as a command when it begins
with |. The remainder of the argument is a list comprising the command
and its arguments. It assigns special meaning to these arguments when
they begin with a redirection, pipe or background operator. There are a
number of invocations of 'open' which construct arguments that are
taken from the Git repository or a user input. However, when file names
or ref names are taken from the repository, it is possible to find
names which have these special forms. They must not be interpreted by
'open' lest it redirects input or output, or attempts to build a
pipeline using a command name controlled by the repository.
Introduce a helper function that identifies such arguments and prepends
"./" to force such a name to be regarded as a relative file name.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Convert one 'exec' call that sends output to a process (pipeline).
Fortunately, the command does not contain any variables. For this
reason, just treat it as a "redirection".
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Convert 'exec' calls that both redirect output to a file and run the
process in the background. 'safe_exec_redirect' can take both these
"redirections" in the second argument simultaneously.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As in the previous commits, introduce a function that sanitizes
arguments intended for the process and in addition allows to pass
redirections verbatim, which are interpreted by Tcl's 'exec'.
Redirections can include the background operator '&'.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Convert calls of 'exec' where the arguments are already available in
a list and 'eval' is used to unpack the list. Use 'concat' to unite
the arguments into a single list before passing them to 'safe_exec'.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Tcl 'exec' assigns special meaning to its argument when they begin with
redirection, pipe or background operator. There are a number of
invocations of 'exec' which construct arguments that are taken from the
Git repository or a user input. However, when file names or ref names
are taken from the repository, it is possible to find names with have
these special forms. They must not be interpreted by 'exec' lest it
redirects input or output, or attempts to build a pipeline using a
command name controlled by the repository.
Introduce a helper function that identifies such arguments and prepends
"./" to force such a name to be regarded as a relative file name.
Convert those 'exec' calls where the arguments can simply be packed
into a list.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Function 'diffcmd' derives which of git diff-files, git diff-index, or
git diff-tree must be invoked depending on the ids provided. It puts
the pipe symbol as the first element of the returned command list.
Note though that of the four callers only two use the command with
Tcl 'open' and need the pipe symbol. The other two callers pass the
command to Tcl 'exec' and must remove the pipe symbol.
Do not include the pipe symbol in the constructed command list, but let
the call sites decide whether to add it or not. Note that Tcl 'open'
inspects only the first character of the command list, which is also
the first character of the first element in the list. For this reason,
it is valid to just tack on the pipe symbol with |$cmd and it is not
necessary to use [concat | $cmd].
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The Tcl 'open' function has a vary wide interface. It can open files as
well as pipes to external processes. The difference is made only by the
first character of the file name: if it is "|", an process is spawned.
We have a number of calls of Tcl 'open' that take a file name from the
environment in which Gitk is running. Be prepared that insane values are
injected. In particular, when we intend to open a file, do not mistake
a file name that happens to begin with "|" as a request to run a process.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Clarify what happens when an object exists in more than one pack, but
not in the preferred pack. "git multi-pack-index repack" relies on ties
for objects that are not in the preferred pack being resolved in favor
of the newest pack that contains a copy of the object. If ties were
resolved in favor of the oldest pack as the current documentation
suggests the multi-pack index would not reference any of the objects in
the pack created by "git multi-pack-index repack".
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
nth_midxed_pack_int_id() returns the index of the pack file in the multi
pack index's list of packfiles that the specified object. The index is
returned as a uint32_t. Storing this in an int will make the index
negative if the most significant bit is set. Fix this by using uint32_t
as the rest of the code does. This is unlikely to be a practical problem
as it requires the multipack index to reference 2^31 packfiles.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a 64 bit system the calculation
p->pack_size * pack_info[i].referenced_objects
could overflow. If a pack file contains 2^28 objects with an average
compressed size of 1KB then the pack size will be 2^38B. If all of the
objects are referenced by the multi-pack index the sum above will
overflow. Avoid this by using shifted integer arithmetic and changing
the order of the calculation so that the pack size is divided by the
total number of objects in the pack before multiplying by the number of
objects referenced by the multi-pack index. Using a shift of 14 bits
should give reasonable accuracy while avoiding overflow for pack sizes
less that 1PB.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a 32 bit system "git multi-pack-index --repack --batch-size=120M"
failed with
fatal: size_t overflow: 6038786 * 1289
The calculation to estimated size of the objects in the pack referenced
by the multi-pack-index uses st_mult() to multiply the pack size by the
number of referenced objects before dividing by the total number of
objects in the pack. As size_t is 32 bits on 32 bit systems this
calculation easily overflows. Fix this by using 64bit arithmetic instead.
Also fix a potential overflow when caluculating the total size of the
objects referenced by the multipack index with a batch size larger
than SIZE_MAX / 2. In that case
total_size += estimated_size
can overflow as both total_size and estimated_size can be greater that
SIZE_MAX / 2. This is addressed by using saturating arithmetic for the
addition. Although estimated_size is of type uint64_t by the time we
reach this sum it is bounded by the batch size which is of type size_t
and so casting estimated_size to size_t does not truncate the value.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.
When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:
$ git diff-tree -r <treeish1> <treeish2>.
where the treeish could point to a subdirectory of the repository.
When invoked this way, users can limit the selected paths of the tree by
using a pathspec. Either by providing some list of paths to accept, or
by removing paths via a negative refspec.
The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
difftools have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detections, and similar.
Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.
For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.
Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.
Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.
After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.
Pass the pathspec object and the skip offsets into queue_diff, which
in-turn must pass them along to read_directory_contents.
Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.
The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.
Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.
Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.
If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic will see that a matches the leading part
of the **/ and accept this even tho c doesn't match.
To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.
Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A following change will add support for pathspecs to the git diff
--no-index command. This mode of git diff does not load any repository.
Add a new PATHSPEC_NO_REPOSITORY flag indicating that we're parsing
pathspecs without a repository.
Both PATHSPEC_ATTR and PATHSPEC_FROMTOP require a repository to
function. Thus, verify that both of these are set in magic_mask to
ensure they won't be accepted when PATHSPEC_NO_REPOSITORY is set.
Check PATHSPEC_NO_REPOSITORY when warning about paths outside the
directory tree. When the flag is set, do not look for a git repository
when generating the warning message.
Finally, add a BUG in match_pathspec_item if the istate is NULL but the
pathspec has PATHSPEC_ATTR set. Callers which support PATHSPEC_ATTR
should always pass a valid istate, and callers which don't pass a valid
istate should have set PATHSPEC_ATTR in the magic_mask field to disable
support for attribute-based pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The do_match_pathspec() function has the DO_MATCH_LEADING_PATHSPEC
option to allow pathspecs to match when matching "src" against a
pathspec like "src/path/...". This support is not exposed by
match_pathspec, and the internal flags to do_match_pathspec are not
exposed outside of dir.c
The upcoming support for pathspecs in git diff --no-index need the
LEADING matching behavior when iterating down through a directory with
readdir.
We could try to expose the match_pathspec_with_flags to the public API.
However, DO_MATCH_EXCLUDES really shouldn't be public, and its a bit
weird to only have a few of the flags become public.
Instead, add match_leading_pathspec() as a function which sets both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC when is_dir is true.
This will be used in a following change to support pathspec matching in
git diff --no-index.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'top-panel-search-highlight' of github.com:bnfour/gitk:
gitk: do not hard-code color of search results in commit list
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Ensure that logic added in 5f11669586 (name-hash: don't add directories
to name_hash, 2021-04-12) also applies in multithreaded hashtable init
path.
As per the original single-threaded change above: sparse directory entries
represent a directory that is outside the sparse-checkout definition.
These are not paths to blobs, so should not be added to the name_hash
table. Instead, they should be added to the directory hashtable when
'ignore_case' is true.
Add a condition to avoid placing sparse directories into the name_hash
hashtable. This avoids filling the table with extra entries that will
never be queried.
Signed-off-by: Alex Mironov <alexandrfox@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of 1fc7ddf35b (test-lib: unconditionally enable leak checking,
2024-11-20), both the `GIT_TEST_PASSING_SANITIZE_LEAK` and
`TEST_PASSES_SANITIZE_LEAK` variables no longer have any meaning, the
leak checks are enabled by default. However, some newly added tests
include them by mistake. Let's clean this up.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During git-receive-pack(1), connectivity of the object graph is
validated to ensure that the received packfile does not leave the
repository in a broken state. This is done via git-rev-list(1) and
walking the objects, which can be expensive for large repositories.
Generally, this check is critical to avoid an incomplete received
packfile from corrupting a repository. Server operators may have
additional knowledge though around exactly how Git is being used on the
server-side which can be used to facilitate more efficient connectivity
computation of incoming objects.
For example, if it can be ensured that all objects in a repository are
connected and do not depend on any missing objects, the connectivity of
newly written objects can be checked by walking the object graph
containing only the new objects from the updated tips and identifying
the missing objects which represent the boundary between the new objects
and the repository. These boundary objects can be checked in the
canonical repository to ensure the new objects connect as expected and
thus avoid walking the rest of the object graph.
Git itself cannot make the guarantees required for such an optimization
as it is possible for a repository to contain an unreachable object that
references a missing object without the repository being considered
corrupt.
Introduce the --skip-connectivity-check option for git-receive-pack(1)
which bypasses this connectivity check to give more control to the
server-side. Note that without proper server-side validation of newly
received objects handled outside of Git, usage of this option risks
corrupting a repository.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of git-recieve-pack(1), the connectivity of objects is checked.
Add a test validating that git-receive-pack(1) fails due to an incoming
packfile that would leave the repository with missing objects. Instead
of creating a new test file, "t5410" is generalized for receive-pack
testing.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance regression in not-yet-released code has been corrected.
* ps/reftable-read-block-perffix:
reftable: fix perf regression when reading blocks of unwanted type
Code cleanup.
* jk/oidmap-cleanup:
raw_object_store: drop extra pointer to replace_map
oidmap: add size function
oidmap: rename oidmap_free() to oidmap_clear()
The `send-email` documentation has been updated with OAuth2.0
related examples.
* ag/doc-send-email:
docs: add credential helper for outlook and gmail in OAuth list of helpers
docs: improve send-email documentation
send-mail: improve checks for valid_fqdn
Bundle-URI feature did not use refs recorded in the bundle other
than normal branches as anchoring points to optimize the follow-up
fetch during "git clone"; now it is told to utilize all.
* sc/bundle-uri-use-all-refs-in-bundle:
bundle-uri: add test for bundle-uri clones with tags
bundle-uri: copy all bundle references ino the refs/bundle space
Commit f5e3c6c57d ("meson: do a full usage-based compile check for
sysinfo", 2025-04-25) updated the 'sysinfo()' check, as part of the
meson build, due to the failure of the check on Solaris. Prior to
that commit, the meson build only checked the availability of the
'<sys/sysinfo.h>' header file. On Solaris, both the header and the
'sysinfo()' function exist, but are completely unrelated to the same
function on Linux (and cygwin).
Commit 50dec7c566 ("config.mak.uname: add sysinfo() configuration for
cygwin", 2025-04-17) added a similar 'sysinfo()' check to the autoconf
build. This check looked for the 'sysinfo()' function itself, rather
than just the header, but it will fail (incorrectly set HAVE_SYSINFO)
for the same reason.
In order to correctly identify the 'sysinfo()' function we require as
part of 'git-gc' (used in the 'total_ram() function), we also upgrade
to a compilation check, in a similar way to the meson commit. Note that
since commit c9a51775a3 ("builtin/gc.c: correct RAM calculation when
using sysinfo", 2025-04-17) both the 'totalram' and 'mem_unit' fields
of the 'struct sysinfo' are used, so the new check includes both of
those fields in the compile check.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the non-'runtime prefix' case, the meson build sets the GIT_EXEC_PATH
build variable to an absolute path equivalent to <prefix>/libexec/git-core.
In comparison, the default make build sets it to a relative path equivalent
to 'libexec/git-core'. Indeed, the make build requires the use of some
means outside of the Makefile (eg. config.mak[.*] or the command-line)
to set GIT_EXEC_PATH to anything other than 'libexec/git-core'.
For example, the make invocation:
$ make gitexecdir=/some/other/bin all install
will build git with GIT_EXEC_PATH set to '/some/other/bin' and install
the 'library' executables to that location. However, without setting the
'gitexecdir' make variable, irrespective of the 'runtime prefix' setting,
the GIT_EXEC_PATH is always set to 'libexec/git-core'.
The meson built-in 'libexecdir' option can be used to provide a similar
configurability. The default value for the option is 'libexec'. Attempting
to set the option to '' on the command-line, will reset it to the '.'
string, presumably to ensure a relative path value.
This commit allows the meson build, similar to the above, to configure the
project like:
$ meson setup --buildtype=debugoptimized -Dprefix=$HOME -Dpcre2=disabled \
-Dlibexecdir=/some/other/bin build
so that the GIT_EXEC_PATH is set to '/some/other/bin'. Absent the
-Dlibexecdir argument, the GIT_EXEC_PATH is set to 'libexec/git-core'.
In order to correct the value of GIT_EXEC_PATH, default the value to the
static string value 'libexec/git-core', and only override if the value
of the 'libexecdir' option has a value different to 'libexec' or '.'.
Also, like the Makefile, add a check for an absolute path when the
runtime prefix option is true (and if so, error out).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The path to the system-wide config and attributes files are not being
set correctly in the meson build. Unless explicitly overridden on the
command line during setup, the 'gitconfig' and 'gitattributes' options
are defaulting to absolute paths in the '/etc' system directory. This
is only appropriate if the <prefix> is set specifically to '/usr'.
The directory in which these files are placed is generally referred to
as the 'system configuration directory' or 'sysconfdir' for short. When
the prefix is '/usr' then the sysconfdir is usually set to '/etc', but
any other value for prefix results in the relative directory value 'etc'
instead. (eg if prefix is '/usr/local', then the 'etc' relative value
results in a system configuration directory of '/usr/local/etc'). When
setting the 'sysconfdir' builtin option value, the meson system uses
exactly this algorithm, so we can use get_option('sysconfdir') directly
when setting the (non-overridden) build variables.
In order to allow for overriding from the command line, remove the
default values specified for the 'gitconfig' and 'gitattributes' options
in the 'meson_options.txt' file. This allows the user to specify any
pathname for those options, while being able to test for the unset
(empty) value. An absolute pathname will be used unchanged and a relative
pathname will be appended to '<prefix>/'. These values are then used to
set the 'ETC_GITCONFIG' and 'ETC_GITATTRIBUTES' build variables which are,
in turn, passed to the compiler as '-D' arguments.
When the 'gitconfig' or 'gitattributes' options are not used, then use
the built-in 'sysconfdir' and set the ETC_GITCONFIG build variable to
the string "<sysconfdir>/gitconfig". Similarly, set ETC_ATTRIBUTES to
"<sysconfdir>/gitattributes".
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing an 'meson install' the YAML.pm file is incorrectly
placed in the <prefix>/share/perl5/Git/SVN directory. The YAML.pm
file should be placed in a 'Memoize' subdirectory instead. In order
to correct the location, update the 'install_dir' of the relevant
target in the 'perl/Git/SVN/Memoize/meson.build' file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build configuration options with (non-empty) values, for example
filesystem paths potentially containing spaces, have been set using
the '.set_quoted()' method. However, the GITWEBDIR value has been
set using the '.set()' method instead. In order to correctly quote
the GITWEBDIR value, replace the '.set()' method with '.set_quoted()'.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 13cb20fc46 ("meson: fix compilation with Visual Studio",
2025-01-22) it has not been possible to list build options via `meson
configure`. This is due to Meson's static analysis of build options
failing to handle constant folding, and thinking we set a totally
invalid default `-std=`.
This is reported upstream but we anyways need to work with existing
versions. It turns out there is a simple solution: turn the entire
default option into a conditional branch, which means Meson sees either
nothing, or everything.
As a result, Git users can once again see pretty-printed options before
building.
Reported-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Bug: https://github.com/mesonbuild/meson/issues/14623
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference updates performed as a part of 'git-receive-pack(1)', take
place one at a time. For each reference update, a new transaction is
created and committed. This is necessary to ensure we can allow
individual updates to fail without failing the entire command. The
command also supports an 'atomic' mode, which uses a single transaction
to update all of the references. But this mode has an all-or-nothing
approach, where if a single update fails, all updates would fail.
In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in
'git-receive-pack(1)'. This provides a significant bump in performance,
especially when dealing with repositories with large number of
references.
With the reftable backend there is a 18x performance improvement, when
performing receive-pack with 10000 refs:
Benchmark 1: receive: many refs (refformat = reftable, refcount = 10000, revision = master)
Time (mean ± σ): 4.276 s ± 0.078 s [User: 0.796 s, System: 3.318 s]
Range (min … max): 4.185 s … 4.430 s 10 runs
Benchmark 2: receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
Time (mean ± σ): 235.4 ms ± 6.9 ms [User: 75.4 ms, System: 157.3 ms]
Range (min … max): 228.5 ms … 254.2 ms 11 runs
Summary
receive: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
18.16 ± 0.63 times faster than receive: many refs (refformat = reftable, refcount = 10000, revision = master)
In similar conditions, the files backend sees a 1.21x performance
improvement:
Benchmark 1: receive: many refs (refformat = files, refcount = 10000, revision = master)
Time (mean ± σ): 1.121 s ± 0.021 s [User: 0.128 s, System: 0.975 s]
Range (min … max): 1.097 s … 1.156 s 10 runs
Benchmark 2: receive: many refs (refformat = files, refcount = 10000, revision = HEAD)
Time (mean ± σ): 927.9 ms ± 22.6 ms [User: 99.0 ms, System: 815.2 ms]
Range (min … max): 903.1 ms … 978.0 ms 10 runs
Summary
receive: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
1.21 ± 0.04 times faster than receive: many refs (refformat = files, refcount = 10000, revision = master)
As using batched updates requires the error handling to be moved to the
end of the flow, create and use a 'struct strset' to track the failed
refs and attribute the correct errors to them.
This change also uncovers an issue when a client provides multiple
updates to the same reference. For example:
$ git send-pack remote.git A:foo B:foo
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 20 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 226 bytes | 226.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
remote: error: cannot lock ref 'refs/heads/foo': reference already exists
To remote.git
! [remote rejected] A -> foo (failed to update ref)
! [remote failure] B -> foo (remote failed to report status)
As you can see, the remote runs into an error because it cannot lock the
target reference for the second update. Furthermore, the remote complains
that the first update has been rejected whereas the second update didn't
receive any status update because we failed to lock it. Reading this status
message alone a user would probably expect that `foo` has not been updated
at all. But that's not the case: while we claim that the ref wasn't updated,
it surprisingly points to `A` now.
One could argue that this is merely an error in how we report the result of
this push. But ultimately, the user's request itself is already broken and
doesn't make any sense in the first place and cannot ever lead to a sensible
outcome that honors the full request.
The conversion to batched transactions fixes the issue because we now try to
queue both updates in the same transaction. As such, the transaction itself
will notice this conflict and refuse the update altogether before we commit
any of the values.
Note that this requires changes to a couple of tests in t5408 that happened
to exercise this behaviour. Given that the generated output is misleading
and given that the user request cannot ever be fully honored this really
feels more like a bug than properly designed behaviour. As such, changing
the behaviour feels like the right thing to do.
Since now reference updates are batched, the 'reference-transaction'
hook will be invoked with all updates together. Currently git will 'die'
when the hook returns with a non-zero exit status in the 'prepared'
stage. For 'git-receive-pack(1)', this allowed users to reject an
individual reference update, git would have applied previous updates but
immediately abort further execution. This is definitely an incorrect
usage of this hook, since the right place to do this would be the
'update' hook. This patch retains the latter behavior, but
'reference-transaction' hook now changes to a all-or-nothing behavior
when a non-zero exit status is returned in the 'prepared' stage, since
batch updates use a transaction under the hood. This explains the change
in 't1416'.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-send-pack(1)' allows users to push objects to a remote
repository and explicitly list the references to be pushed. The status
of each reference pushed is captured into a list mapped by refname.
If a reference fails to be updated, its error message is captured in the
`ref->remote_status` field. While the command allows duplicate ref
inputs, the list doesn't accommodate this behavior as a particular
refname is linked to a single `struct ref*` element. So if the user
inputs a reference twice like:
git send-pack remote.git A:foo B:foo
where the user is trying to update the same reference 'foo' twice and
the reference fails to be updated, we first fill `ref->remote_status`
with error message for the input 'A:foo' then we override the same field
with the error message for 'B:foo'. This override happens without first
free'ing the previous value. Fix this leak.
The current tests already incorporate the above example, but in the test
'A:foo' succeeds while 'B:foo' fails, meaning that the memory leak isn't
triggered. Add a new test with multiple duplicates.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference updates performed as a part of 'git-fetch(1)', take place
one at a time. For each reference update, a new transaction is created
and committed. This is necessary to ensure we can allow individual
updates to fail without failing the entire command. The command also
supports an '--atomic' mode, which uses a single transaction to update
all of the references. But this mode has an all-or-nothing approach,
where if a single update fails, all updates would fail.
In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while allowing only individual updates to fail.
Utilize this newly introduced batch update mechanism in 'git-fetch(1)'.
This provides a significant bump in performance, especially when dealing
with repositories with large number of references.
Adding support for batched updates is simply modifying the flow to also
create a batch update transaction in the non-atomic flow.
With the reftable backend there is a 22x performance improvement, when
performing 'git-fetch(1)' with 10000 refs:
Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
Time (mean ± σ): 3.403 s ± 0.775 s [User: 1.875 s, System: 1.417 s]
Range (min … max): 2.454 s … 4.529 s 10 runs
Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
Time (mean ± σ): 154.3 ms ± 17.6 ms [User: 102.5 ms, System: 56.1 ms]
Range (min … max): 145.2 ms … 220.5 ms 18 runs
Summary
fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
In similar conditions, the files backend sees a 1.25x performance
improvement:
Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
Time (mean ± σ): 605.5 ms ± 9.4 ms [User: 117.8 ms, System: 483.3 ms]
Range (min … max): 595.6 ms … 621.5 ms 10 runs
Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
Time (mean ± σ): 485.8 ms ± 4.3 ms [User: 91.1 ms, System: 396.7 ms]
Range (min … max): 477.6 ms … 494.3 ms 10 runs
Summary
fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)
With this we'll either be using a regular transaction or a batch update
transaction. This helps cleanup some code which is no longer needed as
we'll now always have some type of 'ref_transaction' object being
propagated.
One big change is that earlier, each individual update would propagate a
failure. Whereas now, the `ref_transaction_for_each_rejected_update`
function is called at the end of the flow to capture the exit status for
'git-fetch(1)' and also to print F/D conflict errors. This does change
the order of the errors being printed, but the behavior stays the same.
Since transaction errors are now explicitly defined as part of
76e760b999 (refs: introduce enum-based transaction error types,
2025-04-08), utilize them and get rid of custom errors defined within
'builtin/fetch.c'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 76e760b999 (refs: introduce enum-based transaction error
types, 2025-04-08) introduced enum-based transaction error types. The
refs transaction logic was also modified to propagate these errors. For
clients of the ref transaction system, it would be beneficial to provide
human readable messages for these errors.
There is already an existing mapping in 'builtin/update-ref.c', move it
to 'refs.c' as `ref_transaction_error_msg()` and use the same within the
'builtin/update-ref.c'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this document was written, the built-in API has been
updated a few times, but the document was left stale.
Adjust to the current best practices by calling repo_config() on the
repository instance the subcommand implementation receives as a
parameter, instead of calling git_config() that used to be the
common practice.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sample program, as written, would no longer build for at least two
reasons:
- Since this document was first written, the convention to call a
subcommand implementation has changed, and cmd_psuh() now needs
to accept the fourth parameter, repository.
- These days, compiler warning options for developers include one
that detects and complains about unused parameters, so ones that
are deliberately unused have to be marked as such.
Update the old-style examples to adjust to the current practices,
with explanations as needed.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-mentoring group was initially created to help newcomers
with their development itches. However, in practice,
most of their questions were already being addressed
directly on the mailing list, and contributors consistently
received helpful responses there.
Remove the mentoring group details from the Documentation.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit. This option allows the merge machinery to, in the
outer layer of the merge:
* exit early when a conflict is detected
* avoid writing (most) merged blobs/trees to the object store
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted. For such cases, add a new mergeability_only option. This
option allows the merge machinery to, in the "outer layer" of the merge:
* exit upon first[-ish] conflict
* avoid (not prevent) writing merged blobs/trees to the object store
I have a number of qualifiers there, so let me explain each:
"outer layer":
Note that since the recursive merge of merge bases (corresponding to
call_depth > 0) can conflict without the outer final merge
(corresponding to call_depth == 0) conflicting, we can't short-circuit
nor avoid writing merged blobs/trees to the object store during those
inner merges.
"first-ish conflict":
The current patch only exits early from process_entries() on the first
conflict it detects, but conflicts could have been detected in a
previous function call, namely detect_and_process_renames(). However:
* conflicts detected by detect_and_process_renames() are quite rare
conflict types
* the detection would still come after regular rename detection
(which is the expensive part of detect_and_process_renames()), so
it is not saving us much in computation time given that
process_entries() directly follows detect_and_process_renames()
* [this overlaps with the next bullet point] process_entries() is the
place where virtually all object writing occurs (object writing is
sometimes more of a concern for Forges than computation time), so
exiting early here isn't saving us much in object writes either
* the code changes needed to handle an earlier exit are slightly
more invasive in detect_and_process_renames() than for
process_entries().
Given the rareness of the even earlier conflicts, the limited savings
we'd get from exiting even earlier, and in an attempt to keep this
patch simpler, we don't guarantee that we actually exit on the first
conflict detected. We can always revisit this decision later if we
decide that a further micro-optimization to exit slightly earlier in
rare cases is worthwhile.
"avoid (not prevent) writing objects":
The detect_and_process_renames() call can also write objects to the
object store, when rename/rename conflicts involve one (or more) files
that have also been modified on both sides. Because of this alternate
call path leading to handle_content_merges(), our "early exit" does not
prevent writing objects entirely, even within the "outer layer"
(i.e. even within call_depth == 0). I figure that's fine though, since
we're already writing objects for the inner merges (i.e. for call_depth
> 0), which are likely going to represent vastly more objects than files
involved in rename/rename+modify/modify cases in the outer merge, on
average.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every once in a while, users report that editing the commit summaries
in the todo list does not get reflected in the rebase operation,
suggesting that users are (a) only using one-line commit messages, and
(b) not understanding that the commit summaries are merely helpful
comments to help them find the right hashes.
It may be difficult to correct users' poor commit messages, but we can
at least try to make it clearer that the commit summaries are not
directives of some sort by inserting a comment character. Hopefully
that leads to them looking a little further and noticing the hints at
the bottom to use 'reword' or 'edit' directives.
Yes, this change may look funny at first since it hardcodes '#' rather
than using comment_line_str. However:
* comment_line_str exists to allow disambiguation between lines in
a commit message and lines that are instructions to users editing
the commit message. No such disambiguation is needed for these
comments that occur on the same line after existing directives
* the exact "comment" character(s) on regular pick lines used aren't
actually important; I could have used anything, including completely
random variable length text for each line and it'd work because we
ignore everything after 'pick' and the hash.
* The whole point of this change is to signal to users that they
should NOT be editing any part of the line after the hash (and if
they do so, their edits will be ignored), while the whole point of
comment_line_str is to allow highly flexible editing. So making
it more general by using comment_line_str actually feels
counterproductive.
* The character for merge directives absolutely must be '#'; that
has been deeply hardcoded for a long time (see below), and will
break if some other comment character is used instead. In a
desire to have pick and merge directives be similar, I use the
same comment character for both.
* Perhaps merge directives could be fixed to not be inflexible about
the comment character used, if someone feels highly motivated, but
I think that should be done in a separate follow-on patch.
Here are (some of?) the locations where '#' has already been hardcoded
for a long time for merges:
1) In check_label_or_ref_arg():
case TODO_LABEL:
/*
* '#' is not a valid label as the merge command uses it to
* separate merge parents from the commit subject.
*/
2) In do_merge():
/*
* For octopus merges, the arg starts with the list of revisions to be
* merged. The list is optionally followed by '#' and the oneline.
*/
merge_arg_len = oneline_offset = arg_len;
for (p = arg; p - arg < arg_len; p += strspn(p, " \t\n")) {
if (!*p)
break;
if (*p == '#' && (!p[1] || isspace(p[1]))) {
3) In label_oid():
if ((buf->len == the_hash_algo->hexsz &&
!get_oid_hex(label, &dummy)) ||
(buf->len == 1 && *label == '#') ||
hashmap_get_from_hash(&state->labels,
strihash(label), label)) {
/*
* If the label already exists, or if the label is a
* valid full OID, or the label is a '#' (which we use
* as a separator between merge heads and oneline), we
* append a dash and a number to make it unique.
*/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There does not appear to be anything particularly incompatible about the
--shallow and --path-walk options of 'git pack-objects'. If shallow
commits are to be handled differently, then it is by the revision walk
that defines the commit set and which are interesting or uninteresting.
However, before the previous change, a trivial removal of the warning
would cause a failure in t5500-fetch-pack.sh when
GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more
objects than we desired, due to some incorrect behavior of the path-walk
API, especially around walking uninteresting objects.
The recently-added tests in t5538-push-shallow.sh help to confirm this
behavior is working with the --path-walk option if
GIT_TEST_PACK_PATH_WALK is enabled. These tests passed previously due to
the --path-walk feature being disabled in the presence of a shallow
clone.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for allowing both the --shallow and --path-walk options
in the 'git pack-objects' builtin, create a new 'edge_aggressive' option
in the path-walk API. This option will help walk the boundary more
thoroughly and help avoid sending extra objects during fetches and
pushes.
The only use of the 'edge_hint_aggressive' option in the revision API is
within mark_edges_uninteresting(), which is usually called before
between prepare_revision_walk() and before visiting commits with
get_revision(). In prepare_revision_walk(), the UNINTERESTING commits
are walked until a boundary is found.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapting the implementation of ll_find_deltas(), create a threaded
version of the --path-walk compression step in 'git pack-objects'.
This involves adding a 'regions' member to the thread_params struct,
allowing each thread to own a section of paths. We can simplify the way
jobs are split because there is no value in extending the batch based on
name-hash the way sections of the object entry array are attempted to be
grouped. We re-use the 'list_size' and 'remaining' items for the purpose
of borrowing work in progress from other "victim" threads when a thread
has finished its batch of work more quickly.
Using the Git repository as a test repo, the p5313 performance test
shows that the resulting size of the repo is the same, but the threaded
implementation gives gains of varying degrees depending on the number of
objects being packed. (This was tested on a 16-core machine.)
Test HEAD~1 HEAD
---------------------------------------------------
5313.20: big pack 2.38 1.99 -16.4%
5313.21: big pack size 16.1M 16.0M -0.2%
5313.24: repack 107.32 45.41 -57.7%
5313.25: repack size 213.3M 213.2M -0.0%
(Test output is formatted to better fit in message.)
This ~60% reduction in 'git repack --path-walk' time is typical across
all repos I used for testing. What is interesting is to compare when the
overall time improves enough to outperform the --name-hash-version=1
case. These time improvements correlate with repositories with data
shapes that significantly improve their data size as well. The
--path-walk feature frequently takes longer than --name-hash-version=2,
trading some extra computation for some additional compression. The
natural place where this additional computation comes from is the two
compression passes that --path-walk takes, though the first pass is
naturally faster due to the path boundaries avoiding a number of delta
compression attempts.
For example, the microsoft/fluentui repo has significant size reduction
from --name-hash-version=1 to --name-hash-version=2 followed by further
improvements with --path-walk. The threaded computation makes
--path-walk more competitive in time compared to --name-hash-version=2,
though still ~31% more expensive in that metric.
Repack Method Pack Size Time
------------------------------------------
Hash v1 439.4M 87.24s
Hash v2 161.7M 21.51s
Path Walk (Before) 142.5M 81.29s
Path Walk (After) 142.5M 28.16s
Similar results hold for the Git repository:
Repack Method Pack Size Time
------------------------------------------
Hash v1 248.8M 30.44s
Hash v2 249.0M 30.15s
Path Walk (Before) 213.2M 142.50s
Path Walk (After) 213.3M 45.41s
...as well as the nodejs/node repository:
Repack Method Pack Size Time
------------------------------------------
Hash v1 739.9M 71.18s
Hash v2 764.6M 67.82s
Path Walk (Before) 698.1M 208.10s
Path Walk (After) 698.0M 75.10s
Finally, the Linux kernel repository is a good test for this repacking
time change, even though the space savings is more subtle:
Repack Method Pack Size Time
------------------------------------------
Hash v1 2.5G 554.41s
Hash v2 2.5G 549.62s
Path Walk (before) 2.2G 1562.36s
Path Walk (before) 2.2G 559.00s
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, the --path-walk option to 'git pack-objects' would compute
deltas inline with the path-walk logic. This would make the progress
indicator look like it is taking a long time to enumerate objects, and
then very quickly computed deltas.
Instead of computing deltas on each region of objects organized by tree,
store a list of regions corresponding to these groups. These can later
be pulled from the list for delta compression before doing the "global"
delta search.
This presents a new progress indicator that can be used in tests to
verify that this stage is happening.
The current implementation is not integrated with threads, but we are
setting it up to arrive in the next change.
Since we do not attempt to sort objects by size until after exploring
all trees, we can remove the previous change to t5530 due to a different
error message appearing first.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Repositories registered with Scalar are expected to be client-only
repositories that are rather large. This means that they are more likely to
be good candidates for using the --path-walk option when running 'git
pack-objects', especially under the hood of 'git push'. Enable this config
in Scalar repositories.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users may want to enable the --path-walk option for 'git pack-objects' by
default, especially underneath commands like 'git push' or 'git repack'.
This should be limited to client repositories, since the --path-walk option
disables bitmap walks, so would be bad to include in Git servers when
serving fetches and clones. There is potential that it may be helpful to
consider when repacking the repository, to take advantage of improved deltas
across historical versions of the same files.
Much like how "pack.useSparse" was introduced and included in
"feature.experimental" before being enabled by default, use the repository
settings infrastructure to make the new "pack.usePathWalk" config enabled by
"feature.experimental" and "feature.manyFiles".
In order to test that this config works, add a new trace2 region around
the path walk code that can be checked by a 'git push' command.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.
Add the --path-walk option to the performance tests in p5313.
For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:
Test this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk 0.08(0.06+0.02)
5313.19: thin pack size with --path-walk 18.4K
5313.20: big pack with --path-walk 2.10(7.80+0.26)
5313.21: big pack size with --path-walk 19.8M
5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk 33.6M
5313.24: repack with --path-walk 81.29(96.08+0.71)
5313.25: repack size with --path-walk 142.5M
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
Along with the earlier tests in p5313, I'll instead reformat the
comparison as follows:
Repack Method Pack Size Time
---------------------------------------
Hash v1 439.4M 87.24s
Hash v2 161.7M 21.51s
Path Walk 142.5M 81.29s
There are a few things to notice here:
1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
significant, but --path-walk still compresses better than that
option.
2. The --path-walk command is still using --name-hash-version=1 for the
second pass of delta computation, using the increased name hash
collisions as a potential method for opportunistic compression on
top of the path-focused compression.
3. The --path-walk algorithm is currently sequential and does not use
multiple threads for delta compression. Threading will be
implemented in a future change so the computation time will improve
to better compete in this metric.
There are small benefits in size for my copy of the Git repository:
Repack Method Pack Size Time
---------------------------------------
Hash v1 248.8M 30.44s
Hash v2 249.0M 30.15s
Path Walk 213.2M 142.50s
As well as in the nodejs/node repository [3]:
Repack Method Pack Size Time
---------------------------------------
Hash v1 739.9M 71.18s
Hash v2 764.6M 67.82s
Path Walk 698.1M 208.10s
[3] https://github.com/nodejs/node
This benefit also repeats in my copy of the Linux kernel repository:
Repack Method Pack Size Time
---------------------------------------
Hash v1 2.5G 554.41s
Hash v2 2.5G 549.62s
Path Walk 2.2G 1562.36s
It is important to see that even when the repository shape does not have
many name-hash collisions, there is a slight space boost to be found
using this method.
As this repacking strategy was released in Git for Windows 2.47.0, some
users have reported cases where the --path-walk compression is slightly
worse than the --name-hash-version=2 option. In those cases, it may be
beneficial to combine the two options. However, there has not been a
released version of Git that has both options and I don't have access to
these repos for testing.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be notoriously difficult to detect if delta bases are being
computed properly during 'git push'. Construct an example where it will
make a kilobyte worth of difference when a delta base is not found. We
can then use the progress indicators to distinguish between bytes and
KiB depending on whether the delta base is found and used.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many tests that validate whether 'git pack-objects' works as
expected. Instead of duplicating these tests, add a new test environment
variable, GIT_TEST_PACK_PATH_WALK, that implies --path-walk by default
when specified.
This was useful in testing the implementation of the --path-walk
implementation, helping to find tests that are overly specific to the
default object walk. These include:
- t0411-clone-from-partial.sh : One test fetches from a repo that does
not have the boundary objects. This causes the path-based walk to
fail. Disable the variable for this test.
- t5306-pack-nobase.sh : Similar to t0411, one test fetches from a repo
without a boundary object.
- t5310-pack-bitmaps.sh : One test compares the case when packing with
bitmaps to the case when packing without them. Since we disable the
test variable when writing bitmaps, this causes a difference in the
object list (the --path-walk option adds an extra object). Specify
--no-path-walk in both processes for the comparison. Another test
checks for a specific delta base, but when computing dynamically
without using bitmaps, the base object it too small to be considered
in the delta calculations so no base is used.
- t5316-pack-delta-depth.sh : This script cares about certain delta
choices and their chain lengths. The --path-walk option changes how
these chains are selected, and thus changes the results of this test.
- t5322-pack-objects-sparse.sh : This demonstrates the effectiveness of
the --sparse option and how it combines with --path-walk.
- t5332-multi-pack-reuse.sh : This test verifies that the preferred
pack is used for delta reuse when possible. The --path-walk option is
not currently aware of the preferred pack at all, so finds a
different delta base.
- t7406-submodule-update.sh : When using the variable, the --depth
option collides with the --path-walk feature, resulting in a warning
message. Disable the variable so this warning does not appear.
I want to call out one specific test change that is only temporary:
- t5530-upload-pack-error.sh : One test cares specifically about an
"unable to read" error message. Since the current implementation
performs delta calculations within the path-walk API callback, a
different "unable to get size" error message appears. When this
is changed in a future refactoring, this test change can be reverted.
Similar to GIT_TEST_NAME_HASH_VERSION, we do not add this option to the
linux-TEST-vars CI build as that's already an overloaded build.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change added a --path-walk option to 'git pack-objects'.
Create a performance test that demonstrates the time and space benefits
of the feature.
In order to get an appropriate comparison, we need to avoid reusing
deltas and recompute them from scratch.
Compare the creation of a thin pack representing a small push and the
creation of a relatively large non-thin pack.
Running on my copy of the Git repository results in this data (removing
the repack tests for --name-hash-version):
Test this tree
------------------------------------------------------------------------
5313.2: thin pack with --name-hash-version=1 0.02(0.01+0.01)
5313.3: thin pack size with --name-hash-version=1 1.6K
5313.4: big pack with --name-hash-version=1 2.55(4.20+0.26)
5313.5: big pack size with --name-hash-version=1 16.4M
5313.6: shallow fetch pack with --name-hash-version=1 1.24(2.03+0.08)
5313.7: shallow pack size with --name-hash-version=1 12.2M
5313.10: thin pack with --name-hash-version=2 0.03(0.01+0.01)
5313.11: thin pack size with --name-hash-version=2 1.6K
5313.12: big pack with --name-hash-version=2 1.91(3.23+0.20)
5313.13: big pack size with --name-hash-version=2 16.4M
5313.14: shallow fetch pack with --name-hash-version=2 1.06(1.57+0.10)
5313.15: shallow pack size with --name-hash-version=2 12.5M
5313.18: thin pack with --path-walk 0.03(0.01+0.01)
5313.19: thin pack size with --path-walk 1.6K
5313.20: big pack with --path-walk 2.05(3.24+0.27)
5313.21: big pack size with --path-walk 16.3M
5313.22: shallow fetch pack with --path-walk 1.08(1.66+0.07)
5313.23: shallow pack size with --path-walk 12.4M
This can be reformatted as follows:
Pack Type Hash v1 Hash v2 Path Walk
---------------------------------------------------
thin pack (time) 0.02s 0.03s 0.03s
(size) 1.6K 1.6K 1.6K
big pack (time) 2.55s 1.91s 2.05s
(size) 16.4M 16.4M 16.3M
shallow pack (time) 1.24s 1.06s 1.08s
(size) 12.2M 12.5M 12.4M
Note that the timing is slower because there is no threading in the
--path-walk case (yet). Also, the shallow pack cases are really not
using the --path-walk logic right now because it is disabled until some
additions are made to the path walk API.
The cases where the --path-walk option really shines is when the default
name-hash is overwhelmed with unhelpful collisions. An open source
example can be found in the microsoft/fluentui repo [1] at a certain
commit [2].
[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08
Running the tests on this repo results in the following comparison table:
Pack Type Hash v1 Hash v2 Path Walk
---------------------------------------------------
thin pack (time) 0.36s 0.12s 0.08s
(size) 1.2M 22.0K 18.4K
big pack (time) 2.00s 2.90s 2.21s
(size) 20.4M 25.9M 19.5M
shallow pack (time) 1.41s 1.80s 1.65s
(size) 34.4M 33.7M 33.6M
Notice in particular that in the small thin pack, the time performance
has improved from 0.36s for --name-hash-version=1 to 0.08s and this is
likely due to the improved size of the resulting pack: 18.4K instead of
1.2M. The relatively new --name-hash-version=2 is competitive with
--path-walk (0.12s and 22.0K) but not quite as successful.
Finally, running this on a copy of the Linux kernel repository results
in these data points:
Pack Type Hash v1 Hash v2 Path Walk
---------------------------------------------------
thin pack (time) 0.03s 0.13s 0.03s
(size) 4.6K 4.6K 4.6K
big pack (time) 15.29s 12.32s 13.92s
(size) 201.1M 159.1M 158.5M
shallow pack (time) 10.88s 22.93s 22.74s
(size) 269.2M 273.8M 267.7M
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The t0450 test script verifies that builtin usage matches the synopsis
in the documentation. Adjust the builtin to match and then remove 'git
pack-objects' from the exception list.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to more easily compute delta bases among objects that appear at
the exact same path, add a --path-walk option to 'git pack-objects'.
This option will use the path-walk API instead of the object walk given
by the revision machinery. Since objects will be provided in batches
representing a common path, those objects can be tested for delta bases
immediately instead of waiting for a sort of the full object list by
name-hash. This has multiple benefits, including avoiding collisions by
name-hash.
The objects marked as UNINTERESTING are included in these batches, so we
are guaranteeing some locality to find good delta bases.
After the individual passes are done on a per-path basis, the default
name-hash is used to find other opportunistic delta bases that did not
match exactly by the full path name.
The current implementation performs delta calculations while walking
objects, which is not ideal for a few reasons. First, this will cause
the "Enumerating objects" phase to be much longer than usual. Second, it
does not take advantage of threading during the path-scoped delta
calculations. Even with this lack of threading, the path-walk option is
sometimes faster than the usual approach. Future changes will refactor
this code to allow for threading, but that complexity is deferred until
later to keep this patch as simple as possible.
This new walk is incompatible with some features and is ignored by
others:
* Object filters are not currently integrated with the path-walk API,
such as sparse-checkout or tree depth. A blobless packfile could be
integrated easily, but that is deferred for later.
* Server-focused features such as delta islands, shallow packs, and
using a bitmap index are incompatible with the path-walk API.
* The path walk API is only compatible with the --revs option, not
taking object lists or pack lists over stdin. These alternative ways
to specify the objects currently ignores the --path-walk option
without even a warning.
Future changes will create performance tests that demonstrate the power
of this approach.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will be helpful in a future change, which will reuse this logic.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous three changes contributed performance improvements to 'git
apply', 'git add -p', and 'git reset -p' when using a sparse index. The
improvement to 'git apply' also improved 'git checkout -p'. Add
performance tests to demonstrate this (and to help validate that
performance remains good in the future).
In the truncated test output below, we see that the full checkout
performance changes within noise expectations, but the sparse index
cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for
'git reset -p'. 'git checkout -p' improves immediatley by 91% because it
does not need any change to its builtin.
Test HEAD~4 HEAD~3 HEAD~2 HEAD~1
-------------------------------------------------------------------------------------
2000.118: ... git add -p (full-v3) 0.79 0.79 +0.0% 0.82 +3.8% 0.82 +3.8%
2000.119: ... git add -p (full-v4) 0.74 0.76 +2.7% 0.74 +0.0% 0.76 +2.7%
2000.120: ... git add -p (sparse-v3) 1.94 1.28 -34.0% 0.07 -96.4% 0.07 -96.4%
2000.121: ... git add -p (sparse-v4) 1.93 1.28 -33.7% 0.06 -96.9% 0.06 -96.9%
2000.122: ... git checkout -p (full-v3) 1.18 1.18 +0.0% 1.18 +0.0% 1.19 +0.8%
2000.123: ... git checkout -p (full-v4) 1.10 1.12 +1.8% 1.11 +0.9% 1.11 +0.9%
2000.124: ... git checkout -p (sparse-v3) 1.31 0.11 -91.6% 0.11 -91.6% 0.11 -91.6%
2000.125: ... git checkout -p (sparse-v4) 1.29 0.11 -91.5% 0.11 -91.5% 0.11 -91.5%
2000.126: ... git reset -p (full-v3) 0.81 0.80 -1.2% 0.83 +2.5% 0.83 +2.5%
2000.127: ... git reset -p (full-v4) 0.78 0.77 -1.3% 0.77 -1.3% 0.78 +0.0%
2000.128: ... git reset -p (sparse-v3) 1.58 0.92 -41.8% 0.91 -42.4% 0.07 -95.6%
2000.129: ... git reset -p (sparse-v4) 1.58 0.92 -41.8% 0.92 -41.8% 0.07 -95.6%
It is worth noting that if our test was more involved and had multiple
hunks to evaluate, then the time spent in 'git apply' would dominate due
to multiple index loads and writes. As it stands, we need the sparse
index improvement in 'git add -p' itself to confirm this performance
improvement.
Since the change for 'git add -i' is identical, we avoid a second test
case for that similar operation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.
Move this check earlier within cmd_reset() to improve performance here.
Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.
It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.
Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.
Add test cases that confirm that these interactive add options work with
the sparse index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse index allows storing directory entries in the index, marked
with the skip-wortkree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.
This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.
The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.
The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:
- It failed to capture valid functions where `{` was on the next line
due to line continuation (`\`).
- It did not recognize functions with single command body, such as
`x () echo hello`.
Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.
Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.
Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since "hash-object --literally" no longer supports objects with unknown
types, there are now no callers of write_object_file_literally() and its
helpers. Let's drop them to simplify the code.
In particular, this gets rid of some ugly copy-and-paste code from
write_object_file_literally(), which is a parallel implementation of
write_object_file(). When the split was originally made, the two weren't
that long, but commits like 63a6745a07 (object-file: update the loose
object map when writing loose objects, 2023-10-01) ended up having to
duplicate some tricky code.
This patch drops all of that duplication and should make things less
error-prone going forward.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we recently removed the hash_literally() function, the hash-object
--literally option has been simplified to just removing the
INDEX_FORMAT_CHECK flag. Rather than pass it around as a separate bool,
we can just have the option parser remove the bit from the set of flags
directly. This simplifies the helper functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hash-object command has its own custom flag bits that it sets based
on command-line options. But since we dropped hash_literally() in the
previous commit, the only thing we do with those flag bits is convert
them directly into "index_flags" to pass to index_fd().
This extra layer of indirection makes the code harder to read and reason
about. Let's just use the INDEX_* flags directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).
Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.
We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").
The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).
We'll be able to clean up these code paths a bit more in subsequent
patches.
The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit adds a shell library for writing raw loose objects into the
object database. Normally this is done with hash-object, but the
specific intent here is to allow broken objects that hash-object may not
support.
We'll convert several cases that use "hash-object --literally" to write
objects with invalid types. That works currently, but dropping this
dependency will allow us to remove that feature and simplify the
object-writing code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's occasionally useful when testing or debugging to be able to do raw
zlib inflate/deflate operations (e.g., to check the bytes of a specific
loose or packed object).
Even though zlib's deflate algorithm is used by many other programs,
this is surprisingly hard to do in a portable way. E.g., gzip can do
this if you manually munge some header bytes. But the result is somewhat
arcane, and we don't assume gzip is available anyway. Likewise, pigz
will handle raw zlib, but we can't assume it is available.
So let's introduce a short test helper for just doing zlib operations.
We'll use it in subsequent patches to add some new tests, but it would
also have come in handy a few times in the past:
- The hard-coded pack data from 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) could probably be generated on
the fly.
- Likewise we could avoid the hard-coded data from 0b1493c2d4
(git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT,
2025-02-25). Though note this would require support for more zlib
options.
- It would have helped with the debugging documented in 41dfbb2dbe
(howto: add article on recovering a corrupted object, 2013-10-25).
I'll leave refactoring existing tests for another day, but I hope the
examples above show the general utility.
I aimed for simplicity in the code. In particular, it will read all
input into a memory buffer, rather than streaming. That makes the zlib
loops harder to get wrong (which has been a source of subtle bugs in the
past).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide a mechanism for callers to get the object type as a raw
string, rather than an object_type enum. This was in theory useful for
returning types that are not representable in the enum, but we consider
any such type to be an error, and there are no callers that use the
strbuf anymore.
Let's drop support to simplify the code a bit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fsck-ing a loose object, we use object_info's type_name strbuf to
record the parsed object type as a string. For most objects this is
redundant with the object_type enum, but it does let us report the
string when we encounter an object with an unknown type (for which there
is no matching enum value).
There are a few downsides, though:
1. The code to report these cases is not actually robust. Since we did
not pass a strbuf to unpack_loose_header(), we only retrieved types
from headers up to 32 bytes. In longer cases, we'd simply say
"object corrupt or missing".
2. This is the last caller that uses object_info's type_name strbuf
support. It would be nice to refactor it so that we can simplify
that code.
3. Likewise, we'll check the hash of the object using its unknown type
(again, as long as that type is short enough). That depends on the
hash_object_file_literally() code, which we'd eventually like to
get rid of.
So we can simplify things by bailing immediately in read_loose_object()
when we encounter an unknown type. This has a few user-visible effects:
a. Instead of producing a single line of error output like this:
error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
we'll now issue two lines (the first from read_loose_object() when
we see the unparsable header, and the second from the fsck code,
since we couldn't read the object):
error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
This is a little more verbose, but this sort of error should be
rare (such objects are almost impossible to work with, and cannot
be transferred between repositories as they are not representable
in packfiles). And as a bonus, reporting the broken header in full
could help with debugging other cases (e.g., a header like "blob
xyzzy\0" would fail in parsing the size, but previously we'd not
have showed the offending bytes).
b. An object with an unknown type will be reported as corrupt, without
actually doing a hash check. Again, I think this is unlikely to
matter in practice since such objects are totally unusable.
We'll update one fsck test to match the new error strings. And we can
remove another test that covered the case of an object with an unknown
type _and_ a hash corruption. Since we'll skip the hash check now in
this case, the test is no longer interesting.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In oid_object_info_convert(), we convert objects between their sha1 and
sha256 variants. To do this, we naturally need to know the type, which
we get from oid_object_info_extended() using its type_name strbuf
option.
But getting the value as a string (versus an object_type enum) is not
helpful. Since we do not allow unknown types, the regular enum is
sufficient. And the resulting code is a bit simpler, as we no longer
have to manage the extra allocation nor convert the string to an enum
ourselves.
Note that at first glance, it might seem like we should retain the error
check for "type == -1" to catch bogus types found by the underlying
parser. But we don't need it, as an unknown type would have yielded an
error from the call to oid_object_info_extended(), which would already
have caused us to return an error.
In fact, I suspect this was always impossible to trigger. Even when we
were converting the string to a type enum ourselves, an invalid type
would never have escaped oid_object_info_extended(), since we never
passed the (now removed) OBJECT_INFO_ALLOW_UNKNOWN_TYPE option.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we no longer support OBJECT_INFO_ALLOW_UNKNOWN_TYPE, there is
no need to pass a strbuf into oid_object_info_extended() to record the
type. The regular object_type enum is sufficient to capture all of the
types we will allow.
This simplifies the code a bit, and will eventually let us drop
object_info's type_name strbuf support.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cat-file dropped its "--allow-unknown-type" option in the previous
commit, there are no more uses of the internal flag that implemented it.
Let's drop it.
That in turn lets us drop the strbuf parameter of unpack_loose_header(),
which now is always NULL. And without that, we can drop all of the
additional code to inflate larger headers into the strbuf.
Arguably we could drop ULHR_TOO_LONG, as no callers really care about
the distinction from ULHR_BAD. But it's easy enough to retain, and it
does let us produce a slightly more specific message in one instance.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cat-file command has some minor support for handling objects with
"unknown" types. I.e., strings that are not "blob", "commit", "tree", or
"tag".
In theory this could be used for debugging or experimenting with
extensions to Git. But in practice this support is not very useful:
1. You can get the type and size of such objects, but nothing else.
Not even the contents!
2. Only loose objects are supported, since packfiles use numeric ids
for the types, rather than strings.
3. Likewise you cannot ever transfer objects between repositories,
because they cannot be represented in the packfiles used for the
on-the-wire protocol.
The support for these unknown types complicates the object-parsing code,
and has led to bugs such as b748ddb7a4 (unpack_loose_header(): fix
infinite loop on broken zlib input, 2025-02-25). So let's drop it.
The first step is to remove the user-facing parts, which are accessible
only via cat-file. This is technically backwards-incompatible, but given
the limitations listed above, these objects couldn't possibly be useful
in any workflow.
However, we can't just rip out the option entirely. That would hurt a
caller who ran:
git cat-file -t --allow-unknown-object <oid>
and fed it normal, well-formed objects. There --allow-unknown-type was
doing nothing, but we wouldn't want to start bailing with an error. So
to protect any such callers, we'll retain --allow-unknown-type as a
noop.
The code change is fairly small (but we'll able to clean up more code in
follow-on patches). The test updates drop any use of the option. We
still retain tests that feed the broken objects to cat-file without
--allow-unknown-type, as we should continue to confirm that those
objects are rejected. Note that in one spot we can drop a layer of loop,
re-indenting the body; viewing the diff with "-w" helps there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide an overview of the set of functions used for manipulating
`json_writer`s, by describing what functions should be used for
each JSON-related task.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Acked-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a docstring for each function that manipulates json_writers.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Acked-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make repository clean-up tasks "gc" can do available to "git
maintenance" front-end.
* ps/maintenance-missing-tasks:
builtin/maintenance: introduce "rerere-gc" task
builtin/gc: move rerere garbage collection into separate function
builtin/maintenance: introduce "worktree-prune" task
builtin/gc: move pruning of worktrees into a separate function
builtin/gc: remove global variables where it is trivial to do
builtin/gc: fix indentation of `cmd_gc()` parameters
The fallback implementation of open_nofollow() depended on
open("symlink", O_NOFOLLOW) to set errno to ELOOP, but a few BSD
derived systems use different errno, which has been worked around.
* cf/wrapper-bsd-eloop:
wrapper: NetBSD gives EFTYPE and FreeBSD gives EMFILE where POSIX uses ELOOP
In commit-graph.c:fill_oids_from_packs, if open_pack_index failed,
memory allocated and returned by add_packed_git will leak. Simply
add close_pack and free(p) will solve this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In sequencer.c:todo_list_rearrange_squash, if it fails, memory
allocated in `next`, `tail`, `subjects` and `subject2item` will leak.
Jump to cleanup label before return could fix this leak problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In mailinfo.c:decode_header, if convert_to_utf8 failed, the strbuf stored
in dec will leak. Simply add strbuf_release and free(dec) will solve
this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 3e81bccdf3 (sequencer: factor out todo command name parsing,
2019-06-27), a `return` statement was introduced that basically was a
long sequence of conditions, combined with `&&`, except for the last
condition which is not really a condition but an assignment.
The point of this construct was to return 1 (i.e. `true`) from the
function if all of those conditions held true, and also assign the `bol`
pointer to the end of the parsed command.
Some static analyzers are really unhappy about such constructs. And
human readers are at least puzzled, if not confused, by seeing a single
`=` inside a chain of conditions where they would have expected to see
`==` instead and, based on experience, immediately suspect a typo.
Let's help all of this by turning this into the more verbose, more
readable form of an `if` construct that both assigns the pointer as well
as returns 1 if all of the conditions hold true.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c429bed102 (bundle-uri: store fetch.bundleCreationToken, 2023-01-31)
code was introduced that assumes that an `sscanf()` call leaves its
output variables unchanged unless the return value indicates success.
However, the POSIX documentation makes no such guarantee:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sscanf.html
So let's make sure that the output variable `maxCreationToken` is
always well-defined.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code is a bit too hard to reason about to fully assess whether the
`fill_commit_graph_info()` function is called at all after
`write_commit_graph()` returns (and hence the stack variable
`topo_levels` goes out of context).
Let's simply make sure that the stack address is no longer used at that
stage, thereby making the code quite a bit easier to reason about.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CodeQL reports empty `if` blocks that only contain a comment as "futile
conditional". The comment talks about potential plans to turn this into
a warning, but that seems not to have been necessary. Replace the entire
construct with a concise comment.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `if (i <= 0) ... else if (i > 0) ...` is technically equivalent to
`if (i <= 0) ... else ...`, the latter is vastly easier to read because
it avoids writing out a condition that is unnecessary. Let's drop such
unnecessary conditions.
Pointed out by CodeQL.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As pointed out by CodeQL, `branch_get()` may return `NULL`, in which
case `branch_has_merge_config()` would return early, but we can even
avoid enumerating the refs prefixes in that case, saving even more CPU
cycles.
Technically, we should enclose these two statements in an `if (branch)
{...}` block, but the indentation is already quite deep, therefore I
refrained from doing that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One thing that might be non-obvious to readers (or to analyzers like
CodeQL) is that the function essentially does nothing when the Git index
is empty, and in particular that it does not look at the value of
`len_eq_last` (which would be uninitialized at that point).
Let's make this much easier to understand, by returning early if the Git
index is empty, and by avoiding empty `else` blocks.
This commit changes indentation and is hence best viewed using
`--ignore-space-change`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 3145ea957d (upload-pack: introduce fetch server command,
2018-03-15) added support for the `fetch` command, from the server's
point of view it is an upload, and hence the `enum` should really be
called `upload_state` instead of `fetch_state`. Likewise, rename its
values.
This also helps unconfuse CodeQL which would otherwise be at sixes or
sevens about having _two_ non-local definitions of the same `enum` with
the same values.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do need a context to write the commit graph, but that context is only
needed during the life time of `commit_graph_write()`, therefore it can
easily be a stack variable.
This also helps CodeQL recognize that it is safe to assign the address
of other local variables to the context's fields.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As pointed out by CodeQL, it is a potentially dangerous practice to
store local variables' addresses in non-local structs. Yet this is
exactly what happens with the `acked_commits` attribute that is used in
`cmd_fetch()`: The pointer to a local variable is assigned to it.
Now, it is Git's convention that `cmd_*()` functions are essentially
only returning just before exiting the process, therefore there is
little danger that this attribute is used after the code flow returns
from that function.
However, code in `cmd_*()` function is often so useful that it gets
lifted into a library function, at which point this issue could become a
real problem.
Let's make sure to clear the `acked_commits` attribute out after it was
used, and before the function returns (at which point the address would
go stale).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The difference of two unsigned integers is defined to be unsigned, and
therefore it is misleading to check whether it is greater than zero
(instead, the more natural way would be to check whether the difference
is zero or not).
Let's instead avoid the subtraction altogether, and compare the two
operands directly, which makes the code more obvious as a side effect.
Pointed out by CodeQL's rule with the ID
`cpp/unsigned-difference-expression-compared-zero`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit message is processed to remove unnecessary empty lines.
In particular, it is ensured that the text ends with at most one LF
character. This one is always present, because the Tk text widget
ensures that is present.
However, did not consider that the processed text is written to the
commit message file using `puts`, which also appends a LF character,
so that the final commit message ends with two LF. Trim all trailing
LF characters, and while we are here, use `string trim`, which lets
us remove the leading LF in the same command.
Reported-by: Gareth Fenn <garethfenn@gmail.com>
Reviewed-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
A global variable exists that holds the color name used to highlight
search results everywhere, except that in the commit list the color
is still hard-coded to "yellow". Use the global variable there as well.
Signed-off-by: Alexander Ogorodov <bnfour@bnfour.net>
Replace the_repository everywhere with repo, feed repo from cmd_replay()
to all the other functions in the file that need it, and remove the
UNUSED annotation on repo.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During fsck, we use "strbuf_read" to read the content of "packed-refs"
without using mmap mechanism. This is a bad practice which would consume
more memory than using mmap mechanism. Besides, as all code paths in
"packed-backend.c" use this way, we should make "fsck" align with the
current codebase.
As we have introduced the helper function "allocate_snapshot_buffer", we
can simply use this function to use mmap mechanism.
Suggested-by: Jeff King <peff@peff.net>
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"load_contents" would choose which way to load the content of the
"packed-refs". However, we cannot directly use this function when
checking the consistency due to we don't want to open the file. And we
also need to reuse the logic to avoid causing repetition.
Let's create a new helper function "allocate_snapshot_buffer" to extract
the snapshot allocation logic in "load_contents" and update the
"load_contents" to align with the behavior.
Suggested-by: Jeff King <peff@peff.net>
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We assume the "packed-refs" won't be empty and instead has at least one
line in it (even when there are no refs packed, there is the file header
line). Because there is no terminating LF in the empty file, we will
report "packedRefEntryNotTerminated(ERROR)" to the user.
However, the runtime code paths would accept an empty "packed-refs"
file, for example, "create_snapshot" would simply return the "snapshot"
without checking the content of "packed-refs". So, we should skip
checking the content of "packed-refs" when it is empty during fsck.
After 694b7a1999 (repack_without_ref(): write peeled refs in the
rewritten file, 2013-04-22), we would always write a header into the
"packed-refs" file. So, versions of Git that are not too ancient never
write such an empty "packed-refs" file.
As an empty file often indicates a sign of a filesystem-level issue, the
way we want to resolve this inconsistency is not make everybody totally
silent but notice and report the anomaly.
Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report
to the users that "packed-refs" is empty.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --maintenance option for 'scalar reconfigure' has three possible
values. Improve the documentation by specifying the option in the -h
help menu and usage information.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build process fails in POSIXLY_CORRECT mode:
$ gitk@master:1005> POSIXLY_CORRECT=1 make
* new Tcl/Tk interpreter location
GEN gitk-wish
Generating catalog po/zh_cn.msg
msgfmt --statistics --tcl po/zh_cn.po -l zh_cn -d po/
msgfmt: --tcl requires a "-l locale" specification
Try 'msgfmt --help' for more information.
make: *** [Makefile:76: po/zh_cn.msg] Error 1
The reason is that option arguments cannot occur after the first
non-option argument. Move the file name last.
Reported-by: Nathan Royce <nroycea+kernel@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
`hostname` is a popular command available on both Linux and macOS. As
per the man-page[1], `hostname -f` command returns the fully qualified
domain name (FQDN) of the system. The current Net::Domain perl module
being used in the script for the same has been quite unrealiable in many
cases. Thankfully, we now have a better check for valid_fqdn, which does
reject the invalid FQDNs given by this module properly, but at the same
time, it will result in a fallback to 'localhost.localdomain' being
used. `hostname -f` has been quite reliable (probably even more reliable
than the Net::Domain module) and before falling back to
'localhost.localdomain', we should try to use it. Interestingly, the
`hostname` command is actually used by perl modules like Net::Domain[2]
and Sys::Hostname[3] to get the hostname. So, lets give `hostname -f` a
chance as well!
[1]: https://man7.org/linux/man-pages/man1/hostname.1.html
[2]: https://github.com/Perl/perl5/blob/blead/cpan/libnet/lib/Net/Domain.pm#L88
[3]: https://github.com/Perl/perl5/blob/blead/ext/Sys-Hostname/Hostname.pm#L93
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add 'f?o'" did not add 'foo' if 'f?o', an unusual pathname,
also existed on the working tree, which has been corrected.
* kj/glob-path-with-special-char:
dir.c: literal match with wildcard in pathspec should still glob
Code clean-up around stale CI elements and building with Visual Studio.
* js/ci-buildsystems-cleanup:
config.mak.uname: drop the `vcxproj` target
contrib/buildsystems: drop support for building . vcproj/.vcxproj files
ci: stop linking the `prove` cache
With 7304bd2bc39 (ci: wire up Visual Studio build with Meson,
2025-01-22) we have introduced a CI job that builds and tests Git with
Microsoft Visual Studio via Meson. This job is only being executed by
default on GitHub Workflows though -- on GitLab CI it is marked as a
"manual" job, so the developer has to actively trigger these jobs.
The consequence of this split is that any breakage specific to this job
is only noticed by developers who mainly work with GitHub. Let's improve
this situation by also running the job by default on GitLab CI.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Git project has started to wire up Meson as a build system in Git
v2.48.0. Wire up support for Meson in "git-gui" so that we can trivially
include it as a subproject in Git.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
The "GITGUI_VERSION" variable is made available by generating and
including the "GIT-VERSION-FILE" file. Its value has been used in
various build steps, but in the preceding commits we have refactored
those to instead source the "GIT-VERSION-FILE" directly. As a result,
the variable is now only used in a single recipe, and this use can be
trivially replaced with sed(1).
Refactor the recipe to do so and stop including "GIT-VERSION-FILE" to
simplify the build process.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Extract script to generate the macOS app. This change allows us to reuse
the build logic with the Meson build system.
Note that as part of this change we also modify the TKEXECUTABLE
variable to track its full path. Like this we don't have to propagate
both the TKEXECUTABLE and TKFRAMEWORK variables into the script, and the
basename can be trivially computed from TKEXECUTABLE anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Extract script to generate the macOS wrapper for git-gui. This change
allows us to reuse the build logic with the Meson build system.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Extract script to generate "tclIndex". This change allows us to reuse
the build logic with the Meson build system.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Extract script to generate "git-gui". This change allows us to reuse the
build logic with the Meson build system.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
The value of the GITGUI_SCRIPT variable is only used in a single place
as part of an sed(1) script that massages the "git-gui.sh" script.
Interestingly, this specific replacement does seem to be a no-op: we
replace "^ argv0=$$0" with " argv=$(GITGUI_SCRIPT)", which has a value
of "$$0". The result would thus be completely unchanged.
Drop the replacement and its variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
The output of GIT-VERSION-GEN can be sourced by our Makefile to make the
version available there. The output has a couple of spaces around the
equals sign, which is perfectly valid for parsing it in our Makefile.
But in subsequent steps we'll also want to source the file in a couple
of newly-introduced shell scripts, but having spaces around variable
assignments is invalid there.
Prepare for this step by dropping the spaces surrounding the equals
sign. Like this, we can easily use the same file both in our Makefile
and in shell scripts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
The GIT-VERSION-GEN unconditionally writes version information into the
source directory in the form of the "GIT-VERSION-FILE". We are about to
introduce the Meson build system though, which enforces out-of-tree
builds by default, and in that context we cannot continue to write
version information into the source tree.
Prepare the script for out-of-tree builds by treating the source
directory different from the output file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
The GIT-GUI-VARS file is used to track whether any of our build options
has changed. Unfortunately, the format of that file does not allow us to
propagate those build options to other scripts. But as we are about to
introduce support for the Meson build system, we will extract a couple
of scripts to deduplicate core build logic across Makefiles and Meson.
With this refactoring, it will become necessary to make build options
more widely accessible.
Replace GIT-GUI-VARS with a new GIT-GUI-BUILD-OPTIONS file that is being
populated from a template. This file can easily be sourced from build
scripts in subsequent steps.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
This can be squashed into the previous step. That is how our "git
pack-redundant" conversion did.
Theoretically, however, those who want to gauge the need to keep the
command by exposing their users to patches before this one may want
to wait until their experiment finishes before they formally say
"this will go away".
This change is made into a separate patch from the previous step
precisely to help those folks.
While at it, update the documentation page to use the new [synopsis]
facility to mark-up the SYNOPSIS part.
Helped-by: Elijah Newren <newren@gmail.com>
[en: typofix]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we made "git whatchanged" require "--i-still-use-this" and asked
the users to report if they still want to use it, the logical next
step is to allow us build Git without "whatchanged" to prepare for
its eventual removal.
If we were to follow the pattern established in 8ccc75c2 (remote:
announce removal of "branches/" and "remotes/", 2025-01-22), we can
do this together with the documentation update to officially list
that the command will be removed in the BreakingChanges document,
but let's just keep the changes separate just in case we want to
proceed a bit slower.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation of "git whatchanged" is pretty explicit that the
command was retained for historical reasons to help those whose fingers
cannot be retrained. Let's see if they still are finding it hard to
type "git log --raw" instead of "git whatchanged" by marking the
command as "nominated for removal", and require "--i-still-use-this"
on the command line. Adjust the tests so that the option is passed
when we invoke the command. In addition, we test that the command
fails when "--i-still-use-this" is not given.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tests on fast-import run "git whatchanged" without even
checking the output from the command. It is tempting to remove the
calls altogether since they are not doing anything useful, but they
presumably were added there while the tests were developed to manually
sanity check which paths were touched.
Replace these calls with "git log --raw", which is a rough
equivalent in the more modern Git.
This does not remove "git whatchanged", but we no longer have to
worry about adjusting these places when we eventually do.
Helped-by: Elijah Newren <newren@gmail.com>
[en: log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --minimal" used to give non-minimal output when its
optimization kicked in, which has been disabled.
* ng/xdiff-truly-minimal:
xdiff: disable cleanup_records heuristic with --minimal
"git index-pack --fix-thin" used to abort to prevent a cycle in
delta chains from forming in a corner case even when there is no
such cycle.
* ds/fix-thin-fix:
index-pack: allow revisiting REF_DELTA chains
t5309: create failing test for 'git index-pack'
test-tool: add pack-deltas helper
Further refinement on CI messages when an optional external
software is unavailable (e.g. due to third-party service outage).
* jc/ci-skip-unavailable-external-software:
ci: download JGit from maven, not eclipse.org
ci: update the message for unavailble third-party software
Further code clean-up in the object-store layer.
* ps/object-store-cleanup:
object-store: drop `repo_has_object_file()`
treewide: convert users of `repo_has_object_file()` to `has_object()`
object-store: allow fetching objects via `has_object()`
object-store: move function declarations to their respective subsystems
object-store: move and rename `odb_pack_keep()`
object-store: drop `loose_object_path()`
object-store: move `struct packed_git` into "packfile.h"
Update send-email to work better with Outlook's smtp server.
* ag/send-email-outlook:
send-email: add --[no-]outlook-id-fix option
send-email: retrieve Message-ID from outlook SMTP server
Some documentation examples reference "whatchanged", either as a
placeholder command or an example of source structure.
To reduce the need for future edits when `whatchanged` is removed,
replace these references with alternatives:
- In `MyFirstObjectWalk.adoc`, use `version` as the nearby anchor
point for `walken`, instead of `whatchanged`.
- In `user-manual.adoc`, cite `show` instead of `whatchanged` as
a command whose source lives in the same file as `log`.
Helped-by: Elijah Newren <newren@gmail.com>
[en: log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands slated for removal like "git pack-redundant" now require
an explicit "--i-still-use-this" option to run. This is to
discourage casual use and surface their pending deprecation to
users.
The warning message is long, so factor it into a helper function
you_still_use_that() to simplify reuse by other commands.
Also add a missing test to ensure this enforcement works for
"pack-redundant".
Helped-by: Elijah Newren <newren@gmail.com>
[en: log message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We store the replacement data in an oidmap, which is itself a pointer in
the raw_object_store struct. But there's no need for an extra pointer
indirection here. It is always allocated and initialized along with the
containing struct, and we never check it for NULL-ness.
Let's embed the map directly in the struct, which is simpler and avoids
extra pointer chasing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers which want to know how many items are in an oidmap have to look
at the underlying hashmap struct, leaking an implementation detail.
Let's provide a type-appropriate wrapper and use it.
Note in the call from lookup_replace_object(), the caller was actually
looking at the hashmap's tablesize parameter (the allocated size of the
table) rather than hashmap_get_size(), the number of items in the table.
This probably should have been checking the number of items all along,
but the two are functionally equivalent here since we only add to the
map and never remove anything. Thus if there was any allocation, it was
because there is at least one item.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function does not free the oidmap struct itself; it just drops all
items from the map (using hashmap_clear_() internally). It should be
called oidmap_clear(), per CodingGuidelines.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In pack-bitmap.c:load_bitmap_entries_v1, the function `read_bitmap_1`
allocates a bitmap and reads index data into it. However, if any of
the validation checks following the allocation fail, the allocated bitmap
is not freed, resulting in a memory leak. To avoid this, the validation
checks should be performed before the bitmap is allocated.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "stats" directory contains a couple of scripts to do some statistics
on a repository:
- "git-common-hash" shows the longest common hash prefixes and can be
used to determine the minimum prefix length to use for object names
to be unique. The script has last been touched in 53474eb92ff
(contrib: update stats/mailmap script, 2012-12-12) and searching for
it on the internet doesn't really surface any potential use cases or
even mentions of it.
Modern Git also shouldn't really need this tool as it knows to
automatically scale printed prefixes via some heuristics.
- "mailmap.pl" performs some statistics on the number of mailmapped
commits in a repository. It has last been modified in 53474eb92ff
(contrib: update stats/mailmap script, 2012-12-12) and has since
been bitrotting. It doesn't even compile nowadays anymore:
$ perl contrib/stats/mailmap.pl
Experimental keys on scalar is now forbidden at contrib/stats/mailmap.pl line 57.
Type of arg 1 to keys must be hash or array (not hash element) at contrib/stats/mailmap.pl line 57, near "}) "
Experimental keys on scalar is now forbidden at contrib/stats/mailmap.pl line 57.
Type of arg 1 to keys must be hash or array (not private variable) at contrib/stats/mailmap.pl line 57, near "$h)"
Experimental keys on scalar is now forbidden at contrib/stats/mailmap.pl line 64.
Type of arg 1 to keys must be hash or array (not private variable) at contrib/stats/mailmap.pl line 64, near "$h)"
Execution of contrib/stats/mailmap.pl aborted due to compilation errors.
This should be good-enough signal to indicate that nobody is using
this script at all anymore.
- "packinfo.pl" takes the output from git-verify-pack(1) and performs
some pretty printing thereof. On the one hand it reformats the
output to be easier to read and provide some summaries. On the other
hand it may also print filenames of blobs.
We don't have any replacement for this tool. Ideally, we should move
its functionality into git-verify-pack(1) itself.
Remove the first two scripts, but retain "packinfo.pl".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git-new-workdir" command has been introduced to make it possible to
have a separate working directory in a different place. The command thus
predates git-worktree(1), which is what people use nowadays to create
any such working directory. As such, the script doesn't really have much
of a reason to exist nowadays anymore.
It also doesn't seem like the script is still in use: the last time it
has received an update was in e32afab7b03 (git-new-workdir: don't fail
if the target directory is empty, 2014-11-26), more than a decade ago.
Remove it as well as the tests that depend on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the "emacs/" directory still exists, all of its code has been
replaced with stubs in 6d5ed4836db (git{,-blame}.el: remove old
bitrotting Emacs code, 2018-04-11). Instead, the recommendation is to
use Emacs' own vc-annotate mode.
Remove the code altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git-resurrect.sh" script can be used to find traces of a branch tip
in the reflog and resurrect that branch. Despite a couple of global
cleanups, the script hasn't seen any activity since it was introduced in
e1ff064e1bf (contrib git-resurrect: find traces of a branch name and
resurrect it, 2009-02-04).
Furthermore, the tool does not work with the "reftable" backend at all
as it directly reads ".git/logs/HEAD". As reflogs are stored as part of
the individual tables though that file wouldn't exist in a "reftable"-
enabled repository.
Last but not least, the tool doesn't even work unless it is explicitly
invoked via `git resurrect` as it sources "git-sh-setup". As none of our
build systems know to install this script, users thus have to go out of
their way to really make it work, which is highly unlikely.
Another source that indicates that this tool can be removed is a
question for how to restore deleted branches on StackOverflow [1]. The
top-voted answer uses git-reflog(1) directly and has received more than
3000 votes to date. While "git-resurrect.sh" is also mentioned, it only
got 16 upvotes, and comments mention the above caveat that users have to
do some manual setup to make it work.
It's thus rather clear that the tool doesn't have a lot or even any
users. Remove it.
[1]: https://stackoverflow.com/questions/3640764/can-i-recover-a-branch-after-its-deletion-in-git
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "persistent-https" remote helper supposedly speeds up SSL operations
by running a daemon that keeps a connection open to a remote server. It
is effectively unmaintained nowadays: the last time it received an
update was in accb613afd2 (contrib/persistent-https: use Git version for
build label, 2016-07-20) and its parent commits to make it compile with
Go 1.7+.
This Go toolchain is somewhat dated by now though and unsupported. The
oldest still-supported toolchain is Go 1.23, which was released in
August 2024. It is not possible to compile the remote helper with that
Go version anymore:
$ go version
go version go1.23.8 linux/amd64
$ make
case $(go version) in \
"go version go"1.[0-5].*) EQ=" " ;; *) EQ="=" ;; esac && \
go build -o git-remote-persistent-https \
-ldflags "-X main._BUILD_EMBED_LABEL${EQ}GIT_VERSION=2.49.0.943.g965a70ebf62"
go: cannot find main module, but found .git/config in /home/pks/Development/git
to create a module there, run:
cd ../.. && go mod init
make: *** [Makefile:31: git-remote-persistent-https] Error 1
The problem is that modern Go toolchains require a "go.mod" file, but we
don't have any such files. This requirement exists since quite a while
already, so it's clear that nobody has tried to use this remote helper
anytime recent.
Remove the remote helper.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "mw-to-git" directory contains tools for accessing MediaWiki via
Git. The scripts are essentially unmaintained in Git: despite a couple
of global cleanups, the last changes were a couple of security-related
issues part of 9a8606465e8 (remote-mediawiki: use "sh" to eliminate
unquoted commands, 2020-09-21) and its parents. We don't ever run any of
the tests so it is more likely than not that many of the tests have been
bitrotting, like e.g. documented in f8ab018dafc (remote-mediawiki tests:
annotate failing tests, 2020-09-21).
According to Matthieu Moy [1], one of the original developers of this
tool, it didn't receive any attention recently and there is no
motivation to keep maintaining it anymore in the community. The project
has been spun out of Git [2] and thus has a new official home, but did
not receive much attention over there, either.
As such, it seems like the MediaWiki transport helper is slowly fading
away. But given that there is a new home, it doesn't make sense to have
it as part of Git anymore only to let it rot. Remove the directory.
[1]: <108f297a-b415-4742-80e4-51ea02af18e9@matthieu-moy.fr>
[2]: https://github.com/Git-Mediawiki/Git-Mediawiki
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "hooks" directory contains a handful of example hooks. Most of these
hooks are highly specific and haven't really received any updates over
the last couple of years, except for some global cleanups. The multimail
hook has also been removed in f74d11471fa (multimail: stop shipping a
copy, 2021-06-10) in favor of its upstream project [1].
Remove those hooks. If we want to provide examples for how to use Git
hooks we should do that as part of our documentation, for example in
githooks(5).
[1]: https://github.com/git-multimail/git-multimail
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "thunderbird-patch-inline" directory in "contrib/" contains a script
to send patch files via Thunderbird. This script depends on the
ExternalEditor extension [1], which seems to be effectively unmaintained
with the last update being in 2008. While the extension has eventually
been maintained in [2], that fork hasn't received any updates since
2020, either.
As such, the ExternalEditor extension does not work with modern versions
of Thunderbird anymore, and as the "thunderbird-patch-inline" script
depends on the ExternalEditor extension it likely doesn't work anymore,
either. The fact that this script hasn't been touched for the last 10
years outside of some global cleanup supports the idea that it is not
useful anymore.
Remove it.
[1]: https://globs.org/articles.php?lng=en&pg=2
[2]: https://github.com/exteditor/exteditor/releases
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "remote-helpers" directory contains two remote helper scripts for
Mercurial and Bazaar. These scripts have since been converted into stubs
in b2c851a8e67 (Revert "Merge branch 'jc/graduate-remote-hg-bzr' (early
part)", 2014-05-20) as the helpers have been moved into their own
upstream projects [1][2].
Given that these stubs have been created more than a decade ago it is
very unlikely that anybody still tries to use them. Remove them.
[1]: https://github.com/felipec/git-remote-bzr
[1]: https://github.com/felipec/git-remote-hg
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "examples" directory used to contain scripted versions of some of
our builtins. These have all been removed in 49eb8d39c78 (Remove
contrib/examples/*, 2018-03-25), but we left a note in the directory to
make it discoverable that there used to be examples.
It is unlikely that anybody still looks at these examples more than 7
years after they have been removed. Remove the note and its directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remotes can be configured either via a repository's config or by using
the ".git/branches/" or ".git/remotes/" directories. Back when the new
config-based mechanism has been introduced we also introduced a helper
script that migrates from the old-style remote configuration to the new
config-based mechanism.
With the recent removal announcement for the two directories we also
started to instruct users to migrate repositories that still use these
mechanism to use config-based remotes. Notably though, the migration
path doesn't even use the migration script. Instead, git-remote(1)
itself knows how to migrate any such remote via `git remote rename`.
In fact, a full migration _cannot_ use the script as it only knows to
migrate remotes from ".git/remotes/", but not ".git/branches/". As such,
the migration path via `git remote rename` is the only feasible way to
fully migrate repositories over to the new format.
Last but not least, the script doesn't even work as-is as it sources
"git-sh-setup". For this to work it would need to be invoked either via
Git so that this script is in our PATH, users would have to manually
call it with an adjusted PATH, or distributions need to install the
script into "$prefix/libexec/git-core" with a "git-" prefix. All of
these steps are unlikely enough to underpin the claim that this script
is not used at all.
So given that:
- The script cannot perform a full migration of all deprecated remote
types.
- We don't advertise it anywhere.
- It has been basically untouched since 2007.
- It doesn't even work unless users do manual steps.
It should be safe enough to just remove it. Do so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fd888311fbc (reftable/table: move reading block into block reader,
2025-04-07), we have refactored how reftable blocks are read so that
most of the logic is contained in the "block.c" subsystem itself. Most
importantly, the whole logic to read the data itself is now contained in
that subsystem.
This change caused a significant performance regression though when
reading blocks that aren't of the specific type one is searching for:
Benchmark 1: update-ref: create 100k refs (revision = fd888311fbc~)
Time (mean ± σ): 2.171 s ± 0.028 s [User: 1.189 s, System: 0.977 s]
Range (min … max): 2.117 s … 2.206 s 10 runs
Benchmark 2: update-ref: create 100k refs (revision = fd888311fbc)
Time (mean ± σ): 3.418 s ± 0.030 s [User: 2.371 s, System: 1.037 s]
Range (min … max): 3.377 s … 3.473 s 10 runs
Summary
update-ref: create 100k refs (revision = fd888311fbc~) ran
1.57 ± 0.02 times faster than update-ref: create 100k refs (revision = fd888311fbc)
The root caute of the performance regression is that we changed when
exactly blocks of an uninteresting type are being discarded. Previous to
the refactoring in the mentioned commit we'd load the block data, read
its type, notice that it's not the wanted type and discard the block.
After the commit though we don't discard the block immediately, but we
fully decode it only to realize that it's not the desired type. We then
discard the block again, but have already performed a bunch of pointless
work.
Fix the regression by making `reftable_block_init()` return early in
case the block is not of the desired type. This fixes the performance
hit:
Benchmark 1: update-ref: create 100k refs (revision = HEAD~)
Time (mean ± σ): 2.712 s ± 0.018 s [User: 1.990 s, System: 0.716 s]
Range (min … max): 2.682 s … 2.741 s 10 runs
Benchmark 2: update-ref: create 100k refs (revision = HEAD)
Time (mean ± σ): 1.670 s ± 0.012 s [User: 0.991 s, System: 0.676 s]
Range (min … max): 1.652 s … 1.693 s 10 runs
Summary
update-ref: create 100k refs (revision = HEAD) ran
1.62 ± 0.02 times faster than update-ref: create 100k refs (revision = HEAD~)
Note that the baseline performance is lower than in the original due to
a couple of unrelated performance improvements that have landed since
the original commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/am.c:split_mail_stgit_series, if `fopen` failed,
`series_dir_buf` allocated by `xstrdup` will leak. Add `free` in
`!fp` if branch will prevent the leak.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'test_path_is_file' is a modern path checking method in Git's development.
Replace the basic shell command 'test -f' with this approach.
Signed-off-by: Rodrigo Carvalho <rodrigorsdc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When refering to environment variables in the documentation, use the
ENV_VARIABLE format instead of $ENV_VARIABLE. The latter is used in the
documentation to refer to the actual value of the variable, not the name
of the variable.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To unify mark-up used in our documentation to a newer convention,
started by 22293895 (doc: apply synopsis simplification on git-clone
and git-init, 2024-09-24), update the documentation pages for 'git
verify-commit', 'git verify-tag', and 'git verify-pack' to
* use [synopsis], not [verse] in the SYNOPSIS section
* enclose `--option=<value>` in backquotes
* do not describe non-option arguments in the OPTIONS section
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To unify mark-up used in our documentation to a newer convention,
started by 22293895 (doc: apply synopsis simplification on git-clone
and git-init, 2024-09-24), update the documentation for 'git var' and
'git write-tree' to
* use [synopsis], not [verse] in the SYNOPSIS section
* enclose `--option=<value>` in backquotes
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To unify mark-up used in our documentation to a newer convention,
started by 22293895 (doc: apply synopsis simplification on git-clone
and git-init, 2024-09-24), update the documentation of 'git daemon'
to
* use [synopsis], not [verse] in the SYNOPSIS section
* enclose `--option=<value>` in backquotes
Also, split '--[no-]option' into '--option' and '--no-option'
to make it easier to grep for them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable/writer.c:writer_index_hash(), if `reftable_buf_add` failed,
key allocated by `reftable_malloc` will not be insert into `obj_index_tree`
thus leaks. Simple add reftable_free(key) will solve this problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In reftable/writer.c:padded_write(), if w->writer failed, zeroed
allocated in `reftable_calloc` will leak. w->writer could be
`reftable_write_data` in reftable/stack.c, and could fail due to
some write error. Simply add reftable_free(zeroed) will solve this
problem.
Signed-off-by: Lidong Yan <502024330056@smail.nju.edu.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use "proc makedroplist" function to support combobox on legacy widgets
mode. "proc makedroplist" uses "ttk::combobox" for themed mode, and uses
"tk_optionMenu" for legacy mode to get rid of the problem.
Signed-off-by: YOKOTA Hiroshi <yokota.hgml@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Many contributors to software use a Language Server Protocol
implementation to allow their editor to learn structural information
about the code they write and provide additional features, such as
jumping to the declaration or definition of a function or type. In C,
the usual implementation is clangd, which requires compiling with clang.
Because C and C++ projects lack a standard file system layout and build
system, unlike languages such as Rust and Go, clangd requires a
compilation database to be generated by the clang compiler in order to
pass the proper compilation flags and discover all of the files
necessary to make the LSP work. This is done by setting
GENERATE_COMPILATION_DATABASE to "yes".
However, when that's enabled and the user runs "make" a second time,
all of the files are re-compiled, which is inconvenient for contributors
to Git, since it makes small changes or rebases recompile the entirety
of the codebase. This happens because the directory holding the
compilation database is updated anytime an object is built, so its
modification date will always be newer than the first object built.
To solve this, use the same trick we do just above for the .depend
directory and filter the compilation database directory out if it
already exists, which avoids making it a target to be built.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been reported that "git rebase --rebase-merges" can create
corrupted reflog entries like
e9c962f2ea0 HEAD@{8}: <binary>�: Merged in <branch> (pull request #4441)
This is due to a use-after-free bug that happens because
reflog_message() uses a static `struct strbuf` and is not called to
update the current reflog message stored in `ctx->reflog_message` when
creating the merge. This means `ctx->reflog_message` points to a stale
reflog message that has been freed by subsequent call to
reflog_message() by a command such as `reset` that used the return value
directly rather than storing the result in `ctx->reflog_message`.
Fix this by creating the reflog message nearer to where the commit is
created and storing it in a local variable which is passed as an
additional parameter to run_git_commit() rather than storing the message
in `struct replay_ctx`. This makes it harder to forget to call
`reflog_message()` before creating a commit and using a variable with a
narrower scope means that a stale value cannot carried across a from one
iteration of the loop to the next which should prevent any similar
use-after-free bugs in the future.
A existing test is modified to demonstrate that merges are now created
with the correct reflog message.
Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit these functions will be called from pick_one_commit()
so move them above that function to avoid a forward declaration.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: add Tamil translation
gitk: limit PATH search to bare executable names
gitk: _search_exe is no longer needed
gitk: override $PATH search only on Windows
gitk: adjust indentation to match the style used in this script
* 'master' of https://github.com/j6t/git-gui:
git-gui: treat the message template file as a built file
git-gui: heed core.commentChar/commentString
git-gui: po/README: update repository location and maintainer
* js/po-update-workflow:
git-gui: treat the message template file as a built file
git-gui: po/README: update repository location and maintainer
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
"git mv a a/b dst" would ask to move the directory 'a' itself, as
well as its contents, in a single destination directory, which is
a contradicting request that is impossible to satisfy. This case is
now detected and the command errors out.
* ps/mv-contradiction-fix:
builtin/mv: convert assert(3p) into `BUG()`
builtin/mv: bail out when trying to move child and its parent
hashmap API clean-up to ensure hashmap_clear() leaves a cleared map
in a reusable state.
* en/hashmap-clear-fix:
hashmap: ensure hashmaps are reusable after hashmap_clear()
This commit adds the `git-credential-outlook` and `git-credential-gmail`
helpers to the list of OAuth helpers.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OAuth2.0 is a new authentication method that is being used by many email
providers, including Outlook and Gmail. Recently, the Authen::SASL perl
module has been updated to support OAuth2.0 authentication, thus making
the git-send-email script be able to use this authentication method as
well. So lets improve the documentation to reflect this change.
I also had a hard time finding a reliable OAuth2.0 access token
generator for Outlook and Gmail. So I added a link to the such
generators which I developed myself after seaching through lots of code
and API documentation to make things easier for others.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current implementation of a valid Fully Qualified Domain Name
is not that strict. It just checks whether it has a dot (.) and
if using macOS, it should not end with .local. As per RFC1035[1],
from what I understood, the following checks need to be done:
- The domain must contain atleast one dot
- Each label (separated by dots) must be 1-63 characters long
- Labels must start and end with an alphanumeric character
- Labels can contain alphanumeric characters and hyphens
Here are some examples of valid and invalid labels:
'example.com', # Valid
'sub.example.com', # Valid
'my-domain.org', # Valid
'localhost', # Invalid (no dot)
'MacBook..', # Invalid (double dots)
'-example.com', # Invalid (starts with a hyphen)
'example-.com', # Invalid (ends with a hyphen)
'example..com', # Invalid (double dots)
'example', # Invalid (no TLD)
'example.local', # Invalid on macOS
'valid-domain.co.uk', # Valid
'123.example.com', # Valid
'example.com.', # Invalid (trailing dot)
'toolonglabeltoolonglabeltoolonglabeltoolonglabeltoolonglabeltoolonglabel.com', # Invalid (label > 63 chars)
Due to current implementation, I was not able to send emails from
Ubuntu. Upon debugging, I found that the SMTP domain being passed
to Outlook's servers was not valid.
Net::SMTP=GLOB(0x5db4351225f8)>>> EHLO MacBook..
Net::SMTP=GLOB(0x5db4351225f8)<<< 501 5.5.4 Invalid domain name
Net::SMTP=GLOB(0x5db4351225f8)>>> HELO MacBook..
Notice that an invalid domain name "MacBook.." is sent by git-send-email.
We have a fallback code that checks output from Net::Domain::domainname()
or asking domain method of an Net::SMTP instance to detect a misconfigured
hostname and replace it with fallback "localhost.localdomain", but the
valid_fqdn apparently is failing to say "MacBook.." is not a valid fqdn.
With this patch, the rule used in valid_fqdn is tightened, the beginning
part of the SMTP exchange looked like this:
Net::SMTP=GLOB(0x58c8af71e930)>>> EHLO localhost.localdomain
Net::SMTP=GLOB(0x58c8af71e930)<<< 250-PN4P287CA0064.outlook.office365.com Hello
[1]: https://datatracker.ietf.org/doc/html/rfc1035
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some distros, notably Fedora, want to install non-core Perl libraries
into specific directory, namely /usr/share/perl5/vendor_perl.
The Makefile build system allows this by overriding perllibdir variable,
let's make meson works on par with our Makefile.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When users want to enable the latest and greatest configuration options
recommended by Scalar after a Git upgrade, 'scalar reconfigure --all' is
a great option that iterates over all repos in the multi-valued
'scalar.repos' config key.
However, this feature previously forced users to enable background
maintenance. In some environments this is not preferred.
Add a new --maintenance=<mode> option to 'scalar reconfigure' that
provides options for enabling (default), disabling, or leaving
background maintenance config as-is.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new enlistment via 'scalar clone', the default is to set
up situations that work for most user scenarios. Background maintenance
is one of those highly-recommended options for most users.
However, when using 'scalar clone' to create an enlistment in a
different situation, such as prepping a VM image, it may be valuable to
disable background maintenance so the manual maintenance steps do not
get blocked by concurrent background maintenance activities.
Add a new --no-maintenance option to 'scalar clone'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When registering a repository with Scalar to get the latest opinionated
configuration, the 'scalar register' command will also set up background
maintenance. This is a recommended feature for most user scenarios.
However, this is not always recommended in some scenarios where
background modifications may interfere with foreground activities.
Specifically, setting up a clone for use in automation may require doing
certain maintenance steps in the foreground that could become blocked by
concurrent background maintenance operations.
Allow the user to specify --no-maintenance to 'scalar register'. This
requires updating the method prototype for register_dir(), so use the
default of enabling this value when otherwise specified.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In advance of adding a --[no-]maintenance option to several 'scalar'
subcommands, extend the register_dir() method to include an option for
how it should handle background maintenance.
It's important that we understand the context of toggle_maintenance()
that will enable _or disable_ maintenance depending on its input value.
Add a doc comment with this information.
Similarly, update register_dir() to either enable maintenance or leave
it alone.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Follow the lead of 5377abc0c9d5 ("po/git.pot: don't check in result
of "make pot"", 2022-05-26) in the Git repository and do not track
git-gui.pot anymore.
Instead, translators are expected to integrate an up-to-date version
from the master branch into their translation file using
make ALL_POFILES=po/xx.po update-po
Update README to describe the new process. It is now understood that
different translations need not be based on the same message template
file, but rather individual translators should base their translation
on the most up-to-date code. Remove the section that addresses the
i18n coordinator as it does not apply when no common base is required
among translators.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
While git-gc(1) knows to garbage collect the rerere cache,
git-maintenance(1) does not yet have a task for this cleanup. Introduce
a new "rerere-gc" task to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we are going to introduce a new "rerere-gc" task
for git-maintenance(1). To prepare for this, refactor the code that
spawns `git rerere gc` into a separate function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-gc(1) knows to prune stale worktrees, git-maintenance(1) does
not yet have a task for this cleanup. Introduce a new "worktree-prune"
task to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent commit we will introduce a new "worktree-prune" task for
git-maintenance(1). To prepare for this, refactor the code that spawns
`git worktree prune` into a separate function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use a couple of global variables to assemble command line arguments
for subprocesses we execute in git-gc(1). All of these variables except
the one for git-repack(1) are only used in a single place though, so
they don't really add anything but confusion.
Remove those variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parameters of `cmd_gc()` aren't indented properly. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Compiling/linking 82e79c63642c on an older MacOs machine (like Xcode
14.3.1, the last version of 14.x series) leads to this:
Undefined symbols for architecture x86_64:
"_false_but_the_compiler_does_not_know_it_", referenced from:
_start_command in libgit.a(run-command.o)
The linker fails to pick up compiler-tricks/not-constant.o that
defines the needed false_but_the_compiler_does_not_know_it_ symbol,
which is the only thing defined in that object file, from the
libgit.a archive.
Initializing the variable explicitly to 0 works around the linker
bug; the symbol type changes from 'C' to 'S' and is picked up by the
linker.
Xcode 15 introduces a new linker, which seems to fix the bug, making
the workaround here unnecessary, and Apple requires to build with
Xcode 16 or later in order to upload to their App Store Connect
since April 24, 2025, but not everybody is expected to upgrade their
toolchain immediately.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No, this is not about a quiz on regexp compatibility between Perl
and sed.
Back when cdbdc6bf (t: refactor tests depending on Perl substitution
operator, 2025-04-03) rewrote many uses of perl with sed, the general
pattern of the original scripts were
chmod +w some_read_only_file &&
perl -p -e "regexp to munge" some_read_only_file >some_tmp &&
mv some_tmp some_read_only_file
persumably because the author knew that replacing some_read_only_file
with "mv" at the last step would not work without "mv -f" in some
environments (GNU seems to succeed without giving any prompt when
not running interactively, which is what happens when running t/
scripts). Replacing perl with sed would be fine as long as sed with
updated regexp does the equivalent munging.
But one place used to use a different construct in the original:
perl -i.bak -p -e "regexp to munge" some_read_only_file
With _no_ temporary file or "mv", "perl -i" allows you to replace a
read-only file in place.
When we replaced the use of "perl" with "sed" in the said commit,
however, because "sed -i" is not portable, we rewrote that in-place
replacement to
sed "regexp to munge" some_read_only_file >some_tmp &&
mv some_tmp some_read_only_file
Again, unfortunately that does not work in some environment, without
"mv -f".
We could run "mv -f" here, but we would then need to remove "chmod
+w" and have them use "mv -f" instead at all places that were
touched cdbdc6bf (t: refactor tests depending on Perl substitution
operator, 2025-04-03) to be consistent (and more concise).
For now, let's make it consistent in the other direction by mimick
the other places that made the target read-write before moving.
Speaking of portability, the outcome of using "sed" on non-text
files is unspecified, so the entire exercise of cdbdc6bf may have
needed to be reverted if people still used ancient version of
"standard compliant" sed that barfs on non-text files, but these
days we may be able to get away with "BSDs and GNU seem OK with it"
;-) But one fix at a time.
Reported-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As documented on NetBSD's man page, open with the O_NOFOLLOW flag and a
symlink returns -1 and sets errno to EFTYPE which differs from POSIX.
This patch fixes the following test failure:
$ sh t0602-reffiles-fsck.sh --verbose
--- expect 2025-05-02 23:05:23.920890147 +0000
+++ err 2025-05-02 23:05:23.916794959 +0000
@@ -1 +1 @@
-error: packed-refs: badRefFiletype: not a regular file but a symlink
+error: unable to open '.git/packed-refs': Inappropriate file type or format
not ok 12 - the filetype of packed-refs should be checked
FreeBSD has the same issue for EMLINK instead of EFTYPE.
This portability issue was introduced in cfea2f2da8 (packed-backend:
check whether the "packed-refs" is regular file, 2025-02-28)
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an equivalent to "make hdr-check" target to meson based builds.
* kn/meson-hdr-check:
makefile/meson: add 'check-headers' as alias for 'hdr-check'
meson: add support for 'hdr-check'
meson: rename 'third_party_sources' to 'third_party_excludes'
meson: move headers definition from 'contrib/coccinelle'
coccinelle: meson: rename variables to be more specific
ci/github: install git before checking out the repository
Code clean-up for meson-based build infrastructure.
* es/meson-cleanup:
meson: only check for missing networking syms on non-Windows; add compat impls
meson: fix typo in function check that prevented checking for hstrerror
meson: add a couple missing networking dependencies
meson: do a full usage-based compile check for sysinfo
meson: check for getpagesize before using it
meson: simplify and parameterize various standard function checks
The build procedure based on Meson learned to drive the
benchmarking tests.
* ps/meson-build-perf-bench:
meson: wire up benchmarking options
meson: wire up benchmarks
t/perf: fix benchmarks with out-of-tree builds
t/perf: use configured PERL_PATH
t/perf: fix benchmarks with alternate repo formats
Update to arm64 Windows port.
* js/windows-arm64:
max_tree_depth: lower it for clangarm64 on Windows
mingw(arm64): do move the `/etc/git*` location
msvc: do handle builds on Windows/ARM64
mingw: do not use nedmalloc on Windows/ARM64
config.mak.uname: add support for clangarm64
bswap.h: add support for built-in bswap functions
Our CI needs to be aware of the location of the test output directory so
that it knows where to find test results. Some of our CI jobs achieve
this by setting the `TEST_OUTPUT_DIRECTORY` environment variable, which
ensures that the output will be written to that directory. Other jobs,
especially on GitHub Workflows, don't set that environment variable and
instead expect test results to be located in the source directory in
"t/".
The latter logic does not work with Meson though, as the test results
are not written into the source directory by default, but instead into
the build directory. As such, any job that uses Meson without setting
the environment variable will be unable to locate and aggregate results.
Fix this by explicitly setting the test output directory when we set up
the Meson build directory. Like this, we can easily default to "t/" in
the source directory when the value hasn't been set explicitly.
Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we dropped `contrib/buildsystems/generate` to generate Visual
Studio Solution files, it is time to also drop the `vcxproj` Makefile
target that depended on that script.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before we had CMake support, the only way to build Git in Visual Studio
was via this hacky `generate` script.
For a while I tried to fix whenever things got broken, in particular to
allow building confidence in embargoed releases by running the CI builds
in Azure Pipelines in a private Azure DevOps project. I even carried the
patches in Git for Windows with the intention of upstreaming them,
eventually.
However, it is a lot of work with too little benefit. CMake is much
better supported by Visual Studio. So let's drop this hacky script (plus
support code).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not useful because we do not have any persisted directory anymore,
not since dropping our Travis CI support.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
7b399322a2e (doc: apply new format to git-branch man page, 2025-03-19)
updated the formatting for this doc to, among other things, use backtick
for some elements. In the process `è` was used by accident instead
of backtick.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tilde (~) count doesn’t match the length of the heading. In turn
you get a bunch of `<sub>~</sub>` instead of the intended `<h3>` in the
HTML output.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the `win+Meson` job was added to Git's CI, modeled after the
`win+vs` job, it overlooked that the latter built the Git artifacts in
release mode.
The reason for this is that there is code in `compat/mingw.c` that turns
on the modal assertion dialogs in debug mode, which are very useful when
debugging interactively (as they offer to attach Visual Studio's
debugger), but they are scarcely useful in CI builds (where that modal
dialog would sit around, waiting for a human being to see and deal with
it, which obviously won't ever happen).
This problem was not realized immediately because of a separate bug: the
`win+Meson` job erroneously built using the `gcc` that is in the `PATH`
by default on hosted GitHub Actions runners. Since that bug was fixed by
switching to `--vsenv`, though, the t7001-mv test consistently timed out
after six hours in the CI builds on GitHub, quite often, and wasting
build minutes without any benefit in return.
The reason for this timeout was a symptom of aforementioned debug mode
problem, where the test case 'nonsense mv triggers assertion failure and
partially updated index' in t7001-mv triggered an assertion.
I originally proposed this here patch to address the timeouts in CI
builds. The Git project decided to address this timeout differently,
though: by fixing the bug that the t7001-mv test case demonstrated. This
does not address the debug mode problem, though, as an `assert()` call
could be triggered in other ways in CI, and it should still not cause
the CI build to hang but should cause Git to error out instead. To avoid
having to accept this here patch, it was then proposed to replace all
`assert()` calls in Git's code base by `BUG()` calls. This might be
reasonable for independent reasons, but it obviously still does not
address the debug mode problem, as `assert()` calls could be easily
re-introduced by mistake, and besides, Git has a couple of dependencies
that all may have their own `assert()` calls (which are then safely
outside the control of the Git project to remove), therefore this here
patch is still needed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: rebased on 'maint' to enable fast-tracking the change down]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a path with wildcard characters, e.g. 'f*o', exists in the
working tree, "git add -- 'f*o'" stops after happily finding
that there is 'f*o' and adding it to the index, without
realizing there may be other paths, e.g. 'foooo', that may match
the given pathspec.
This is because dir.c:do_match_pathspec() disables further
matches with pathspec when it finds an exact match.
Reported-by: piotrsiupa <piotrsiupa@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit 50ddb089ff68 (tree-walk.c: remove the_repo from
get_tree_entry(), 2019-06-27) added an extra parameter to
get_tree_entry(), it did not fix the ordering comment about the meaning
of the parameters. Rather than just changing "third"->"fourth" and
"fourth"->"fifth", give the paramemters meaningful names (or actually,
just take the existing names from the get_tree_entry() definition in the
tree-walk.c file) and while at it, tweak the rest of the description to
incorporate the other parameter names as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of asserts is discouraged in our codebase because they lead to
different behaviour depending on how Git is built. When being unsure
enough whether a condition always holds so that one adds the assert,
then the assert should probably trigger regardless of how Git is being
built.
Drop the call to assert(3p) in git-mv(1) and instead use `BUG()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a known issue in git-mv(1) where moving both a child and any of
its parents causes an assert to trigger because the child cannot be
found anymore in the index. We have added a test for this in commit
0fcd473fdd3 (t7001: add failure test which triggers assertion,
2024-10-22) without addressing the issue, which is why the test itself
is marked as `test_expect_failure`.
The behaviour of that test relies on a call to assert(3p) though, which
may or may not be compiled into the resulting binary depending on
whether or not we pass `-DNDEBUG`. When these asserts are compiled into
Git this may cause our CI to hang on Windows though, because asserts may
cause a modal window to be shown.
While we could work around the issue by converting this into a call to
`BUG()`, let's rather address the root cause of the issue by bailing out
in case we see that both a child and any of its parents are being moved
in the same command.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce requirement for Perl in our documentation build and a few
scripts.
* ps/fewer-perl:
Documentation: stop depending on Perl to generate command list
Documentation: stop depending on Perl to massage user manual
request-pull: stop depending on Perl
filter-branch: stop depending on Perl
Overhaul of the reftable API.
* ps/reftable-api-revamp:
reftable/table: move printing logic into test helper
reftable/constants: make block types part of the public interface
reftable/table: introduce iterator for table blocks
reftable/table: add `reftable_table` to the public interface
reftable/block: expose a generic iterator over reftable records
reftable/block: make block iterators reseekable
reftable/block: store block pointer in the block iterator
reftable/block: create public interface for reading blocks
git-zlib: use `struct z_stream_s` instead of typedef
reftable/block: rename `block_reader` to `reftable_block`
reftable/block: rename `block` to `block_data`
reftable/table: move reading block into block reader
reftable/block: simplify how we track restart points
reftable/blocksource: consolidate code into a single file
reftable/reader: rename data structure to "table"
reftable: fix formatting of the license header
Since a call to repo_config() can be called with repo set to NULL
these days, a command that is marked as RUN_SETUP in the builtin
command table does not have to check repo with NULL before making
the call.
* ua/call-repo-config-with-possibly-null-repository:
builtin/difftool: remove unnecessary if statement
builtin/add: remove unnecessary if statement
The cleanup_records function marks some lines as changed before running
the actual diff algorithm. For most lines, this is a good performance
optimization, but it also marks lines that are surrounded by many
changed lines as changed as well. This can cause redundant changes and
longer-than-necessary diffs.
Whether this results in better-looking diffs is subjective. However, the
--minimal flag explicitly requests the shortest possible diff.
The change results in shorter diffs in about 1.3% of all diffs in Git's
history. Performance wise, I have measured the impact on
"git log -p -3000 --minimal > /dev/null". With this change, I get
Time (mean ± σ): 2.363 s ± 0.023 s (25 runs)
and without this patch I measured
Time (mean ± σ): 2.362 s ± 0.035 s (25 runs).
As the difference is well within the margin of error, this does not seem
to have an impact on performance.
Signed-off-by: Niels Glodny <n.glodny@campus.lmu.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before accessing an array element at a given index, it should be
verified that the index is within the desired bounds, not afterwards,
otherwise it may not make sense to even access the array element in the
first place. This is the point of CodeQL's
`cpp/offset-use-before-range-check` rule.
This CodeQL rule unfortunately is also triggered by the
`fill_es_indent_data()` code, even though the condition `off < len - 1`
does not even need to guarantee that the offset is in bounds (`s` points
to a NUL-terminated string, for which `s[off] == '\r'` would fail before
running out of bounds).
Let's work around this rare false positive to help us use an otherwise
mostly useful tool is a worthy thing to do.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commits we have converted all users of
`repo_has_object_file()` and its `_with_flags()` variant to instead use
`has_object()`. Drop these functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the comment of `repo_has_object_file()` and its `_with_flags()`
variant tells us, these functions are considered to be deprecated in
favor of `has_object()`. There are a couple of slight benefits in favor
of the replacement:
- The new function has a short-and-sweet name.
- More explicit defaults: `has_object()` doesn't fetch missing objects
via promisor remotes, and neither does it reload packfiles if an
object wasn't found by default. This ensures that it becomes
immediately obvious when a simple object existence check may result
in expensive actions.
Most importantly though, it is confusing that we have two sets of
functions that ultimately do the same thing, but with different
defaults.
Start sunsetting `repo_has_object_file()` and its `_with_flags()`
sibling by replacing all callsites with `has_object()`:
- `repo_has_object_file(...)` is equivalent to
`has_object(..., HAS_OBJECT_RECHECK_PACKED | HAS_OBJECT_FETCH_PROMISOR)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK | OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., 0)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_SKIP_FETCH_OBJECT)`
is equivalent to `has_object(..., HAS_OBJECT_RECHECK_PACKED)`.
- `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`
is equivalent to `has_object(..., HAS_OBJECT_FETCH_PROMISOR)`.
The replacements should be functionally equivalent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to fully remove `repo_has_object_file()` in favor of
`has_object()`. The latter function does not yet have a way to fetch
missing objects via a promisor remote though, which means that it cannot
fully replace all usecases of `repo_has_object_file()`.
Introduce a new flag `HAS_OBJECT_FETCH_PROMISOR` that causes the
function to optionally fetch missing objects which are part of a
promisor pack. This flag will be used in the subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We carry declarations for a couple of functions in "object-store.h" that
are not defined in "object-store.c", but in a different subsystem. Move
these declarations to the respective headers whose matching code files
carry the corresponding definition.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pack_keep()` creates a file at the passed-in path. If
this fails, then the function re-tries by first creating any potentially
missing leading directories and then trying to create the file once
again. As such, this function doesn't host any kind of logic that is
specific to the object store, but is rather a generic helper function.
Rename the function to `safe_create_file_with_leading_directories()` and
move it into "path.c". While at it, refactor it so that it loses its
dependency on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `loose_object_path()` is a trivial wrapper around
`odb_loose_path()`, with the only exception that it always uses the
primary object database of the given repository. This doesn't really add
a ton of value though, so let's drop the function and inline it at every
callsite.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "object-store.h" header contains the definition of `struct
packed_git`. As this structure hosts all kind of information about a
specific packfile it is arguably a bit out of place in a generic place
like "object-store.h".
Move the structure as well as `pack_map_entry_cmp()` into "packfile.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an option to allow users to specifically enable or disable
retrieving the Message-ID from the Outlook SMTP server. This can be used
for other hosts mimicking the behaviour of Outlook, or for users who set
a custom domain to be a CNAME for the Outlook SMTP server.
While at it, lets also add missing * in description of --no-smtp-auth.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the series merged at bf0a430f70b5 (Merge branch 'en/strmap',
2020-11-21), strmap was built on top of hashmap and hashmap was extended
in a few ways to support strmap and be more generally useful. One of
the extensions was that hashmap_partial_clear() was introduced to allow
reuse of the hashmap without freeing the table. Peff believed that it
also made sense to introduce a hashmap_clear() which freed everything
while allowing reuse.
I added hashmap_clear(), but in doing so, overlooked the fact that for
a hashmap to be reusable, it needs a defined cmpfn and data (the
HASHMAP_INIT macro requires these fields as parameters, for example).
So, if we want the hashmap to be reusable, we shouldn't zero out those
fields. We probably also shouldn't zero out do_count_items. (We could
zero out grow_at and shrink_at, but whether we zero those or not is
irrelevant as they'll be automatically updated whenever a new entry is
inserted.)
Since clearing is associated with freeing map->table, and the only thing
required for consistency after freeing map->table is zeroing tablesize
and private_size, let's only zero those fields out.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.
This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.
The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.
Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.
However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.
This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.
This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().
The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.
The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new test demonstrates some behavior where a valid packfile is being
rejected by the Git client due to the order in which it is resolving
REF_DELTAs.
The thin packfile has a REF_DELTA chain A->B->C where C is not included
in the packfile. However, the client repository contains both C and B
already. Thus, 'git index-pack' is able to resolve A before resolving B.
When resolving B, it then attempts to resolve any other REF_DELTAs that
are pointing to B as a base. This "revisits" A and complains as if there
is a cycle, but it did not actually detect a cycle.
A fix will arrive in the next change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to demonstrate certain behavior in tests, it can be helpful
to create packfiles that have specific delta structures. 'git
pack-objects' uses various algorithms to select deltas based on their
compression rates, but that does not always demonstrate all possible
packfile shapes. This becomes especially important when wanting to test
'git index-pack' and its ability to parse certain pack shapes.
We have prior art in t/lib-pack.sh, where certain delta structures are
produced by manually writing certain opaque pack contents. However,
producing these script updates is cumbersome and difficult to do as a
contributor.
Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
list of instructions for which objects to include in a packfile and how
those objects should be written in delta form.
At the moment, this only supports REF_DELTAs as those are the kinds of
deltas needed to exercise a bug in 'git index-pack'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up a couple of benchmarking options that we end up writing into our
"GIT-BUILD-OPTIONS" file. These options allow users to control how
exactly benchmarks are executed.
Note that neither `GIT_PERF_MAKE_COMMAND` nor `GIT_PERF_MAKE_OPTS` are
exposed as a build option. Those options are used by "t/perf/run", which
is not used by Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up benchmarks in Meson. The setup is mostly the same as how we wire
up our tests. The only difference is that benchmarks get wired up via
the `benchmark()` option instead of via `test()`, which gives them a bit
of special treatment:
- Benchmarks never run in parallel.
- Benchmarks aren't run by default when tests are executed.
- Meson does not inject the `MALLOC_PERTURB` environment variable.
Using benchmarks is quite simple:
```
$ meson setup build
# Run all benchmarks.
$ meson test -C build --benchmark
# Run a specific benchmark.
$ meson test -C build --benchmark p0000-*
```
Other than that the usual command line arguments accepted when running
tests are also accepted when running benchmarks.
Note that the benchmarking target is somewhat limited because it will
only run benchmarks for the current build. Other use cases, like running
benchmarks against multiple different versions of Git, are not currently
supported. Users should continue to use "t/perf/run" for those use
cases. The script should get extended at one point in time to support
Meson, but this is outside of the scope of this series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "perf-lib.sh" script is sourced by all of our benchmarking suites to
make available common infrastructure. The script assumes that build and
source directory are the same, which works for our Makefile. But the
assumption breaks with both CMake and Meson, where the build directory
can be located in an arbitrary place.
Adapt the script so that it works with out-of-tree builds. Most
importantly, this requires us to figure out the location of the build
directory:
- When running benchmarks via our Makefile the build directory is the
same as the source directory. We already know to derive the test
directory ("t/") via `$(pwd)/..`, which works because we chdir into
"t/perf" before executing benchmarks. We can thus derive the build
directory by appending another "/.." to that path.
- When running benchmarks via Meson the build directory is located at
an arbitrary location. The build system thus has to make the path
known by exporting the `GIT_BUILD_DIR` environment variable.
This change prepares us for wiring up benchmarks in Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our benchmarks use a couple of Perl scripts to compute results. These
Perl scripts get executed directly, and as the shebang is hardcoded to
"/usr/bin/perl" this will fail on any system where the Perl interpreter
is located in a different path.
Our build infrastructure already lets users configure the location of
Perl, which ultimately gets written into the GIT-BUILD-OPTIONS file.
This file is being sourced by "test-lib.sh", and consequently we already
have the "PERL_PATH" variable available that contains its configured
location.
Use "PERL_PATH" to execute Perl scripts, which makes them work on more
esoteric systems like NixOS. Furthermore, adapt the shebang to use
env(1) to execute Perl so that users who have Perl in PATH, but in a
non-standard location can execute the script directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our benchmarks operate on a user-defined repository that we copy
over before running the benchmarked logic. To keep unintentional side
effects caused by on-disk state at bay we skip copying some files. This
includes for example hooks, but also the repo's configuration.
It is quite sensible to not copy over the configuration, as it is quite
easy to inadvertently carry over configuration that may significantly
impact the performance measurements. But we cannot fully ignore the
configuration either, as it may contain information about the repository
format. This will cause failures when for example using a repository
with SHA256 object format or the reftable ref format.
Fix the issue by parsing the reference and object formats from the
source repository and passing them to git-init(1).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The change to the bundle-uri unbundling refspec now includes tags, so this
adds a very, very simple test to make sure that tags in a bundle are
properly added to the cloned repository and will be included in ref
negotiation with the subsequent fetch.
Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When downloading bundles via the bundle-uri functionality, we only copy the
references from refs/heads into the refs/bundle space. I'm not sure why this
refspec is hardcoded to be so limited, but it makes the ref negotiation on
the subsequent fetch suboptimal, since it won't use objects that are
referenced outside of the current heads of the bundled repository.
This change to copy everything in refs/ in the bundle to refs/bundles/
significantly helps the subsequent fetch, since nearly all the references
are now included in the negotiation.
The update to the bundle-uri unbundling refspec puts all the heads from a
bundle file into refs/bundle/heads instead of directly into refs/bundle/ so
the tests also need to be updated to look in the new heirarchy.
Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The script generates a Message-ID alongwith the other headers when
gen_header is called, and is sent alongwith the email. For most email
providers, including gmail, the Message-ID goes unchanged to the
recipient.
But, this does not seem to be a case with Outlook. In Outlook, when we
send our own Message-ID as a part of the headers, it discards it. Then
it generates a new random Message-ID and that is what the recipient
gets.
This is a problem because the Message-ID is crucial when we are sending
multiple emails in a thread. The current implementation for threads in
the script replies to the Message-ID it generated, but due to Outlook's
behavior, it is not the same as the one that the recipient got, thus
breaking threads. So a need arises to retrieve the Message-ID from the
server response and set it in the In-Reply-To and References email
headers instead of using the self generated one for the purpose of
replies.
The $smtp->message variable in this script for outlook is something like
this:
2.0.0 OK <Message-ID> [Hostname=Some-hostname]
The Message-ID here is the one the recipient gets, rather than the one
the script generated.
This patch uses the fact above and retrieves the Message-ID from the
server response. It then changes the value of the $message_id variable
to the one received from the server. This value will be used when next
and subsequent messages are sent as replies to the message, thus
preserving the threading of the messages.
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson detects the path of the target shell via `find_program("sh")`,
which essentially does a lookup via `PATH`. This may easily lead to a
subtly-broken Git distribution when the build host has its shell in a
location that the target host doesn't know about.
Fix the issue by appending "/bin" to the custom program path, which
causes us to prefer "/bin/sh" over a `PATH`-based lookup. While
"/bin/sh" isn't standardized, this path tends to work alright on Linux
and BSD distributions. Furthermore, "/bin/sh" is also the path we pick
in our Makefile by default, which further demonstrates that this shell
fulfills our needs.
Note that we intentionally append, not prepend, to the custom program
path. This is because the program path can be configured by the user via
the `-Dsane_tool_path=` build option, which should take precedence over
any defaults we pick for the user.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git needs to know about a couple of executable paths to pick at runtime.
This includes the system shell, but may also optionally include the Perl
and Python interpreters. Meson detects the location of these paths
automatically via `find_program()`, which does a lookup via the `PATH`
environment variable. As such, it may not be immediately obvious to the
developer which paths have been autodetected.
Improve this by exposing runtime executable paths at setup time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are added in the Makefile, but not in meson. They probably won't
work well on systems without them.
CMake adds them, but only on non-Windows. Actually, it only performs
compiler checks for hstrerror, but excludes that check on Windows with
the note that it is "incompatible with the Windows build". This seems to
be misleading -- it is not incompatible, it simply doesn't exist. Still,
the compat version should not be used.
I interpret this cmake logic to mean we shouldn't even be checking for
symbol availability on Windows. In addition to making it simple to add
compat definitions, this also probably shaves off a second or two of
configure time on Windows as no compiler check needs to be performed.
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Nowhere in the codebase do we otherwise check for strerror. Nowhere in
the codebase do we make use of -DNO_STRERROR. `strerror` is not a
networking function at all.
We do utilize `hstrerror` though, which is a networking function we
should have been checking here.
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As evidenced in config.mak.uname and configure.ac, there are various
possible scenarios where these libraries are default-enabled in the
build, which mainly boils down to: SunOS. -lresolv is simply not the
only library that, when it exists, probably needs to be linked to for
networking.
Check for and add -lnsl -lsocket as well.
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Solaris, sys/sysinfo.h is a completely different file and doesn't
resemble the linux file at all. There is also a sysinfo() function, but
it takes a totally different call signature, which asks for:
- the field you wish to receive
- a `char *buf` to copy the data to
and is very useful IFF you want to know, say, the hardware provider. Or,
get *specific* fields from uname(2).
https://docs.oracle.com/cd/E86824_01/html/E54765/sysinfo-2.html
It is surely possible to do this manually via `sysconf(3)` without the
nice API. I can't find anything more direct. Either way, I'm not very
attached to Solaris, so someone who cares can add it. Either way, it's
wrong to assume that sysinfo.h contains what we are looking for.
Check that sysinfo.h defines the struct we actually utilize in
builtins/gc.c, which will correctly fail on systems that don't have it.
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is deprecated and removed in SUS v3 / POSIX 2001, so various systems
may not include it. Solaris, in particular, carefully refrains from
defining it except inside of a maze of `#ifdef` to make sure you have
kept your nose clean and only used it in code that *targets* SUS v2 or
earlier.
config.mak.uname defines this automatically, though only for QNX.
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is repetitive logic. We either want to use some -lc function, or if
it is not available we define it as -DNO_XXX and usually (but not
always) provide some custom compatibility impl instead.
Checking the intent of each block when reading through the file is slow
and not very DRY. Switch to taking an array of checkable functions
instead.
Not all functions are straightforward to move, since different macro
prefixes are used.
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As Matthias Sohn, JGit maintainer, recommends, update the JGit
download link from repo.eclipse.org to a one in maven.org
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier fix added an extra message immediately after failing to
download a third-party package. But near the end of the script,
their availability is checked again and given a message.
Remove the new ones added with a recent fix, as they are redundant.
If we were to add more places to download these software (e.g. for
other platforms we currently do not download them on), the existing
warnning near the end of the script will also trigger.
While at it, as Dscho suggests, rewrite the WARNING: label on the
warning message to :⚠️:, which presumably should be shown a
bit more prominently in the CI summary.
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various build tweaks, including CSPRNG selection on some platforms.
* rj/build-tweaks:
config.mak.uname: set CSPRNG_METHOD to getrandom on Linux
config.mak.uname: add arc4random to the cygwin build
config.mak.uname: add sysinfo() configuration for cygwin
builtin/gc.c: correct RAM calculation when using sysinfo
config.mak.uname: add clock_gettime() to the cygwin build
config.mak.uname: add HAVE_GETDELIM to the cygwin section
config.mak.uname: only set NO_REGEX on cygwin for v1.7
config.mak.uname: add a note about NO_STRLCPY for Linux
Makefile: remove NEEDS_LIBRT build variable
meson.build: set default help format to html on windows
meson.build: only set build variables for non-default values
Makefile: only set some BASIC_CFLAGS when RUNTIME_PREFIX is set
meson.build: remove -DCURL_DISABLE_TYPECHECK
Update parse-options API to catch mistakes to pass address of an
integral variable of a wrong type/size.
* ps/parse-options-integers:
parse-options: detect mismatches in integer signedness
parse-options: introduce precision handling for `OPTION_UNSIGNED`
parse-options: introduce precision handling for `OPTION_INTEGER`
parse-options: rename `OPT_MAGNITUDE()` to `OPT_UNSIGNED()`
parse-options: support unit factors in `OPT_INTEGER()`
global: use designated initializers for options
parse: fix off-by-one for minimum signed values
Document the convention to disable hooks altogether by setting the
hooksPath configuration variable to /dev/nulll
* ds/doc-disable-hooks:
docs: document core.hooksPath=/dev/null
Code clean-up.
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
Make sure outage of third-party sites that supply P4, Git-LFS, and
JGit we use for testing would not prevent our CI jobs from running
at all.
* jc/ci-skip-unavailable-external-software:
ci: skip unavailable external software
Ever since we issued 2.49, external forces broke our CI jobs in
various ways, and we had to adjust our code to work them around.
Backmerge them from the 'master' front to make it easier to test
real changes to the maintenance track.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure outage of third-party sites that supply P4, Git-LFS, and
JGit we use for testing would not prevent our CI jobs from running
at all.
* jc/ci-skip-unavailable-external-software:
ci: skip unavailable external software
The ci/install-dependencies.sh script used in a very early phase of
our CI jobs downloads Perforce, Git-LFS, and JGit, used for running
the test scripts. The test framework is prepared to properly skip
the tests that depend on these external software, but the CI script
is unnecessarily strict (due to its use of "set -e" in ci/lib.sh)
and fails the entire CI run before even starting to test the rest of
the system.
Notice a failure to download to any of these external software, but
keep going. We need to be careful about cleaning after a failed
wget, as a later part of the script that does:
if type jgit >/dev/null 2>&1
then
echo "$(tput setaf 6)JGit Version$(tput sgr0)"
jgit version
else
echo >&2 "WARNING: JGit wasn't installed, see above for clues why"
fi
will (surprise!) succeed running "type jgit", and then fail with
"jgit version", taking the whole thing down due to "set -e".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-file-cleanup:
object-store: merge "object-store-ll.h" and "object-store.h"
object-store: remove global array of cached objects
object: split out functions relating to object store subsystem
object-file: drop `index_blob_stream()`
object-file: split up concerns of `HASH_*` flags
object-file: split out functions relating to object store subsystem
object-file: move `xmmap()` into "wrapper.c"
object-file: move `git_open_cloexec()` to "compat/open.c"
object-file: move `safe_create_leading_directories()` into "path.c"
object-file: move `mkdir_in_gitdir()` into "path.c"
"git log --{left,right}-only A...B", when A and B does not share
any common ancestor, now behaves as expected.
* mh/left-right-limited:
revision: fix --left/right-only use with unrelated histories
Doc mark-up updates.
* ja/doc-reset-mv-rm-markup-updates:
doc: add markup for characters in Guidelines
doc: fix asciidoctor synopsis processing of triple-dots
doc: convert git-mv to new documentation format
doc: move synopsis git-mv commands in the synopsis section
doc: convert git-rm to new documentation format
doc: fix synopsis analysis logic
doc: convert git-reset to new documentation format
Optimize the code to dedup references recorded in a bundle file.
* kn/bundle-dedup-optim:
bundle: fix non-linear performance scaling with refs
t6020: test for duplicate refnames in bundle creation
"make perf" fixes.
* pb/perf-test-fixes:
p7821: fix instructions for testing with threads
p9210: fix 'scalar clone' when running from a detached HEAD
p7821: fix test_perf invocation for prereqs
When using the launchctl scheduler, the weekly job runs daily, and the
daily job runs on the first six days of each month. This appears to be
due to specifying "Day" in the calendar intervals, which according to
launchd.plist(5) is for specifying days of the month rather than days of
the week. The behaviour of running a job on the 0th day is undocumented,
but in my testing appears to be the same as not specifying "Day" in the
calendar interval, in which case the job will run daily.
Use "Weekday" in the calendar intervals, which is the correct way to
schedule jobs to run on specific days of the week.
Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'hdr-check' target in Meson and makefile is used to check if headers
can be compiled individually. The naming however isn't readable as 'hdr'
is not a common shortforme for 'header', neither is it an abbreviation.
Let's introduce 'check-headers' as an alternative target for 'hdr-check'
and add a `TODO` to deprecate the latter after 2 releases. Since this
is an internal tool, we can use a shorter deprecation cycle.
Change existing usage of 'hdr-check' in 'ci/run-static-analysis.sh' to
also use 'check-headers'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Makefile supports a target called 'hdr-check', which checks if
individual header files can be independently compiled. Let's port this
functionality to Meson, our new build system too. The implementation
resembles that of the Makefile and provides the same check.
Since meson builds are out-of-tree, header dependencies are not
automatically met. So unlike the Makefile version, we also need to add
the required dependencies.
Also add the 'xdiff/' dir to the list of 'third_party_sources' as those
headers must be skipped from the checks too. This also skips the folder
from the 'coccinelle' checks, this is okay, since this code is an
external dependency.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'third_party_sources' variable was moved to the root 'meson.build'
file in the previous commit. The variable is actually used to exclude
third party sources, so rename it accordingly to 'third_party_excludes'
to avoid confusion. While here, remove a duplicate from the list.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Meson build for coccinelle static analysis lists all headers to
analyse. Due to the way Meson exports variables between subdirs, this
variable is also available in the root Meson build.
An upcoming commit, will add a new check complimenting 'hdr-check' in
the Makefile. This would require the list of headers. So move the
'coccinelle_headers' to the root Meson build and rename it to 'headers',
remove the root path being appended to each header and retain that in
the coccinelle Meson build since it is specific to the coccinelle build.
Also move the 'third_party_sources' variable to the root Meson build
since it is also a dependency for the 'headers' variable. This also
makes it easier to understand as the variable is now propagated from the
top level to the bottom.
While 'headers_to_check' is only computed when we have a repository and
the 'git' executable is present, the variable itself is exposed as an
empty array. This allows dependencies in upcoming commits to simply
check for length of the array and not worry about dependencies required
to actually populate the array.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Meson, included subdirs export their variables to top level Meson
builds. In 'contrib/coccinelle/meson.build', we define two such
variables `sources` and `headers`. While these variables are specific to
the checks in the 'contrib/coccinelle/' directory, they also pollute the
top level 'meson.build'.
Rename them to be more specific, this ensures that they aren't
mistakenly used in the upper levels and avoid variable name collisions.
While here, change the empty list denotation to be consistent with other
places.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GitHub's CI workflow uses 'actions/checkout@v4' to checkout the
repository. This action defaults to using the GitHub REST API to obtain
the repository if the `git` executable isn't available.
The step to build Git in the GitHub workflow can be summarized as:
...
- uses: actions/checkout@v4 #1
- run: ci/install-dependencies.sh #2
...
- run: sudo --preserve-env --set-home --user=builder ci/run-build-and-tests.sh #3
...
Step #1, clones the repository, since the `git` executable isn't present
at this step, it uses GitHub's REST API to obtain a tar of the
repository.
Step #2, installs all dependencies, which includes the `git` executable.
Step #3, sets up the build, which includes setting up meson in the meson
job. At this point the `git` executable is present.
This means while the `git` executable is present, the repository doesn't
contain the '.git' folder. To keep both the CI's (GitLab and GitHub)
behavior consistent and to ensure that the build is performed on a
real-world scenario, install `git` before the repository is checked out.
This ensures that 'actions/checkout@v4' will clone the repository
instead of using a tarball. We also update the package cache while
installing `git`, this is because some distros will fail to locate the
package without updating the cache.
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just as in b64d78ad02ca (max_tree_depth: lower it for MSVC to avoid
stack overflows, 2023-11-01), I encountered the same problem with the
clang builds on Windows/ARM64.
The symptom is an exit code 127 when t6700 tries to verify that `git
archive big` fails.
This exit code is reserved on Unix/Linux to mean "command not found".
Unfortunately in this case, it is the fall-back chosen by
Cygwin's `pinfo::status_exit()` method when encountering
the NSTATUS `STATUS_STACK_OVERFLOW`, see
https://github.com/cygwin/cygwin/blob/cygwin-3.6.1/winsup/cygwin/pinfo.cc#L171
I verified manually that the stack overflow always happens somewhere
around tree depth 1403, therefore 1280 should be a safe bound in these
instances.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In fb5e3378f8 (mingw: move Git for Windows' system config where users
expect it, 2021-06-22), I moved the location of Git for Windows' system
config and system Git attributes file to the top-level `/etc/` directory
(because it is a much more obvious location than, say, `/mingw64/etc/`).
The patch relied on a very specific scenario that the newly-supported
Windows/ARM64 builds of `git.exe` fails to fall into. So let's broaden
the condition a bit, so that Windows/ARM64 builds also use that location
(instead of the even more obscure `/clangarm64/etc/` directory).
This fixes https://github.com/git-for-windows/git/issues/5431.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git for Windows/ARM64 settled on using `clang` to compile `git.exe`, and
hence needs to run in a system where `MSYSTEM` is set to `CLANGARM64`
and the prefix to use is `/clangarm64`.
We already did that in the `MINGW` arm, i.e. for regular Git for Windows
builds using MINGW GCC (or `clang`'s shim pretending to be GCC), now it
is time to do the same in the MS Visual C part.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: adjust config.mak.uname for c18400c6]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It does not compile there, and seeing as nedmalloc has been pretty much
unmaintained since at least November 2017, as per
https://github.com/ned14/nedmalloc/issues/20#issuecomment-343432314,
there is also no hope that any fixes will materialize there.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[jc: adjust config.mak.uname for c18400c6]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CLANGARM64 is a relatively new MSYSTEM added by the MSYS2 team. In order
to have Git build correctly for this platform, let's add some
configuration for it to config.mak.uname.
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Newer compiler versions, like GCC 10 and Clang 12, have built-in
functions for bswap32 and bswap64. This comes in handy, for example,
when targeting CLANGARM64 on Windows, which would not be supported
without this logic.
Signed-off-by: Dennis Ameling <dennis@dennisameling.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the log_reencode field from struct rev-info, as it is not used.
This field was introduced in 52883fb, but it hasn't been used since its
introduction.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This perf script creates a midx by running "git multi-pack-index write"
with the "--stdin-packs" option. We feed that stdin by running "find" on
.git/objects/pack, using sed to strip off everything but the basename.
But that sed invocation also does something peculiar: it adds a "+" to
the start of each pack name. This causes the multi-pack-index command to
barf. The modified name does not match any pack it knows about, so it
ends up with an empty list of packs to put in the midx. And thus nothing
matches the --preferred-pack option we pass, which causes it die().
The fix is to remove the extra "+" (which also lets us simplify the sed
invocation a bit, as it is now just stripping the leading directories).
But that leaves the mystery of why it was ever there in the first place.
The answer is that an earlier iteration of the patch series had a
concept of "disjoint" packs in the midx. And one of its patches here:
https://lore.kernel.org/git/c52d7e7b27a9add4f58b8334db4fe4498af1c90f.1701198172.git.me@ttaylorr.com/
taught read_packs_from_stdin() to treat a leading "+" as marking a
disjoint pack. But in the second version of the series, which was
ultimately merged, that disjoint concept went away, and the code to
parse "+" did likewise. The regular regression tests were adjusted to
match, but this case in t/perf was forgotten.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shell completion scripts in "contrib/completion" are being tested,
but none of our build systems support installing them. This is somewhat
confusing for Meson, where users can explicitly enable building these
scripts via `-Dcontrib=completion`. This option only controlls whether
the completions are built and tested against, where "building" is a bit
of an euphemism for "copying them into the build directory".
Teach both our Makefile and Meson to install our Bash completion script.
For now, this is the only completion script that we're installing given
that Bash completions "just work" with a canonical well-known location
nowadays. Other completion scripts, like for example the one for zsh,
don't have a well-known location and/or require extra steps by the user
to make them available. As such, we skip installing these scripts for
now, but we may do so in the future if we ever figure out a proper way
to do this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our tests for git-p4(1) depend on the p4d(1) and p4(1) executables to
exist. As we require specific versions of those binaries which typically
aren't available on common distributions, we install them manually via
"ci/install-dependencies.sh".
This script will put the binaries into "$CUSTOM_PATH", which gets
defined by "ci/lib.sh" -- if not explicitly overridden, its value will
be set to "$HOME/path". This causes issues though when running our tests
as unprivileged user, as we do both in GitLab CI and GitHub Actions,
because "$HOME" will be different when installing dependencies and when
running the tests. Consequently, the downloaded binaries will not be
found unless "$CUSTOM_PATH" is overridden to a common location.
We already do this for GitLab CI, where it points to "/custom". Let's do
the same for GitHub Actions so that Perforce-based tests are executed
again.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in "f29f1990b5
(config: teach repo_config to allow `repo` to be NULL, 2025-03-08)"
to allow `repo` to be NULL, no need to check if `repo` is NULL
before calling `repo_config()`.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in "f29f1990b5
(config: teach repo_config to allow `repo` to be NULL, 2025-03-08)"
to allow `repo` to be NULL, no need to check if `repo` is NULL
before calling `repo_config()`.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common way to run Git's performance benchmarks on repositories other
than Git's own repository (which is not exactly large when compared to
actually large repositories) is to run them like this:
GIT_PERF_LARGE_REPO=/path/to/my/large/repo \
./p1234-*.sh -ivx
Contrary to developers' common expectations, this failed to work when
Git was built with a different `GIT_PERF_LARGE_REPO` value specified at
build time: That build-time option would have been written to the
`GIT-BUILD-OPTIONS` file, which in turn would have been sourced by
`test-lib.sh`, which in turn would have been sourced by `perf-lib.sh`,
which in turn would have been sourced by the perf test script,
_overriding_ the environment variable specified in the way illustrated
above.
Since perf tests are not run as part of the build, this most likely
unintended behavior was not caught and certainly not fixed, as the
`GIT_PERF_*` values would have been empty at build-time.
However, in 4638e8806e3a (Makefile: use common template for
GIT-BUILD-OPTIONS, 2024-12-06), a subtle change of behavior was
introduced: Whereas before, a couple of build-time options (the
`GIT_PERF_*` ones included) were written to `GIT-BUILD-OPTIONS` only
when their values were non-empty. With this commit, they are also
written when they are empty.
The consequence is that above-mentioned way to run the perf tests will
not only fail to pick up the desired `GIT_PERF_*` settings when they
were specified differently while building Git, instead the desired
settings will be only respected when specified _while building_ Git.
Let's work around the original issue, i.e. let `GIT_PERF_*` environment
variables override what is recorded in `GIT-BUILD-OPTIONS`.
Note that this is just the tip of the iceberg, there are a couple of
`GIT_TEST_*` options that may want a similar fix in `test-lib.sh`. Due
to time constraints on my side, this here patch focuses exclusively on
the `GIT_PERF_*` settings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit started to insist TAG_F1_ONLY to be missing,
which was not in the original. Let's not be overly eager in the
conversion.
Also, the other hunk in the commit introduced a shell syntax error,
causing the test to fail. Fix it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Above the declaration of git_work_tree_cfg, we have:
/* This is set by setup_git_dir_gently() and/or git_default_config() */
char *git_work_tree_cfg;
It can be verified that there is no function called
'setup_git_dir_gently' by running grep on the codebase:
$ grep -R setup_git_dir_gently .
./environment.c:/* This is set by setup_git_dir_gently() and/or git_default_config() */
The comment, introduced in e90fdc39b6 (Clean up work-tree handling), is
the only occurrence of the name 'setup_git_dir_gently'.
It probably meant 'setup_git_directory_gently' as that is a name of a
real function in setup.c. Correct it.
Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 05cd988dce ("wrapper: add a helper to generate numbers from a
CSPRNG", 2022-01-17) added a csprng_bytes() function which used one
of several interfaces to provide a source of cryptographically secure
pseudorandom numbers. The CSPRNG_METHOD make variable was provided to
determine the choice of available 'backends' for the source of random
bytes.
Commit 05cd988dce did not set CSPRNG_METHOD in the Linux section of
the config.mak.uname file, so it defaults to using '/dev/urandom' as
the source of random bytes. The 'backend' values which could be used
on Linux are 'arc4random', 'getrandom' or 'getentropy' ('openssl' is
an option, but seems to be discouraged).
The arc4random routines (arc4random_buf() is the one actually used) were
added to glibc in version 2.36, while both getrandom() and getentropy()
were included in 2.25. So, some of the more up-to-date distributions of
Linux (eg Debian 12, Ubuntu 24.04) would be able to use the 'arc4random'
setting. All currently supported distributions have glibc 2.25 or later
(RHEL 8 has v2.28) and, therefore, have support for the 'getrandom' and
'getentropy' settings.
The arc4random routines on the *BSDs (along with cygwin) implement the
ChaCha20 stream cipher algorithm (see RFC8439) in userspace, rather than
as a system call, and are thus somewhat faster (having avoided a context
switch to the kernel). In contrast, on Linux all three functions are
simple wrappers around the same kernel CSPRNG syscall.
If the meson build system is used on a newer platform, then they will be
configured to use 'arc4random', whereas the make build will currently
default to using '/dev/urandom' on Linux. Since there is no advantage,
in terms of performance, to the 'arc4random' setting, the 'getrandom'
setting should be preferred from an availability perspective. (Also, the
current uses of csprng_bytes() are not in any hot path).
In order to set an appropriate default, set the CSPRNG_METHOD build
variable to 'getrandom' in the Linux section of the 'config.mak.uname'
file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Incorrect sorting of refs with bytes with high-bit set on platforms
with signed char led to a BUG, which has been corrected.
* ps/refname-avail-check-optim:
refs/packed: fix BUG when seeking refs with UTF-8 characters
Remove remnants of the recursive merge strategy backend, which was
superseded by the ort merge strategy.
* en/merge-recursive-debug:
builtin/{merge,rebase,revert}: remove GIT_TEST_MERGE_ALGORITHM
tests: remove GIT_TEST_MERGE_ALGORITHM and test_expect_merge_algorithm
merge-recursive.[ch]: thoroughly debug these
merge, sequencer: switch recursive merges over to ort
sequencer: switch non-recursive merges over to ort
merge-ort: enable diff-algorithms other than histogram
builtin/merge-recursive: switch to using merge_ort_generic()
checkout: replace merge_trees() with merge_ort_nonrecursive()
"git blame --porcelain" mode now talks about unblamable lines and
lines that are blamed to an ignored commit.
* kn/blame-porcelain-unblamable:
blame: print unblamable and ignored commits in porcelain mode
"git fetch [<remote>]" with only the configured fetch refspec
should be the only thing to update refs/remotes/<remote>/HEAD,
but the code was overly eager to do so in other cases.
* jk/fetch-follow-remote-head-fix:
fetch: make set_head() call easier to read
fetch: don't ask for remote HEAD if followRemoteHEAD is "never"
fetch: only respect followRemoteHEAD with configured refspecs
It was reported that "t5620-backfill.sh" fails on s390x and sparc64 in a
test that exercises the "--min-batch-size" command line option. The
symptom was that the option didn't seem to have an effect: we didn't
fetch objects with a batch size of 20, but instead fetched all objects
at once.
As it turns out, the root cause is that `--min-batch-size` uses
`OPT_INTEGER()` to parse the command line option. While this macro
expects the caller to pass a pointer to an integer, we instead pass a
pointer to a `size_t`. This coincidentally works on most platforms, but
it breaks apart on the mentioned platforms because they are big endian.
This issue isn't specific to git-backfill(1): there are a couple of
other places where we have the same type confusion going on. This
indicates that the issue really is the interface that the parse-options
subsystem provides -- it is simply too easy to get this wrong as there
isn't any kind of compiler warning, and things just work on the most
common systems.
Address the systemic issue by introducing two new build asserts
`BARF_UNLESS_SIGNED()` and `BARF_UNLESS_UNSIGNED()`. As the names
already hint at, those macros will cause a compiler error when passed a
value that is not signed or unsigned, respectively.
Adapt `OPT_INTEGER()`, `OPT_UNSIGNED()` as well as `OPT_MAGNITUDE()` to
use those asserts. This uncovers a small set of sites where we indeed
have the same bug as in git-backfill(1). Adapt all of them to use the
correct option.
Reported-by: Todd Zullinger <tmz@pobox.com>
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is the equivalent to the preceding commit, but instead of
introducing precision handling for `OPTION_INTEGER` we introduce it for
`OPTION_UNSIGNED`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPTION_INTEGER` option type accepts a signed integer. The type of
the underlying integer is a simple `int`, which restricts the range of
values accepted by such options. But there is a catch: because the
caller provides a pointer to the value via the `.value` field, which is
a simple void pointer. This has two consequences:
- There is no check whether the passed value is sufficiently long to
store the entire range of `int`. This can lead to integer wraparound
in the best case and out-of-bounds writes in the worst case.
- Even when a caller knows that they want to store a value larger than
`INT_MAX` they don't have a way to do so.
In practice this doesn't tend to be a huge issue because users typically
don't end up passing huge values to most commands. But the parsing logic
is demonstrably broken, and it is too easy to get the calling convention
wrong.
Improve the situation by introducing a new `precision` field into the
structure. This field gets assigned automatically by `OPT_INTEGER_F()`
and tracks the size of the passed value. Like this it becomes possible
for the caller to pass arbitrarily-sized integers and the underlying
logic knows to handle it correctly by doing range checks. Furthermore,
convert the code to use `strtoimax()` intstead of `strtol()` so that we
can also parse values larger than `LONG_MAX`.
Note that we do not yet assert signedness of the passed variable, which
is another source of bugs. This will be handled in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit, `OPT_INTEGER()` has learned to support unit
factors. Consequently, the major differencen between `OPT_INTEGER()` and
`OPT_MAGNITUDE()` isn't the support of unit factors anymore, as both of
them do support them now. Instead, the difference is that one handles
signed and the other handles unsigned integers.
Adapt the name of `OPT_MAGNITUDE()` accordingly by renaming it to
`OPT_UNSIGNED()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two main differences between `OPT_INTEGER()` and
`OPT_MAGNITUDE()`:
- The former parses signed integers whereas the latter parses unsigned
integers.
- The latter parses unit factors like 'k', 'm' or 'g'.
While the first difference makes obvious sense, there isn't really a
good reason why signed integers shouldn't support unit factors, too.
This inconsistency will also become a bit of a problem with subsequent
commits, where we will fix a couple of callsites that pass an unsigned
integer to `OPT_INTEGER()`. There are three options:
- We could adapt those users to instead pass a signed integer, but
this would needlessly extend the range of accepted integer values.
- We could convert them to use `OPT_MAGNITUDE()`, as it only accepts
unsigned integers. But now we have the inconsistency that we also
start to accept unit factors.
- We could introduce `OPT_UNSIGNED()` as equivalent to `OPT_INTEGER()`
so that it knows to only accept unsigned integers without unit
suffix.
Introducing a whole new option type feels a bit excessive. There also
isn't really a good reason why `OPT_INTEGER()` cannot be extended to
also accept unit factors: all valid values passed to such options cannot
have a unit factors right now, so there wouldn't be any ambiguity.
Refactor `OPT_INTEGER()` to use `git_parse_int()`, which knows to
interpret unit factors. This removes the inconsistency between the
signed and unsigned options so that we can easily fix up callsites that
pass the wrong integer type right now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we expose macros for most of our different option types understood
by the "parse-options" subsystem, not every combination of fields that
has one as that would otherwise quickly lead to an explosion of macros.
Instead, we just initialize structures manually for those variants of
fields that don't have a macro.
Callsites that open-code these structure initialization don't use
designated initializers though and instead just provide values for each
of the fields that they want to initialize. This has three significant
downsides:
- Callsites need to specify all values up to the last field that they
care about. This often includes fields that should simply be left at
their default zero-initialized state, which adds distraction.
- Any reader not deeply familiar with the layout of the structure
has a hard time figuring out what the respective initializers mean.
- Reordering or introducing new fields in the middle of the structure
is impossible without adapting all callsites.
Convert all sites to instead use designated initializers, which we have
started using in our codebase quite a while ago. This allows us to skip
any default-initialized fields, gives the reader context by specifying
the field names and allows us to reorder or introduce new fields where
we want to.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We accept a maximum value in `git_parse_signed()` that restricts the
range of accepted integers. As the intent is to pass `INT*_MAX` values
here, this maximum doesn't only act as the upper bound, but also as the
implicit lower bound of the accepted range.
This lower bound is calculated by negating the maximum. But given that
the maximum value of a signed integer with N bits is `2^(N-1)-1` whereas
the minimum value is `-2^(N-1)` we have an off-by-one error in the lower
bound.
Fix this off-by-one error by using `-max - 1` as lower bound instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The arc4random_buf() function has been available in cygwin since
about 2016 (somewhere in the v2.x branch). Set the CSPRNG_METHOD
build variable to 'arc4random', in the cygwin section, to enable
the use of this cryptographically-secure pseudorandom number
function. Note that the autoconf and new meson builds also enable
this function.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although sysinfo() is a 'Linux only' function, cygwin provides an
implementation which appears to be functional. The assumption that
this function is Linux only is reflected in the way the HAVE_SYSINFO
build variable is handled by the Makefile and config.mak.uname.
Rework the setting of HAVE_SYSINFO in the Linux section of the system
specific config file, along with the corresponding setting of the
BASIC_CFLAGS in the Makefile. Add the setting of HAVE_SYSINFO to the
cygwin section of 'config.mak.uname'. While here, add a test for the
sysinfo() function to the autoconf build system.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The man page for sysinfo(2) on Linux states that (from v2.3.48) the
sizes of the memory and swap fields, of the returned structure, are
given as multiples of 'mem_unit' bytes. In earlier versions (prior to
v2.3.23 on i386 in particular), the 'mem_unit' field was not part of
the structure, and all sizes were measured in bytes. The man page does
not discuss the motivation for this change, but it is possible that the
change was intended for the, relatively rare, 32-bit platform with more
than 4GB of memory.
The total_ram() function makes the assumption that the 'totalram' field
of the 'struct sysinfo' is measured in bytes, or alternatively that the
'mem_unit' field is always equal to one. Having writen a program to call
the sysinfo() function and print the structure fields, it seems that, on
Linux x84_64 and i686 anyway, the 'mem_unit' field is indeed set to one
(note that the 32-bit system had only 2GB ram). However, cygwin also has
an sysinfo() implementation, which gives the following values:
$ ./sysinfo
uptime: 21381
loads: 0, 0, 0
total ram: 2074637
free ram: 843237
shared ram: 0
buffer ram: 0
total swap: 327680
free swap: 306932
procs: 15
total high: 0
free high: 0
mem_unit: 4096
total ram: 8497713152
$
[This laptop has 8GB ram, so a little bit seems to be missing. ;) ]
Modify the total_ram() function to allow for the possibility that the
memory size is not specified in bytes (ie 'mem_unit' is greater than
one).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cygwin supports the clock_gettime() function, along with the associated
CLOCK_MONOTONIC preprocessor symbol. The autoconf and meson builds both
enable the use of those symbols. In order to have the same configuration
for the make builds, add the HAVE_CLOCK_GETTIME and HAVE_CLOCK_MONOTONIC
build variables to the cygwin section of the config.mak.uname file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cygwin has provided the getdelim() function as far back as (at least)
2011. The autoconf and meson builds enable the use of this symbol.
In order to have the same configuration for autoconf, meson and make,
enable the HAVE_GETDELIM build variable in the cygwin section of the
config.mak.uname file.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 92f63d2b05 ("Cygwin 1.7 needs compat/regex", 2013-07-19) set
the NO_REGEX build variable because the platform regex library failed
some of the tests (t4018 and t4034), which passed just fine with the
compat library.
After some time (maybe a year or two), the platform library had been
updated (with an import from FreeBSD, I believe) and now passed the full
test-suite. This would be about the time of the v1.7 -> v2.0 transition
in 2015. I had a patch ready to send, but just didn't get around to
submitting it to the list. At some point in the interim, the official
cygwin git package used the autoconf build system, which sets the
NO_REGEX variable to use the platform regex library functions. The new
meson build system does likewise.
The cygwin platform regex library, in addition to now passing the tests
which formerly failed, now passes an 'test_expect_failure' test in the
t7815-grep-binary test file. In particular, test #12 'git grep .fi a'
which determines that the regex pattern '.' matches a NUL character.
The commit f96e56733a ("grep: use REG_STARTEND for all matching if
available", 2010-05-22) added the test in question, but it does not
give any indication as to why the test was framed as an expected fail,
rather than a 'positive' test that the 'git grep' command fails to
match a NUL. Note that the previous test #11 was also originally
marked in that commit as a 'test_expect_failure', but was flipped to
an 'success' test in commit 7e36de5859 ("t/t7008-grep-binary.sh: un-TODO
a test that needs REG_STARTEND", 2010-08-17).
In order to produce the same NO_REGEX configuration from autoconf, meson
and make, modify config.mak.uname to only set NO_REGEX for cygwin v1.7.
In addition, skip test t7815.12 on cygwin, by adding the !CYGWIN pre-
requisite to the test header, which (among other things) removes an
'...; please update test(s)' comment.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 817151e61a ("Rename safe_strncpy() to strlcpy().", 2006-06-24)
added the NO_STRLCPY make variable to allow the conditional use of
the gitstrlcpy() compat function on those platforms which didn't
provide the 'standard' strlcpy() function.
Recently, in the summer of 2023, the strlcpy() and strlcat() functions
were added to the glibc library (v2.38), so some of the more up-to-date
Linux distributions no longer need to set NO_STRLCPY. For example, both
Ubuntu 24.04 LTS and RHEL 10 beta have glibc v2.39. However, several
distributions, which are still within their support window, have an
earlier version and must still use the 'compat' version of strlcpy().
If the meson or autoconf build systems are used on newer platforms, then
they will be configured to to use strlcpy() from glibc, whereas the make
build will always choose the 'compat' function instead. Add a note to
the config.mak.uname file, in the Linux section, to prompt make users to
override NO_STRLCPY in the config.mak file, if appropriate.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d19e3a5b21 ("Makefile: add NEEDS_LIBRT to optionally link with
librt", 2016-07-07) introduced the NEEDS_LIBRT build variable to
disassociate the HAVE_CLOCK_GETTIME variable with the unconditional
linking of the librt library. At one time, the clock_gettime() function
was not available as part of the libc library and (on some unix systems)
required linking with librt.
Commit 52fcec75ce ("config.mak.uname: define NEEDS_LIBRT under Linux, for
now", 2016-07-10) set the NEEDS_LIBRT variable in the Linux section of
the config.mak.uname file, since Debian 7 (wheezy) was one of the few
remaining distributions, with glibc 2.13, that required linking with
librt for clock_gettime(). Note that from glibc version 2.17, this is no
longer necessary.
Note that Debian 7.0 was released on May 4th, 2013 and benefited from
long term support until May 2018 when it went end-of-life. Since that
time, Linux distributions use a more up-to-date library, for example:
Distribution version end of support
Debian 8 2.19 30th June 2020
RHEL 8 2.28 31st May 2024 *
Ubuntu 16.04 2.23 30th Apr 2021
* paid 'Maintenance support' ends 31st May 2029
Since it is no longer required, remove NEEDS_LIBRT from the Makefile and
config.mak.uname.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build variable DEFAULT_HELP_FORMAT has an appropriate default
('man') set in the code, so there is no need to pass the -Define on
the compiler command-line, unless the build requires a non-standard
value.
In addition, on windows the make build overrides the default help
format to 'html', rather than 'man', in the 'config.mak.uname' file.
In order to suppress the -Define on the C compiler command-line, only
add the -Define to the 'libgit_c_args' variable when the requested
value is not the standard 'man'. In order to override the default value
on windows, add a 'platform' value to the 'default_help_format' combo
option and set it as the default choice. When this option is set to
'platform', use the 'host_machine.system()' method call to determine the
appropriate default value for the host system.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some preprocessor -Defines have defaults set in the source code when
they have not been provided to the C compiler. In this case, there is
no need to pass them on the command-line, unless the build requires a
non-standard value.
The build variables for DEFAULT_EDITOR and DEFAULT_PAGER have appropriate
defaults ('vi' and 'less') set in the code. Add the preprocessor -Defines
to the 'libgit_c_args' only if the values set with the corresponding
'options' are different to these standard values.
Also, the 'git-var' documentation contains some conditional text which
documents the chosen compiled in value, which would not read well for
the standard values. Similar to the above, only add the corresponding
'-a' attribute arguments to the 'asciidoc_common_options' variable, if
the values set in the 'options' are different to these standard values.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Several build variables only have any meaning when the RUNTIME_PREFIX
variable has been set. In particular, the following build variables are
otherwise ignored:
HAVE_BSD_KERN_PROC_SYSCTL
PROCFS_EXECUTABLE_PATH
HAVE_NS_GET_EXECUTABLE_PATH
HAVE_ZOS_GET_EXECUTABLE_PATH
HAVE_WPGMPTR
Make setting BASIC_CFLAGS, for each of these variables, conditional on
the RUNTIME_PREFIX being defined.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 9371322a60 ("sparse: suppress some \"using sizeof on a function\"
warnings", 2013-10-06) used target-specific variable assignments to add
-DCURL_DISABLE_TYPECHECK to SPARSE_FLAGS for each of the files affected
by the "typecheck-gcc.h" warnings. (http-push.c, http.c, http-walker.c
and remote-curl.c).
These warnings are only issued by sparse, and not by gcc, so we do not
want to disable the 'type checking' for non-sparse targets. The meson
build does not provide any sparse targets, so there is no need to use
the CURL_DISABLE_TYPECHECK preprocessor flag with the c compiler.
In order to re-enable the curl 'type checking' in the meson build, remove
the assignment of -DCURL_DISABLE_TYPECHECK to libgit_c_args.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file --batch" and friends learned to allow "--filter=" to
omit certain objects, just like the transport layer does.
* ps/cat-file-filter-batch:
builtin/cat-file: use bitmaps to efficiently filter by object type
builtin/cat-file: deduplicate logic to iterate over all objects
pack-bitmap: introduce function to check whether a pack is bitmapped
pack-bitmap: add function to iterate over filtered bitmapped objects
pack-bitmap: allow passing payloads to `show_reachable_fn()`
builtin/cat-file: support "object:type=" objects filter
builtin/cat-file: support "blob:limit=" objects filter
builtin/cat-file: support "blob:none" objects filter
builtin/cat-file: wire up an option to filter objects
builtin/cat-file: introduce function to report object status
builtin/cat-file: rename variable that tracks usage
"make test" used to have a hard dependency on (basic) Perl; tests
have been rewritten help environment with NO_PERL test the build as
much as possible.
* ps/test-wo-perl-prereq:
t5703: refactor test to not depend on Perl
t5316: refactor `max_chain()` to not depend on Perl
t0210: refactor trace2 scrubbing to not use Perl
t0021: refactor `generate_random_characters()` to not depend on Perl
t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl
t/lib-t6000: refactor `name_from_description()` to not depend on Perl
t/lib-gpg: refactor `sanitize_pgp()` to not depend on Perl
t: refactor tests depending on Perl for textconv scripts
t: refactor tests depending on Perl to print data
t: refactor tests depending on Perl substitution operator
t: refactor tests depending on Perl transliteration operator
Makefile: stop requiring Perl when running tests
meson: stop requiring Perl when tests are enabled
t: adapt existing PERL prerequisites
t: introduce PERL_TEST_HELPERS prerequisite
t: adapt `test_readlink()` to not use Perl
t: adapt `test_copy_bytes()` to not use Perl
t: adapt character translation helpers to not use Perl
t: refactor environment sanitization to not use Perl
t: skip chain lint when PERL_PATH is unset
"git help --build-options" reports SHA-1 and SHA-256 backends used
in the build.
* jt/help-sha-backend-info-in-build-options:
help: include unsafe SHA-1 build info in version
help: include SHA implementation in version info
Updating multiple references have only been possible in all-or-none
fashion with transactions, but it can be more efficient to batch
multiple updates even when some of them are allowed to fail in a
best-effort manner. A new "best effort batches of updates" mode
has been introduced.
* kn/non-transactional-batch-updates:
update-ref: add --batch-updates flag for stdin mode
refs: support rejection in batch updates during F/D checks
refs: implement batch reference update support
refs: introduce enum-based transaction error types
refs/reftable: extract code from the transaction preparation
refs/files: remove duplicate duplicates check
refs: move duplicate refname update check to generic layer
refs/files: remove redundant check in split_symref_update()
Auth-related (and unrelated) error handling in send-email has been
made more robust.
* zy/send-email-error-handling:
send-email: finer-grained SMTP error handling
send-email: capture errors in an eval {} block
"git rev-list" learns machine-parsable output format that delimits
each field with NUL.
* jt/rev-list-z:
rev-list: support NUL-delimited --missing option
rev-list: support NUL-delimited --boundary option
rev-list: support delimiting objects with NUL bytes
rev-list: refactor early option parsing
rev-list: inline `show_object_with_name()` in `show_object()`
Random build fixes.
* ps/misc-build-fixes:
ci: use Visual Studio for win+meson job on GitHub Workflows
meson: distinguish build and target host binaries
meson: respect 'tests' build option in contrib
gitweb: fix generation of "gitweb.js"
meson: fix handling of '-Dcurl=auto'
A few traditional unit tests have been rewritten to use the clar
framework.
* sk/clar-trailer-urlmatch-norm-test:
t/unit-tests: convert urlmatch-normalization test to clar
t/unit-tests: convert trailer test to use clar
The tests use grep to search the output of `git tag` for tagnames they
expect to exist, which can incorrectly pass if an unxpected tag
has the expected tag as its substring. We fix this by using `git
show-ref --verify` instead.
Additionally, we add a negative test to verify that a possible
uninteded tag does not show up in the imported repository.
This change also fixes an additional problem, where piping the
output of `git tag` caused the exit codes to be lost.
Signed-off-by: Anthony Wang <anthonywang513@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user wishes to disable hooks, then they can do so using the
established pattern of setting 'core.hooksPath' to /dev/null. This is
already tested in t1350-config-hooks-path.sh, but has not previously
been visible in the documentation.
Update the documentation to include this as an option.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "cmd-list.perl" script is used to extract the list of commands part
of a specific category and extracts the description of each command from
its respective manpage. The generated output is then included in git(1)
to list all Git commands.
The script is written in Perl. Refactor it to use shell scripting
exclusively so that we can get rid of the mandatory dependency on Perl
to build our documentation.
The converted script is slower compared to its Perl implementation. But
by being careful and not spawning external commands in `format_one ()`
we can mitigate the performance hit to a reasonable level:
Benchmark 1: Perl
Time (mean ± σ): 10.3 ms ± 0.2 ms [User: 7.0 ms, System: 3.3 ms]
Range (min … max): 10.0 ms … 11.1 ms 200 runs
Benchmark 2: Shell
Time (mean ± σ): 74.4 ms ± 0.4 ms [User: 48.6 ms, System: 24.7 ms]
Range (min … max): 73.1 ms … 75.5 ms 200 runs
Summary
Perl ran
7.23 ± 0.13 times faster than Shell
While a sevenfold slowdown is significant, the benefit of not requiring
Perl for a fully-functioning Git installation outweighs waiting a couple
of milliseconds longer during the build process.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "fix-texi.perl" script is used to fix up the output of
`docbook2x-texi`:
- It changes the filename to be "git.info".
- It changes the directory category and entry.
The script is written in Perl, but it can be rather trivially converted
to a shell script. Do so to remove the dependency on Perl for building
the user manual.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-request-pull(1) is written as a shell script, for it to
function we depend on Perl being available. The script gets installed
unconditionally though, regardless of whether or not Perl is even
available on the system. When it's not available, the `@PERL_PATH@`
variable may be substituted with a nonexistent executable path and thus
cause the script to fail.
Refactor the script so that it does not depend on Perl at all anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-filter-branch(1) is written as a shell script, the
`--state-branch` feature depends on Perl to save and extract the object
ID mappings. This can lead to subtle breakage though:
- We execute `perl` directly without respecting the `PERL_PATH`
configured by the distribution. As such, it may happen that we use
the wrong version of Perl.
- We install the script unchanged even if Perl isn't available at all
on the system, so using `--state-branch` would lead to failure
altogether in that case.
Fix this by dropping Perl and instead implementing the feature with
shell scripting exclusively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The image pointed to by the fedora:latest tag has moved from fedora
41 to 42. The fedora 41 container images have awk installed while
the fedora 42 images do not. That change is most likely just part
of reducing the size of the base container images.
In both AlmaLinux and Fedora (as well as other RHEL
derivatives/relatives), awk is provided by the gawk package.
On Fedora, `dnf install awk` would work, by using the package
filelist data to determine that /usr/bin/awk is provided by gawk and
installs gawk as a result.
On AlmaLinux (8 & 9, by quick testing by Todd), that is not the case
and you'd need to use `dnf install gawk` or `dnf install '*bin/awk'`
to get it installed. Having said that, awk _is_ included in the
current AlmaLinux 8 and 9 images, so it isn't strictly needed. But
it's probably better to be explicit that we need it installed, as a
defense against some future change to the AlmaLinux container
removing awk.
Because we know that on both of these distros, our scripts that call
for 'awk' had been using 'gawk' that was installed as part of the
base image, let's make sure that we explicitly install 'gawk'. If
the image already has it, it would be a no-op that does not cause
breakage.
Suggested-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git-merge-file" documentation source, which has lines that look
like conflict markers, lacked custom conflict marker size defined,
which has been corrected..
* pw/custom-conflict-marker-size-for-merge-related-docs:
merge-file doc: set conflict-marker-size attribute
Code clean-up.
* js/comma-semicolon-confusion:
detect-compiler: detect clang even if it found CUDA
clang: warn when the comma operator is used
compat/regex: explicitly mark intentional use of the comma operator
wildmatch: avoid using of the comma operator
diff-delta: avoid using the comma operator
xdiff: avoid using the comma operator unnecessarily
clar: avoid using the comma operator unnecessarily
kwset: avoid using the comma operator unnecessarily
rebase: avoid using the comma operator unnecessarily
remote-curl: avoid using the comma operator unnecessarily
"git clone" still gave the message about the default branch name;
this message has been turned into an advice message that can be
turned off.
* jt/clone-guess-remote-head-fix:
advice: allow disabling default branch name advice
builtin/clone: suppress unexpected default branch advice
remote: allow `guess_remote_head()` to suppress advice
The job to coalesce loose objects into packfiles in "git
maintenance" now has configurable batch size.
* ds/maintenance-loose-objects-batchsize:
maintenance: add loose-objects.batchSize config
maintenance: force progress/no-quiet to children
Fix lockfile contention in reftable code on Windows.
* ps/mingw-creat-excl-fix:
compat/mingw: fix EACCESS when opening files with `O_CREAT | O_EXCL`
meson: fix compat sources when compiling with MSVC
"git reflog" learns "drop" subcommand, that discards the entire
reflog data for a ref.
* kn/reflog-drop:
reflog: implement subcommand to drop reflogs
reflog: improve error for when reflog is not found
The object layer has been updated to take an explicit repository
instance as a parameter in more code paths.
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
Fix our use of zlib corner cases.
* jk/zlib-inflate-fixes:
unpack_loose_rest(): rewrite return handling for clarity
unpack_loose_rest(): simplify error handling
unpack_loose_rest(): never clean up zstream
unpack_loose_rest(): avoid numeric comparison of zlib status
unpack_loose_header(): avoid numeric comparison of zlib status
git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT
unpack_loose_header(): fix infinite loop on broken zlib input
unpack_loose_header(): report headers without NUL as "bad"
unpack_loose_header(): simplify next_out assignment
loose_object_info(): BUG() on inflating content with unknown type
* ps/test-wo-perl-prereq:
t5703: refactor test to not depend on Perl
t5316: refactor `max_chain()` to not depend on Perl
t0210: refactor trace2 scrubbing to not use Perl
t0021: refactor `generate_random_characters()` to not depend on Perl
t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl
t/lib-t6000: refactor `name_from_description()` to not depend on Perl
t/lib-gpg: refactor `sanitize_pgp()` to not depend on Perl
t: refactor tests depending on Perl for textconv scripts
t: refactor tests depending on Perl to print data
t: refactor tests depending on Perl substitution operator
t: refactor tests depending on Perl transliteration operator
Makefile: stop requiring Perl when running tests
meson: stop requiring Perl when tests are enabled
t: adapt existing PERL prerequisites
t: introduce PERL_TEST_HELPERS prerequisite
t: adapt `test_readlink()` to not use Perl
t: adapt `test_copy_bytes()` to not use Perl
t: adapt character translation helpers to not use Perl
t: refactor environment sanitization to not use Perl
t: skip chain lint when PERL_PATH is unset
The "object-store-ll.h" header has been introduced to keep transitive
header dependendcies and compile times at bay. Now that we have created
a new "object-store.c" file though we can easily move the last remaining
additional bit of "object-store.h", the `odb_path_map`, out of the
header.
Do so. As the "object-store.h" header is now equivalent to its low-level
alternative we drop the latter and inline it into the former.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cached objects are virtual objects that can be set up without writing
anything into the object store directly, which is used by git-blame(1)
to create fake commits for the working tree.
These cached objects are stored in a global variable, which is another
roadblock for libification of the object subsystem. Refactor the code so
that we instead store the array as part of the raw object store.
This refactoring raises the question whether virtual objects should
really be specific to a single repository (or rather a single object
store). Hypothetical usecases might for example span across submodules,
and here it may or may not be the right thing to provide virtual objects
across submodule boundaries.
The only existing usecase is git-blame(1) though, which does not know to
blame across submodule boundaries in the first place. As such, storing
these objects both globally and per-repository would achieve the same
result right now. But arguably, if we learned to blame across submodule
boundaries, we would likely want to create separate fare working tree
commits for each of the submodules so that the user can learn which
worktree a specific uncommitted change belongs to. And even if we would
want to create the same fake commit for each of the submodules we could
do that when storing separate virtual objects per object store.
While this is all rather hypothetical, the takeaway is that handling
virtual objects per-object store gives us more flexibility compared to
storing them globally. In a hypothetical future where we have achieved
full libification one might be able to handle unrelated repositories in
a single process, where the state of one repository should not have an
impact on the state of another repository. As such, storing these cached
objects per object store will enable more usecases and should lead to
less surprising outcomes overall.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out functions relating to the object store subsystem from
"object.c". This helps us to separate concerns.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_blob_stream()` function is a mere wrapper around
`index_blob_bulk_checkin()`. This has been the case since 568508e7657
(bulk-checkin: replace fast-import based implementation, 2011-10-28),
which has moved the implementation from `index_blob_stream()` (which was
still called `index_stream()`) into `index_bulk_checkin()` (which has
since been renamed to `index_blob_bulk_checkin()`).
Remove the redirection by dropping the wrapper. Move the comment to
`index_blob_bulk_checkin()` to retain its context.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `hash_object_file()`, `write_object_file()` and
`index_fd()` reuse the same set of flags to alter their behaviour. This
not only adds confusion, but given that every function only supports a
subset of the flags it becomes very hard to see which flags can be
passed to what function. Last but not least, this entangles the
implementation of all three function families.
Split up concerns by creating separate flags for each of the function
families.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have the "object-store.h" header, most of the functionality for
object stores is actually hosted in "object-file.c". This makes it hard
to find relevant functions and causes us to mix up concerns.
Split out functions relating to the object store subsystem into a new
"object-store.c" file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `xmmap()` function is provided by "object-file.c" even though its
functionality has nothing to do with the object file subsystem. Move it
into "wrapper.c", whose header already declares those functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git_open_cloexec()` wrapper function provides the ability to open a
file with `O_CLOEXEC` in a platform-agnostic way. This function is
provided by "object-file.c" even though it is not specific to the object
subsystem at all.
Move the file into "compat/open.c". This file already exists before this
commit, but has only been compiled conditionally depending on whether or
not open(3p) may return EINTR. With this change we now unconditionally
compile the object, but wrap `git_open_with_retry()` in an ifdef.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `safe_create_leading_directories()` function and its relatives are
located in "object-file.c", which is not a good fit as they provide
generic functionality not related to objects at all. Move them into
"path.c", which already hosts `safe_create_dir()` and its relative
`safe_create_dir_in_gitdir()`.
"path.c" is free of `the_repository`, but the moved functions depend on
`the_repository` to read the "core.sharedRepository" config. Adapt the
function signature to accept a repository as argument to fix the issue
and adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `mkdir_in_gitdir()` function is similar to `safe_create_dir()`, but
the former is hosted in "object-file.c" whereas the latter is hosted in
"path.c". The latter code unit makes way more sense though as the logic
has nothing to do with object files in particular.
Move the file into "path.c". While at it, we:
- Rename the function to `safe_create_dir_in_gitdir()` so that the
function names are similar to one another.
- Remove the dependency on `the_repository` by making the callers pass
the repository instead.
Adjust callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7b31b55db1 (perf: amend the grep tests to test grep.threads,
2017-12-29), p7821 was tweaked to test the performance of 'git grep'
under different number of threads. These tests are run if
GIT_PERF_GREP_THREADS is set to a list of thread numbers, but the
comment at the top of the file instead mentions GIT_PERF_7821_THREADS.
Fix the comment.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This rule was already implicitely applied in the converted man pages,
so let's state it loudly.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The processing of triple dot notation is tricky because it can be
mis-interpreted as an ellipsis. The special processing of the ellipsis
is now complete and takes into account the case of
`git-mv <source>... <dest>`
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Unfortunately, there's an inconsistency in the synopsis style, where
the ellipsis is used to indicate that the option can be repeated, but
it can also be used in Git's three-dot notation to indicate a range of
commits. The rendering engine will not be able to distinguish
between these two cases.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This also entails changing the help output for the command to match the new
synopsis.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis analysis logic was not able to handle backslashes and stars
which are used in the synopsis of the git-rm command. This patch fixes the
issue by updating the regular expression used to match the keywords.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packed_git_window_size and packed_git_limit are not used anywhere in
the codebase. A search found that all references were removed in
d284713bae (config: make `packed_git_(limit|window_size)` non-global
variables, 2024-12-03), except the ones in this file, as they were moved
to struct repo_settings.
Remove packed_git_window_size and packed_git_limit from environment.h.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a typo in a comment in refs.c: "checking checking" → "checking".
Signed-off-by: Christian Fredrik Johnsen <christian@johnsen.no>
Acked-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported that using git-pull(1) in a repository whose remote
contains branches with emojis leads to the following bug:
$ git pull
remote: Enumerating objects: 161255, done.
remote: Counting objects: 100% (55884/55884), done.
remote: Compressing objects: 100% (5518/5518), done.
remote: Total 161255 (delta 54253), reused 50509 (delta 50364),
pack-reused 105371 (from 4)
Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done.
Resolving deltas: 100% (118048/118048), completed with 13416 local objects.
From github.com:github/github
97ab7ae3f3745..8fb2f9fa180ed master -> origin/master
[...snip many screenfuls of updates to origin remotes...]
BUG: refs/packed-backend.c:984: packed-refs backend yielded reference
preceding its prefix
error: fetch died of signal 6
This issue bisects to 22600c04529 (refs/iterator: implement seeking for
packed-ref iterators, 2025-03-12) where we have implemented seeking for
the packed-ref iterator. As part of that change we introduced a check
that verifies that the iterator only returns refnames bigger than the
prefix. In theory, this check should always hold: when a prefix is set
we know that we would've seeked that prefix first, so we should never
see a reference sorting before that prefix.
But in practice the check itself is misbehaving when handling unicode
characters. The particular issue triggered with a branch that got the
"shaved ice" unicode character in its name, which is composed of the
bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname
"refs/heads/<shaved-ice>" to something like "refs/heads/z", and it
specifically hits when comparing the first byte, "0xEE".
The root cause is that the most-significant bit of 0xEE is set. The
`refname` and `prefix` pointers that we use to compare bytes with one
another are both pointers to signed characters. As such, when we
dereference the 0xEE byte the result is a _negative_ value, and this
value will of course compare smaller than "z".
We can see that this issue is avoided in `cmp_packed_refname()`, where
we explicitly cast each byte to its unsigned form. Fix the bug by doing
the same in `packed_ref_iterator_advance()`.
Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We ignore any error returned from set_head(), but 638060dcb9 (fetch
set_head: refactor to use remote directly, 2025-01-26) left its call in
a noop "if" conditional as a sort of note-to-self.
When c834d1a7ce (fetch: only respect followRemoteHEAD with configured
refspecs, 2025-03-18) added a "do_set_head" flag, it was rolled into the
same conditional, putting set_head() on the right-hand side of a
short-circuit AND.
That's not wrong, but it really hides the point of the line, which
is (maybe) calling the function.
Instead, let's have a full if() block for the flag, and then our comment
(with some rewording) will be sufficient to clarify the error handling.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `sparse` job still uses the `ubuntu-20.04` runner pool, but that
pool is about to go away, so let's stop using it.
There is no `sparse-22.04` artifact provided by the "Build sparse for
Ubuntu" Azure Pipeline, but that is not necessary anyway because Ubuntu
22.04 has the `sparse` package: https://packages.ubuntu.com/jammy/sparse
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With at least glibc 2.39, glibc provides a function declaration that
matches with this POSIX interface:
int regexec(const regex_t *restrict preg, const char *restrict string,
size_t nmatch, regmatch_t pmatch[restrict], int eflags);
such prototype requires variable-length-array for `pmatch'.
Thus, sparse reports this error:
> ../add-patch.c: note: in included file (through ../git-compat-util.h):
> /usr/include/regex.h:682:41: error: undefined identifier '__nmatch'
> /usr/include/regex.h:682:41: error: bad constant expression type
> /usr/include/regex.h:682:41: error: Variable length array is used.
Note: `__nmatch' is POSIX's nmatch.
The glibc's intention is informing their users to provides a large
enough buffer to hold `__nmatch' results and provides diagnosis if
necessary. It's merely a glibc' implementation detail.
Hide that usage from sparse by using standard C11's macro:
__STDC_NO_VLA__
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we already teach the `repo_config()` in f29f1990 (config:
teach repo_config to allow `repo` to be NULL, 2025-03-08) to allow
`repo` to be NULL, no need to check if `repo` is NULL before calling
`repo_config()`.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-wo-the-repository:
hash: stop depending on `the_repository` in `null_oid()`
hash: fix "-Wsign-compare" warnings
object-file: split out logic regarding hash algorithms
delta-islands: stop depending on `the_repository`
object-file-convert: stop depending on `the_repository`
pack-bitmap-write: stop depending on `the_repository`
pack-revindex: stop depending on `the_repository`
pack-check: stop depending on `the_repository`
environment: move access to "core.bigFileThreshold" into repo settings
pack-write: stop depending on `the_repository` and `the_hash_algo`
object: stop depending on `the_repository`
csum-file: stop depending on `the_repository`
The 'git bundle create' command has non-linear performance with the
number of refs in the repository. Benchmarking the command shows that
a large portion of the time (~75%) is spent in the
`object_array_remove_duplicates()` function.
The `object_array_remove_duplicates()` function was added in
b2a6d1c686 (bundle: allow the same ref to be given more than once,
2009-01-17) to skip duplicate refs provided by the user from being
written to the bundle. Since this is an O(N^2) algorithm, in repos with
large number of references, this can take up a large amount of time.
Let's instead use a 'strset' to skip duplicates inside
`write_bundle_refs()`. This improves the performance by around 6 times
when tested against in repository with 100000 refs:
Benchmark 1: bundle (refcount = 100000, revision = master)
Time (mean ± σ): 14.653 s ± 0.203 s [User: 13.940 s, System: 0.762 s]
Range (min … max): 14.237 s … 14.920 s 10 runs
Benchmark 2: bundle (refcount = 100000, revision = HEAD)
Time (mean ± σ): 2.394 s ± 0.023 s [User: 1.684 s, System: 0.798 s]
Range (min … max): 2.364 s … 2.425 s 10 runs
Summary
bundle (refcount = 100000, revision = HEAD) ran
6.12 ± 0.10 times faster than bundle (refcount = 100000, revision = master)
Previously, `object_array_remove_duplicates()` ensured that both the
refname and the object it pointed to were checked for duplicates. The
new approach, implemented within `write_bundle_refs()`, eliminates
duplicate refnames without comparing the objects they reference. This
works because, for bundle creation, we only need to prevent duplicate
refs from being written to the bundle header. The `revs->pending` array
can contain duplicates of multiple types.
First, references which resolve to the same refname. For e.g. "git
bundle create out.bdl master master" or "git bundle create out.bdl
refs/heads/master refs/heads/master" or "git bundle create out.bdl
master refs/heads/master". In these scenarios we want to prevent writing
"refs/heads/master" twice to the bundle header. Since both the refnames
here would point to the same object (unless there is a race), we do not
need to check equality of the object.
Second, refnames which are duplicates but do not point to the same
object. This can happen when we use an exclusion criteria. For e.g. "git
bundle create out.bdl master master^!", Here `revs->pending` would
contain two elements, both with refname set to "master". However, each
of them would be pointing to an INTERESTING and UNINTERESTING object
respectively. Since we only write refnames with INTERESTING objects to
the bundle header, we perform our duplicate checks only on such objects.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit b2a6d1c686 (bundle: allow the same ref to be given more than
once, 2009-01-17) added functionality to detect and remove duplicate
refnames from being added during bundle creation. This ensured that
clones created from such bundles wouldn't barf about duplicate refnames.
The following commit will add some optimizations to make this check
faster, but before doing that, it would be optimal to add tests to
capture the current behavior.
Add tests to capture duplicate refnames provided by the user during
bundle creation. This can be a combination of:
- refnames directly provided by the user.
- refname duplicate by using the '--all' flag alongside manual
references being provided.
- exclusion criteria provided via a refname "main^!".
- short forms of refnames provided, "main" vs "refs/heads/main".
Note that currently duplicates due to usage of short and long forms goes
undetected. This should be fixed with the optimizations made in the next
commit.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This environment variable existed to allow the testsuite to reuse all
the merge-related tests in the testsuite while easily flipping between
the 'recursive' and the 'ort' backends. Now that we have removed
merge-recursive and remapped 'recursive' to mean 'ort', we don't need
this scaffolding anymore. Remove it from these three builtins.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both of these existed to allow us to reuse all the merge-related tests
in the testsuite while easily flipping between the 'recursive' and the
'ort' backends. Now that we have removed merge-recursive and remapped
'recursive' to mean 'ort', we don't need this scaffolding anymore.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a wise man once told me, "Deleted code is debugged code!" So, move
the functions that are shared between merge-recursive and merge-ort from
the former to the latter, and then debug the remainder of
merge-recursive.[ch].
Joking aside, merge-ort was always intended to replace merge-recursive.
It has numerous advantages over merge-recursive (operates much faster,
can operate without a worktree or index, and fixes a number of known
bugs and suboptimal merges). Since we have now replaced all callers of
merge-recursive with equivalent functions from merge-ort, move the
shared functions from the former to the latter, and delete the remainder
of merge-recursive.[ch].
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More precisely, replace calls to merge_recursive() with
merge_ort_recursive().
Also change t7615 to quit calling out recursive; it is not needed
anymore, and we are in fact using ort now.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The do_recursive_merge() function, which is somewhat misleadingly named
since its purpose in life is to do a *non*-recursive merge, had code to
allow either using the recursive or ort backends. The default has been
ort for a very long time, let's just remove the code path for allowing
the recursive backend to be selected.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ort merge strategy has always used the histogram diff algorithm.
The recursive merge strategy, in contrast, defaults to the myers
diff algorithm, while allowing it to be changed.
Change the ort merge strategy to allow different diff algorithms, by
removing the hard coded value in merge_start() and instead just making
it a default in init_merge_options(). Technically, this also changes
the default diff algorithm for the recursive backend too, but we're
going to remove the final callers of the recursive backend in the next
two commits.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t6430: most of the test differences here were due to improved D/F
conflict handling explained in more detail in ef527787089c (merge
tests: expect improved directory/file conflict handling in ort,
2020-10-26). These changes weren't made to this test back in that
commit simply because I had been looking at `git merge` rather than
`git merge-recursive`. The final test in this testsuite, though, was
expunged because it was looking for specific output, and the calls to
output_commit_title() were discarded from merge_ort_internal() in its
adaptation from merge_recursive_internal(); see 8119214f4e70
(merge-ort: implement merge_incore_recursive(), 2020-12-16).
* t6434: This test is built entirely around rename/delete conflicts,
which had a suboptimal handling under merge-recursive. As explained
in more detail in commits 1f3c9ba707 ("t6425: be more flexible with
rename/delete conflict messages", 2020-08-10) and 727c75b23f ("t6404,
t6423: expect improved rename/delete handling in ort backend",
2020-10-26), rename/delete conflicts should each have two entries in
the index rather than just one. Adjust the expectations for all the
tests in this testcase to see the two entries per rename/delete
conflict.
* t6424: merge-recursive had a special check-if-toplevel-trees-match
check that it ran at the beginning on both the merge-base and the
other side being merged in. In such a case, it exited early and
printed an "Already up to date." message. merge-ort got rid of
this, and instead checks the merge base tree matching the other
side throughout the tree instead of just at the toplevel, allowing
it to avoid recursing into various subtrees. As part of that, it
got rid of the specialty toplevel message. That message hasn't
been missed for years from `git merge`, so I don't think it is
necessary to keep it just for `git merge-recursive`, especially
since the latter is rarely used. (git itself only references it
in the testsuite, whereas it used to power one of the three
rebase backends that existed once upon a time.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the use of merge_trees() from merge-recursive.[ch] with the
merge-ort equivalent.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Layout configuration in vimdiff backend didn't work as advertised,
which has been corrected.
* fr/vimdiff-layout-fixes:
mergetools: vimdiff: add tests for layout with REMOTE as the target
mergetools: vimdiff: fix layout where REMOTE is the target
Incrementally updating multi-pack index files.
* tb/incremental-midx-part-2:
midx: implement writing incremental MIDX bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
pack-bitmap.c: keep track of each layer's type bitmaps
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: open and store incremental bitmap layers
pack-revindex: prepare for incremental MIDX bitmaps
Documentation: describe incremental MIDX bitmaps
Documentation: remove a "future work" item from the MIDX docs
Make the code in reftable library less reliant on the service
routines it used to borrow from Git proper, to make it easier to
use by external users of the library.
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
TCP keepalive behaviour on http transports can now be configured by
calling cURL library.
* tb/http-curl-keepalive:
http.c: allow custom TCP keepalive behavior via config
http.c: inline `set_curl_keepalive()`
http.c: introduce `set_long_from_env()` for convenience
http.c: remove unnecessary casts to long
Give more meaningful error return values from block writer layer of
the reftable ref-API backend.
* ms/reftable-block-writer-errors:
reftable: adapt write_object_record() to propagate block_writer_add() errors
reftable: adapt writer_add_record() to propagate block_writer_add() errors
reftable: propagate specific error codes in block_writer_add()
Ensure what we write in assert() does not have side effects,
and introduce ASSERT() macro to mark those that cannot be
mechanically checked for lack of side effects.
* en/assert-wo-side-effects:
treewide: replace assert() with ASSERT() in special cases
ci: add build checking for side-effects in assert() calls
git-compat-util: introduce ASSERT() macro
When updating multiple references through stdin, Git's update-ref
command normally aborts the entire transaction if any single update
fails. This atomic behavior prevents partial updates. Introduce a new
batch update system, where the updates the performed together similar
but individual updates are allowed to fail.
Add a new `--batch-updates` flag that allows the transaction to continue
even when individual reference updates fail. This flag can only be used
in `--stdin` mode and builds upon the batch update support added to the
refs subsystem in the previous commits. When enabled, failed updates are
reported in the following format:
rejected SP (<old-oid> | <old-target>) SP (<new-oid> | <new-target>) SP <rejection-reason> LF
Update the documentation to reflect this change and also tests to cover
different scenarios where an update could be rejected.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `refs_verify_refnames_available()` is used to batch check refnames
for F/D conflicts. While this is the more performant alternative than
its individual version, it does not provide rejection capabilities on a
single update level. For batched updates, this would mean a rejection of
the entire transaction whenever one reference has a F/D conflict.
Modify the function to call `ref_transaction_maybe_set_rejected()` to
check if a single update can be rejected. Since this function is only
internally used within 'refs/' and we want to pass in a `struct
ref_transaction *` as a variable. We also move and mark
`refs_verify_refnames_available()` to 'refs-internal.h' to be an
internal function.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git supports making reference updates with or without transactions.
Updates with transactions are generally better optimized. But
transactions are all or nothing. This means, if a user wants to batch
updates to take advantage of the optimizations without the hard
requirement that all updates must succeed, there is no way currently to
do so. Particularly with the reftable backend where batching multiple
reference updates is more efficient than performing them sequentially.
Introduce batched update support with a new flag,
'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from
transactions, use the transaction infrastructure under the hood. When
enabled, this flag allows individual reference updates that would
typically cause the entire transaction to fail due to non-system-related
errors to be marked as rejected while permitting other updates to
proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC'
continue to result in the entire transaction failing. This approach
enhances flexibility while preserving transactional integrity where
necessary.
The implementation introduces several key components:
- Add 'rejection_err' field to struct `ref_update` to track failed
updates with failure reason.
- Add a new struct `ref_transaction_rejections` and a field within
`ref_transaction` to this struct to allow quick iteration over
rejected updates.
- Modify reference backends (files, packed, reftable) to handle
partial transactions by using `ref_transaction_set_rejected()`
instead of failing the entire transaction when
`REF_TRANSACTION_ALLOW_FAILURE` is set.
- Add `ref_transaction_for_each_rejected_update()` to let callers
examine which updates were rejected and why.
This foundational change enables batched update support throughout the
reference subsystem. A following commit will expose this capability to
users by adding a `--batch-updates` flag to 'git-update-ref(1)',
providing both a user-facing feature and a testable implementation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace preprocessor-defined transaction errors with a strongly-typed
enum `ref_transaction_error`. This change:
- Improves type safety and function signature clarity.
- Makes error handling more explicit and discoverable.
- Maintains existing error cases, while adding new error cases for
common scenarios.
This refactoring paves the way for more comprehensive error handling
which we will utilize in the upcoming commits to add batch reference
update support.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the core logic for preparing individual reference updates from
`reftable_be_transaction_prepare()` into `prepare_single_update()`. This
dedicated function now handles all validation and preparation steps for
each reference update in the transaction, including object ID
verification, HEAD reference handling, and symref processing.
The refactoring consolidates all reference update validation into a
single logical block, which improves code maintainability and
readability. More importantly, this restructuring lays the groundwork
for implementing batched reference update support in the reftable
backend, which will be introduced in a followup commit.
No functional changes are included in this commit - it is purely a code
reorganization to support future enhancements.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Within the files reference backend's transaction's 'finish' phase, a
verification step is currently performed wherein the refnames list is
sorted and examined for multiple updates targeting the same refname.
It has been observed that this verification is redundant, as an
identical check is already executed during the transaction's 'prepare'
stage. Since the refnames list remains unmodified following the
'prepare' stage, this secondary verification can be safely eliminated.
The duplicate check has been removed accordingly, and the
`ref_update_reject_duplicates()` function has been marked as static, as
its usage is now confined to 'refs.c'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the tracking of refnames in `affected_refnames` from individual
backends into the generic layer in 'refs.c'. This centralizes the
duplicate refname detection that was previously handled separately by
each backend.
Make some changes to accommodate this move:
- Add a `string_list` field `refnames` to `ref_transaction` to contain
all the references in a transaction. This field is updated whenever
a new update is added via `ref_transaction_add_update`, so manual
additions in reference backends are dropped.
- Modify the backends to use this field internally as needed. The
backends need to check if an update for refname already exists when
splitting symrefs or adding an update for 'HEAD'.
- In the reftable backend, within `reftable_be_transaction_prepare()`,
move the `string_list_has_string()` check above
`ref_transaction_add_update()`. Since `ref_transaction_add_update()`
automatically adds the refname to `transaction->refnames`,
performing the check after will always return true, so we perform
the check before adding the update.
This helps reduce duplication of functionality between the backends and
makes it easier to make changes in a more centralized manner.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `split_symref_update()`, there were two checks for duplicate
refnames:
- At the start, `string_list_has_string()` ensures the refname is not
already in `affected_refnames`, preventing duplicates from being
added.
- After adding the refname, another check verifies whether the newly
inserted item has a `util` value.
The second check is unnecessary because the first one guarantees that
`string_list_insert()` will never encounter a preexisting entry.
The `item->util` field is assigned to validate that a rename doesn't
already exist in the list. The validation is done after the first check.
As this check is removed, clean up the validation and the assignment of
this field in `split_head_update()` and `files_transaction_prepare()`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, git-maintenance(1) uses the "gc" task to ensure that the
repository is well-maintained. This can be changed, for example by
either explicitly configuring which tasks should be enabled or by using
the "incremental" maintenance strategy. If so, git-maintenance(1) does
not know to expire reflog entries, which is a subtask that git-gc(1)
knows to perform for the user. Consequently, the reflog will grow
indefinitely unless the user manually trims it.
Introduce a new "reflog-expire" task that plugs this gap:
- When running the task directly, then we simply execute `git reflog
expire --all`, which is the same as git-gc(1).
- When running git-maintenance(1) with the `--auto` flag, then we only
run the task in case the "HEAD" reflog has at least N reflog entries
that would be discarded. By default, N is set to 100, but this can
be configured via "maintenance.reflog-expire.auto". When a negative
integer has been provided we always expire entries, zero causes us
to never expire entries, and a positive value specifies how many
entries need to exist before we consider pruning the entries.
Note that the condition for the `--auto` flags is merely a heuristic and
optimized for being fast. This is because `git maintenance run --auto`
will be executed quite regularly, so scanning through all reflogs would
likely be too expensive in many repositories.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to introduce a new task for git-maintenance(1) that knows to
expire reflog entries. The logic will be shared with git-gc(1), which
already knows how to do this.
Pull out the common logic into a separate function so that we can share
the implementation between both builtins.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make functions that are required to manage `reflog_expire_options`
available elsewhere by moving them into "reflog.c" and exposing them in
the corresponding header. The functions will be used in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As described in the preceding commit, the per-reflog expiry dates are
stored in a global pair of variables. Refactor the code so that they are
contained in `struct reflog_expire_options` to make the structure useful
in other contexts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When expiring reflog entries, it is possible to configure expiry dates
that depend on the name of the reflog. This requires us to store a
couple of different expiry dates:
- The default expiry date for reflog entries that aren't otherwise
specified.
- The per-reflog expiry date.
- The currently active set of expiry dates for a given reference.
While the last item is stored in `struct reflog_expire_options`, the
other items aren't, which makes it hard to reuse the structure in other
places.
Refactor the code so that the default expiry date is stored as part of
the structure. The per-reflog expiry dates will be adapted accordingly
in the subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to expose `struct cmd_reflog_expire_cb` via "reflog.h" so
that we can also use this structure in "builtin/gc.c". Once we make it
accessible to a wider scope though it becomes awkwardly named, as it
isn't only useful in the context of a callback. Instead, the function is
containing all kinds of options relevant to whether or not a reflog
entry should be expired.
Rename the structure to `reflog_expire_options` to prepare for this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code captured errors but did not process them further.
This treated all failures the same without distinguishing SMTP status.
Add handle-smtp_error to extract SMTP status codes using a regex (as
defined in RFC 5321) and handle errors as follows:
- No error present:
- If a result is provided, return 1 to indicate success.
- Otherwise, return 0 to indicate failure.
- Error present with a captured three-digit status code:
- For 4yz (transient errors), return 1 and allow retries.
- For 5yz (permanent errors), return 0 to indicate failure.
- For any other recognized status code, return 1, treating it as
a transient error.
- Error present but no status code found:
- Return 1 as a transient error.
Signed-off-by: Zheng Yuting <05ZYT30@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Auth relied solely on return values without catching errors. This misjudges
non-credential errors as auth failure without error info.
Patch wraps the entire auth process in an eval {} block to catch
all exceptions, including non-credential errors. It adds a new $error var,
uses 'or do' to prevent flow break, and returns $result ? 1 : 0. And merges
if/else branches, integrates SASL and basic auth, with comments for
future status code handling.
Signed-off-by: Zheng Yuting <05ZYT30@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to print individual blocks in a table is hosted in the
reftable library. This is only the case due to historical reasons though
because users of the library had no interfaces to read blocks one by
one. Otherwise, printing individual blocks has no place in the reftable
library given that the format will not be generic in the first place.
We have now grown a public interface to iterate through blocks contained
in a table, and thus we can finally move the logic to print them into
the test helper.
Move over the logic and refactor it accordingly. Note that the iterator
also trivially allows us to access index sections, which we previously
didn't print at all. This omission wasn't intentional though, so start
dumping those sections as well so that we can assert that indices are
written as expected.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that reftable blocks can be read individually via the public
interface it becomes necessary for callers to be able to distinguish the
different types of blocks. Expose the relevant constants.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new iterator that allows the caller to iterate through all
blocks contained in a table. This gives users more fine-grained control
over how exactly those blocks are being read and exposes information to
callers that was previously inaccessible.
This iterator will be required by a future patch series that adds
consistency checks for the reftable backend. In addition to that though
we will also reimplement `reftable_table_print_blocks()` on top of this
new iterator in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_table` interface is an internal implementation detail that
callers have no access to. Having direct access to this structure is
important though for a subsequent patch series that will implement
consistency checks for the reftable backend.
Move the structure into "reftable-table.h" so that it part of the public
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose a generic iterator over reftable records and expose it via the
public interface. Together with an upcoming iterator for reftable blocks
contained in a table this will allow users to trivially iterate through
blocks and their respective records individually.
This functionality will be used to implement consistency checks for the
reftable backend, which requires more fine-grained control over how we
read data.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the block iterators so that initialization and seeking are
different from one another. This makes the iterator trivially reseekable
by storing the pointer to the block at initialization time, which we can
then reuse on every seek.
This refactoring prepares the code for exposing a `reftable_iterator`
interface for blocks in a subsequent commit. Callsites are adjusted
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block iterator requires access to a bunch of data from the
underlying `reftable_block` that it is iterating over. This data is
stored by copying over relevant data into a separate set of variables.
This has multiple downsides:
- We require more storage space than necessary. This is more of a
theoretical issue as we shouldn't ever have many blocks.
- We have to perform more bookkeeping, and the variable names are
inconsistent across the two data structures. This can lead to some
confusion.
- The lifetime of the block iterator is tied to the block anyway, but
we hide that a bit by only storing pointers pointing into the block.
There isn't really any good reason why we rip out parts of the block
instead of storing a pointer to the block itself.
Refactor the code to do so. Despite being simpler, it also allows us to
decouple the lifetime of the block iterator from seeking in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While users of the reftable library wouldn't generally require access to
individual blocks in a reftable table, there are valid usecases where
one may require low-level access to them. One such upcoming usecase in
the Git codebase is to implement consistency checks for the reftable
library where we want to verify each block individually.
Create a public interface for reading blocks. The interface isn't yet
complete and lacks e.g. a way to read individual records from a block.
Such missing functionality will be backfilled in subsequent commits.
Note that this change also requires us to expose `reftable_buf`, which
is used by the `reftable_block_first_key()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Throughout the Git codebase we're using the typedeffed version of
`z_stream`, which maps to `struct z_stream_s`. By using a typedef
instead of the struct it becomes somewhat harder to predeclare the
symbol so that headers depending on the struct can do so without having
to pull in "zlib-compat.h".
We don't yet have users that would really care about this: the only
users that declare `z_stream` as a pointer are in "reftable/block.h",
which is a header that is internal to the reftable library. But in the
next step we're going to expose the `struct reftable_block` publicly,
and that struct does contain a pointer to `z_stream`. And as the public
header shouldn't depend on "reftable/system.h", which is an internal
implementation detail, we won't have the typedef for `z_stream` readily
available.
Prepare for this change by using `struct z_stream_s` throughout our code
base. In case zlib-ng is used we use a define to map from `z_stream_s`
to `zng_stream_s`.
Drop the pre-declaration of `struct z_stream` while at it. This struct
does not exist in the first place, and the declaration wasn't needed
because "reftable/block.h" already includes "reftable/basics.h" which
transitively includes "reftable/system.h" and thus "git-zlib.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `block_reader` structure is used to access parsed data of a reftable
block. The structure is currently treated as an internal implementation
detail and not exposed via our public interfaces. The functionality
provided by the structure is useful to external users of the reftable
library though, for example when implementing consistency checks that
need to scan through the blocks manually.
Rename the structure to `reftable_block` now that the name has been made
available in the preceding commit. This name is in line with the naming
schema used for other data structures like `reftable_table` in that it
describes the underlying entity that it provides access to.
The new data structure isn't yet exposed via the public interface, which
is left for a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reftable_block` structure associates a byte slice with a block
source. As such it only holds the data of a reftable block without
actually encoding any of the details for how to access that data.
Rename the structure to instead be called `reftable_block_data`. Besides
clarifying that this really only holds data, it also allows us to rename
the `reftable_block_reader` to `reftable_block` in the next commit, as
this is the structure that actually encapsulates access to the reftable
blocks.
Rename the `struct reftable_block_reader::block` member accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to read blocks from a reftable is scattered across both the
table and the block subsystems. Besides causing somewhat fuzzy
responsibilities, it also means that we have to awkwardly pass around
the ownership of blocks between the subsystems.
Refactor the code so that we stop passing the block when initializing a
reader, but instead by passing in the block source plus the offset at
which we're supposed to read a block. Like this, the ownership of the
block itself doesn't need to get handed over as the block reader is the
one owning the block right from the start.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restart points record the location of reftable records that do not use
prefix compression and are used to perform a binary search inside of a
block. These restart points are encoded at the end of a block, between
the record data and the footer of a table.
The block structure contains three different variables related to these
restart points:
- The block length contains the length of the reftable block up to the
restart points.
- The restart count contains the number of restart points contained in
the block.
- The restart bytes variable tracks where the restart point data
begins.
Tracking all three of these variables is unnecessary though as the data
can be derived from one another: the block length without restart points
is the exact same as the offset of the restart count data, which we
already track via the `restart_bytes` data.
Refactor the code so that we track the location of restart bytes not as
a pointer, but instead as an offset. This allows us to trivially get rid
of the `block_len` variable as described above. This avoids having the
confusing `block_len` variable and allows us to do less bookkeeping
overall.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code that implements block sources is distributed across a couple of
files. Consolidate all of it into "reftable/blocksource.c" and its
accompanying header so that it is easier to locate and more self
contained.
While at it, rename some of the functions to have properly scoped names.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct reftable_reader` subsystem encapsulates a table that has
been read from the disk. As such, the current name of that structure is
somewhat hard to understand as it only talks about the fact that we read
something from disk, without really giving an indicator _what_ that is.
Furthermore, this naming schema doesn't really fit well into how the
other structures are named: `reftable_merged_table`, `reftable_stack`,
`reftable_block` and `reftable_record` are all named after what they
encapsulate.
Rename the subsystem to `reftable_table`, which directly gives a hint
that the data structure is about handling the individual tables part of
the stack.
While this change results in a lot of churn, it prepares for us exposing
the APIs to third-party callers now that the reftable library is a
standalone library that can be linked against by other projects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The license headers used across the reftable library doesn't follow our
typical coding style for multi-line comments. Fix it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git-blame(1)' command allows users to ignore specific revisions via
the '--ignore-rev <rev>' and '--ignore-revs-file <file>' flags. These
flags are often combined with the 'blame.markIgnoredLines' and
'blame.markUnblamableLines' config options. These config options prefix
ignored and unblamable lines with a '?' and '*', respectively.
However, this option was never extended to the porcelain mode of
'git-blame(1)'. Since the documentation does not indicate this
exclusion, it is a bug.
Fix this by printing 'ignored' and 'unblamable' respectively for the
options when using the porcelain modes.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Toon Claes <toon@iotcl.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use Perl due to two different reasons in t5703:
- To filter advertised capabilities.
- To set up a CGI script with HTTPD.
Refactor the first category to use `test_grep` instead. Refactoring the
second category would be a bit more involved, so instead we add the
PERL_TEST_HELPERS prerequisite to those individual tests now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `max_chain()` helper function is used to extract the maximum delta
chain of a packfile as printed by git-index-pack(1). The script uses
Perl to extract that data, but it can be trivially refactored to use
awk(1) instead.
Refactor the helper accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The output generated by our trace2 mechanism contains several fields
that are dependent on the environment they're being run in, which makes
it somewhat harder to test it. As a countermeasure we scrub the output
and strip out any fields that contain such information.
The logic to do so is implemented in Perl, but it can be trivially
ported to instead use sed(1). Refactor the code accordingly so that we
can drop the PERL_TEST_HELPERS prerequisite.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `generate_random_characters()` helper function generates N
random characters in the range 'a-z' and writes them into a file. The
logic currently uses Perl, but it can be adapted rather easily by:
- Making `test-tool genrandom` generate an infinite stream.
- Using `tr -dc` to strip all characters which aren't in the range of
'a-z'.
- Using `test_copy_bytes()` to copy the first N bytes.
This allows us to drop the PERL_TEST_HELPERS prerequisite.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our Apache HTTPD setup exposes an "one_time_perl" endpoint to access
repositories. If used, we execute the "apply-one-time-perl.sh" CGI
script that checks whether we have a "one-time-perl" script. If so, that
script gets executed so that it can munge what would be served. Once
done, the script gets removed so that it doesn't execute a second time.
As the name says, this functionality expects the user to pass a Perl
script. This isn't really necessary though: we can just as easily
implement the same thing with arbitrary scripts.
Refactor the code so that we instead expect an arbitrary script to
exist and rename the functionality to "one-time-script". Adapt callers
to use shell utilities instead of Perl so that we can drop the
PERL_TEST_HELPERS prerequisite.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `name_from_description()` test helper uses Perl to munge a given
description and convert it into a name. Refactor it to instead use a
combination of sed(1) and tr(1) so that we drop PERL_TEST_HELPERS
prerequisites in users of this library.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `sanitize_pgp()` test helper uses Perl to strip PGP signatures from
stdin. Refactor it to instead use sed(1) so that we drop the
PERL_TEST_HELPERS prerequisite in users of this library.
Note that we have to add PERL_TEST_HELPERS to a subset of tests in t6300
now that the test suite doesn't bail out early anymore in case the
prerequisite isn't set.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of tests that depend on Perl for textconv scripts.
Refactor these tests to instead be implemented via shell utilities so
that we can drop a couple of PERL_TEST_HELPERS prerequisites.
Note that the conversion in t4030 is not a one-to-one equivalent to the
previous textconv script. Before this change we used to essentially do a
hexdump via Perl. The obvious conversion here would be to use `test-tool
hexdump` like we do for the other tests. But this would lead to a ripple
effect where we would have to adapt a bunch of other tests with a bunch
of seemingly unrelated changes, which would be somewhat awkward.
Instead, we're going with the minimum viable change: the test files we
write contain "\001" and "\000", and the test's expectation is that
those get translated into proper ASCII characters. So instead of doing a
full hexdump, we simply use tr(1) to translate these specific bytes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bunch of tests rely on Perl to print data in various different ways.
These usages fall into the following categories:
- Print data conditionally by matching patterns. These usecases can be
converted to use awk(1) rather easily.
- Print data repeatedly. These usecases can typically be converted to
use a combination of `test-tool genzeros` and sed(1).
- Print data in reverse. These usecases can be converted to use
awk(1) or `sort -r`.
Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of tests that use Perl to perform substitution via the
"s/" operator. These usecases can be trivially replaced with sed(1) and
tr(1).
Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of tests that use Perl to perform character
transliteration via the "y/" or "tr/" operator. These usecases can be
trivially replaced with tr(1).
Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Makefile for our tests has a couple of targets that depend on Perl.
Adapt those targets to only run conditionally in case Perl is available
on the system so that it becomes possible to run the test suite without
Perl.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl interpreter used to be a strict dependency for running our test
suite. This requirement is explicit in the Meson build system, where we
require Perl to be present unless tests have been disabled.
With the preceding commits we have loosened this restriction so that it
is now possible to run tests when Perl is unavailable. Loosen the above
requirement accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of our tests depend on the PERL prerequisite even though it
isn't needed. These tests fall into one of the following classes:
- The underlying logic used to be implemented in Perl but isn't
anymore. Here we can simply drop the dependency altogether.
- The test logic used to depend on Perl but doesn't anymore. Again, we
can simply drop the dependency.
- The test logic still relies on a Perl interpreter. These tests
should use the newly introduced PERL_TEST_HELPERS prerequisite.
Adapt test cases accordingly.
Note that in t1006 we have to introduce another new prerequisite
depending on whether or not the IPC::Open2 module is available. Funny
enough, when starting to use `test_lazy_prereq` to do so we also get a
conflict of variables with the "script" variable that contains the Perl
logic because `test_run_lazy_prereq_` also sets that variable. We thus
rename the variable in t1006 to "perl_script".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the early days of Git, Perl was used quite prominently throughout the
project. This has changed significantly as almost all of the executables
we ship nowadays have eventually been rewritten in C. Only a handful of
subsystems remain that require Perl:
- gitweb, a read-only web interface.
- A couple of scripts that allow importing repositories from GNU Arch,
CVS and Subversion.
- git-send-email(1), which can be used to send mails.
- git-request-pull(1), which is used to request somebody to pull from
a URL by sending an email.
- git-filter-branch(1), which uses Perl with the `--state-branch`
option. This command is typically recommended against nowadays in
favor of git-filter-repo(1).
- Our Perl bindings for Git.
- The netrc Git credential helper.
None of these subsystems can really be considered to be part of the
"core" of Git, and an installation without them is fully functional.
It is more likely than not that an end user wouldn't even notice that
any features are missing if those tools weren't installed. But while
Perl nowadays very much is an optional dependency of Git, there is a
significant limitation when Perl isn't available: developers cannot run
our test suite.
Preceding commits have started to lift this restriction by removing the
strict dependency on Perl in many central parts of the test library. But
there are still many tests that rely on small Perl helpers to do various
different things.
Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests
that require Perl. This prerequisite is explicitly different than the
preexisting PERL prerequisite:
- PERL records whether or not features depending on the Perl
interpreter are built.
- PERL_TEST_HELPERS records whether or not a Perl interpreter is
available for our tests.
By having these two separate prerequisites we can thus distinguish
between tests that inherently depend on Perl because the underlying
feature does, and those tests that depend on Perl because the test
itself is using Perl.
Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `test_readlink()` helper function reads a symbolic link and returns
the path it is pointing to. It is thus equivalent to the readlink(1)
utility, which isn't available on all supported platforms. As such, it
is implemented using Perl so that we can use it even on platforms where
the shell utility isn't available.
While using readlink(1) is not an option, what we can do is to implement
the logic ourselves in our test-tool. Do so, which allows a bunch of
tests to pass when Perl is not available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `test_copy_bytes()` helper function copies up to N bytes from stdin
to stdout. This is implemented using Perl, but it can be trivially
adapted to instead use dd(1).
Refactor the helper accordingly, which allows a bunch of tests to pass
when Perl is not available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of helper functions that translate characters, e.g.
from LF to NUL or NUL to 'Q' and vice versa. These helpers use Perl
scripts, but they can be trivially adapted to instead use tr(1).
Note that one specialty here is the handling of NUL characters in tr(1),
which historically wasn't implemented correctly on all platforms. But
quoting tr(1p):
It was considered that automatically stripping NUL characters from
the input was not correct functionality. However, the removal of -n
in a later proposal does not remove the requirement that tr
correctly process NUL characters in its input stream.
So when tr(1) is implemented following the POSIX standard then it is
expected to handle the transliteration of NUL just fine.
Refactor the helpers accordingly, which allows a bunch of tests to pass
when Perl is not available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before executing tests we first sanitize the environment. Part of the
sanitization is to unset a couple of environment variables that we know
will change the behaviour of Git. This is done with a small Perl script,
which has the consequence that having a Perl interpreter available is a
strict requirement for running our unit tests.
The logic itself isn't particularly involved: we simply unset every
environment variable whose key starts with 'GIT_', but then explicitly
allow a subset of these.
Refactor the logic to instead use sed(1) so that it becomes possible to
execute our tests without Perl.
Based-on-patch-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our chainlint script verifies that test files have proper '&&' chains.
This script is written in Perl and executed for every test file before
executing the test logic itself.
In subsequent commits we're about to refactor our test suite so that
Perl becomes an optional dependency, only. And while it is already
possible to disable this linter, developers that don't have Perl
available at all would always have to disable the linter manually, which
is rather cumbersome.
Disable the chain linter automatically in case PERL_PATH isn't set to
make this a bit less annoying. Bail out with an error in case the
developer has asked explicitly for the chain linter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it is now possible to filter objects by type, this mechanism is
for now mostly a convenience. Most importantly, we still have to iterate
through the whole packfile to find all objects of a specific type. This
can be prohibitively expensive depending on the size of the packfiles.
It isn't really possible to do better than this when only considering a
packfile itself, as the order of objects is not fixed. But when we have
a packfile with a corresponding bitmap, either because the packfile
itself has one or because the multi-pack index has a bitmap for it, then
we can use these bitmaps to improve the runtime.
While bitmaps are typically used to compute reachability of objects,
they also contain one bitmap per object type that encodes which object
has what type. So instead of reading through the whole packfile(s), we
can use the bitmaps and iterate through the type-specific bitmap.
Typically, only a subset of packfiles will have a bitmap. But this isn't
really much of a problem: we can use bitmaps when available, and then
use the non-bitmap walk for every packfile that isn't covered by one.
Overall, this leads to quite a significant speedup depending on how many
objects of a certain type exist. The following benchmarks have been
executed in the Chromium repository, which has a 50GB packfile with
almost 25 million objects. As expected, there isn't really much of a
change in performance without an object filter:
Benchmark 1: cat-file with no-filter (revision = HEAD~)
Time (mean ± σ): 89.675 s ± 4.527 s [User: 40.807 s, System: 10.782 s]
Range (min … max): 83.052 s … 96.084 s 10 runs
Benchmark 2: cat-file with no-filter (revision = HEAD)
Time (mean ± σ): 88.991 s ± 2.488 s [User: 42.278 s, System: 10.305 s]
Range (min … max): 82.843 s … 91.271 s 10 runs
Summary
cat-file with no-filter (revision = HEAD) ran
1.01 ± 0.06 times faster than cat-file with no-filter (revision = HEAD~)
We still have to scan through all objects as we yield all of them, so
using the bitmap in this case doesn't really buy us anything. What is
noticeable in this benchmark is that we're I/O-bound, not CPU-bound, as
can be seen from the user/system runtimes, which combined are way lower
than the overall benchmarked runtime.
But when we do use a filter we can see a significant improvement:
Benchmark 1: cat-file with filter=object:type=commit (revision = HEAD~)
Time (mean ± σ): 86.444 s ± 4.081 s [User: 36.830 s, System: 11.312 s]
Range (min … max): 80.305 s … 93.104 s 10 runs
Benchmark 2: cat-file with filter=object:type=commit (revision = HEAD)
Time (mean ± σ): 2.089 s ± 0.015 s [User: 1.872 s, System: 0.207 s]
Range (min … max): 2.073 s … 2.119 s 10 runs
Summary
cat-file with filter=object:type=commit (revision = HEAD) ran
41.38 ± 1.98 times faster than cat-file with filter=object:type=commit (revision = HEAD~)
This is because we don't have to scan through all packfiles anymore, but
can instead directly look up relevant objects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pull out a common function that allows us to iterate over all objects in
a repository. Right now the logic is trivial and would only require two
function calls, making this refactoring a bit pointless. But in the next
commit we will iterate on this logic to make use of bitmaps, so this is
about to become a bit more complex.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a function that allows us to verify whether a pack is
bitmapped or not. This functionality will be used in a subsequent
commit.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a function that allows the caller to iterate over all
bitmapped objects that match a given filter. This mechanism will be used
in a subsequent commit to optimize object filters in git-cat-file(1).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `show_reachable_fn` callback is used by a couple of functions to
present reachable objects to the caller. The function does not provide a
way for the caller to pass a payload though, which is functionality that
we'll require in a subsequent commit.
Change the callback type to accept a payload and adapt all callsites
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the "object:type=" filter in git-cat-file(1),
which causes us to omit all objects that don't match the provided object
type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the "blob:limit=" filter in git-cat-file(1), which
causes us to omit all blobs that are bigger than a certain size.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the "blob:none" filter in git-cat-file(1), which
causes us to omit all blobs.
Note that this new filter requires us to read the object type via
`oid_object_info_extended()` in `batch_object_write()`. But as we try to
optimize away reading objects from the database the `data->info.typep`
pointer may not be set. We thus have to adapt the logic to conditionally
set the pointer in cases where the filter is given.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In batch mode, git-cat-file(1) enumerates all objects and prints them
by iterating through both loose and packed objects. This works without
considering their reachability at all, and consequently most options to
filter objects as they exist in e.g. git-rev-list(1) are not applicable.
In some situations it may still be useful though to filter objects based
on properties that are inherent to them. This includes the object size
as well as its type.
Such a filter already exists in git-rev-list(1) with the `--filter=`
command line option. While this option supports a couple of filters that
are not applicable to our usecase, some of them are quite a neat fit.
Wire up the filter as an option for git-cat-file(1). This allows us to
reuse the same syntax as in git-rev-list(1) so that we don't have to
reinvent the wheel. For now, we die when any of the filter options has
been passed by the user, but they will be wired up in subsequent
commits.
Further note that the filters that we are about to introduce don't
significantly speed up the runtime of git-cat-file(1). While we can skip
emitting a lot of objects in case they are uninteresting to us, the
majority of time is spent reading the packfile, which is bottlenecked by
I/O and not the processor. This will change though once we start to make
use of bitmaps, which will allow us to skip reading the whole packfile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple callsites that report the status of an object, for
example when the objec tis missing or its name is ambiguous. We're about
to add a couple more such callsites to report on "excluded" objects.
Prepare for this by introducing a new function `report_object_status()`
that encapsulates the functionality.
Note that this function also flushes stdout, which is a requirement so
that request-response style batched modes can learn about the status
before proceeding to the next object. We already flush correctly at all
existing callsites, even though the flush in `batch_one_object()` only
comes after the switch statement. That flush is now redundant, and we
could in theory deduplicate it by moving it into all branches that don't
use `report_object_status()`. But that doesn't quite feel sensible:
- The duplicate flush should ultimately just be a no-op for us and
thus shouldn't impact performance significantly.
- By keeping the flush in `report_object_status()` we ensure that all
future callers get semantics correct.
So let's just be pragmatic and live with the duplicated flush.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usage strings for git-cat-file(1) that we pass to `parse_options()`
and `usage_msg_optf()` are stored in a variable called `usage`. This
variable shadows the declaration of `usage()`, which we'll want to use
in a subsequent commit.
Rename the variable to `builtin_catfile_usage`, which is in line with
how the variable is typically called in other builtins.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06c92dafb8 (Makefile: allow specifying a SHA-1 for non-cryptographic
uses, 2024-09-26), support for unsafe SHA-1 is added. Add the unsafe
SHA-1 build info to `git version --build-info` and update corresponding
documentation.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the `--build-options` flag is used with git-version(1), additional
information about the built version of Git is printed. During build
time, different SHA implementations may be configured, but this
information is not included in the version info.
Add the SHA implementations Git is built with to the version info by
requiring each backend to define a SHA1_BACKEND or SHA256_BACKEND symbol
as appropriate and use the value in the printed build options.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Dubious ownership" checks on Windows has been tightened up.
* js/mingw-admins-are-special:
test-tool path-utils: support debugging "dubious ownership" issues
mingw: special-case administrators even more
The bash command line completion script (in contrib/) has been
updated to cope with remote repository nicknames with slashes in
them.
* dm/completion-remote-names-fix:
completion: fix bugs with slashes in remote names
completion: add helper to count path components
A documentation page was left out from formatting and installation,
which has been corrected.
* pw/build-breaking-changes-doc:
docs: add BreakingChanges to TECH_DOCS target
Doc mark-up updates.
* ja/doc-branch-markup:
doc: apply new format to git-branch man page
completion: take into account the formatting backticks for options
An earlier code refactoring of the hash machinery missed a few
required calls to init_fn.
* jh/hash-init-fixes:
index-pack, unpack-objects: restore missing ->init_fn
Bugfix in newly introduced large-object-promisor remote support.
* cc/lop-remote:
promisor-remote: compare remote names case sensitively
promisor-remote: fix possible issue when no URL is advertised
promisor-remote: fix segfault when remote URL is missing
t5710: arrange to delete the client before cloning
Using "git name-rev --stdin" as an example, improve the framework to
prepare tests to pretend to be in the future where the breaking
changes have already happened.
* jc/name-rev-stdin:
name-rev: remove "--stdin" support
t6120: further modernize
t6120: avoid hiding "git" exit status
t: introduce WITH_BREAKING_CHANGES prerequisite
t: extend test_lazy_prereq
t: document test_lazy_prereq
GitHub Actions CI switched on a CI/CD variable that does not exist
when choosing what packages to install etc., which has been
corrected.
* kn/ci-meson-check-build-docs-fix:
ci/github: add missing 'CI_JOB_IMAGE' env variable
The path search overrides used by gitk on Windows are applied to any
executable whose name is not 'absolute', meaning that
[exec foo/bar ...]
will search each element of $PATH to find one with subdirectory foo
containing bar. But, per POSIX, and Tcl implementation on all platforms,
foo/bar is taken as $(pwd)/foo/bar, and is not searched on $PATH.
Fix this descrepency using the same approach applied to git-gui in
commit 3f71c97e. The key is that the executable name must have no path
component, indicated by [file split $exename] having array length 1.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
The _search_exe variable allows specifying the suffix used for executables,
typically {} on unix, .exe on Windows. But, the override code is now
used only on Windows, so _search_exe is no longer needed. Eliminate it.
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Commit 4cbe9e0e2 was written to address problems that result from Tcl's
documented behavior on Windows where the current working directory and a
number of Windows system directories are automatically prepended to
$PATH when searching for executables [1]. This basic Windows behavior
has resulted in more than one CVE against git for Windows:
CVE-2023-23618, CVE-2022-41953 are listed on the git for Windows github
website for the Tcl components of git (gitk, git-gui).
4cbe9e0e2 is intended to restrict the search to looking only in
directories given in $PATH and in the given order, which is exactly the
Tcl behavior documented to exist on non-Windows platforms [1]. Thus,
this change could have been written to affect only Windows, leaving
other platforms alone.
However, 4cbe9e0e2 implements the override for all platforms. This
includes specialized code for Cygwin, copied from git-gui prior to
commit 7145c654 on https://github.com/j6t/git-gui, so targets a
long retired Cygwin port of the Windows Tcl/Tk using Windows pathnames.
Since 2012, Cygwin uses a Unix/X11 port requiring Unix pathnames,
meaning 4cbe9e0e2 is incompatible. 4cbe9e0e2 also induces an infinite
recursion as _which now invokes the exec wrapper that invokes _which.
This is part of git v2.49.0, so gitk on Cygwin is broken in that
release.
Rather than fix the unnecessary override code for Cygwin, let's just
limit the override of exec/open to Windows, leaving all other platforms
using their native exec/open as they did prior to 4cbe9e0e2. This patch
wraps the override code in an "if {[is_Windows]} { ... }" block while
removing the non-Windows code added in 4cbe9e0e2.
[1] see https://www.tcl-lang.org/man/tcl8.6/TclCmd/exec.htm
Signed-off-by: Mark Levedahl <mlevedahl@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
We do not use tab characters for intentation in general. A recent patch
introduced many lines that do use them. Replace them by 4 spaces each.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This test currently passes, but for the wrong reason. The
repo_is_hardlinked function expects a .git directory or a bare
repository and currently fails because it cannot find the objects
directory.
One solution is to use the --bare argument, but then --show-toplevel
won't work. We could change that, but there's no need to, so just add
the missing .git directory.
In addition, use the built-in negation functionality of test_grep to
avoid mishandling real errors (such as a missing file) and, as a final
fix, remove the extra newline.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
Add a new builtin driver for generic INI files (e. g. the gitconfig
files), where:
- the funcname regular expression matches section names, i. e. any
string between brackets at the beginning of the line, with or without
indentation;
- word_regex matches any word with one or more non-whitespace
characters without checking if it is a valid variable name or value.
Also add tests for the new userdiff driver. These files define sections
and subsections, with and without indentation.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: D. Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a similar fix as 023756f4eb (revision walker: --cherry-pick is a
limited operation), but for the --left-only and --right-only options.
When computing a symmetric difference between two unrelated histories,
no suitable merge base exists, and so no boundary commit is flagged as
UNINTERESTING. Previously, we relied on the presence of such boundary
to trigger limiting and thus consideration of either "revs->left_only"
or "revs->right_only".
A number of other entries in the option parser have started including
overrides for "revs->limited = 1". Do the same for these options.
Signed-off-by: Matt Hunter <m@lfurio.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple places, especially in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa. In some cases, where both signed and unsigned data types
have been used to store lengths of arrays in the same function, only
one variable was used to iterate over both types.
Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both types of iterators are required, move
the declaration inside the for loop. In cases where this is not
possible, add appropriate cast.
Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7304bd2bc39 (ci: wire up Visual Studio build with Meson, 2025-01-22)
we have wired up a new CI job that builds and tests Git with Meson on a
Windows machine. The expectation here was that this build uses the
Visual Studio toolchain to do so, and that is true on GitLab CI. But on
GitHub Workflows it is not the case because we've got GCC in our PATH,
and thus Meson favors that compiler toolchain over Visual Studio's.
Fix this by explicitly asking Meson to use the Visual Studio toolchain.
While this is only really required for GitHub Workflows, let's also pass
the flag in GitLab CI so that we don't implicitly assume the toolchain
that Meson is going to pick.
Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost all of the tools we discover during the build process need to be
native programs. There are only a handful of exceptions, which typically
are programs whose paths we need to embed into the resulting executable
so that they can be found on the target system when Git executes. While
this distinction typically doesn't matter, it does start to matter when
considering cross-compilation where the build and target machines are
different.
Meson supports cross-compilation via so-called machine files. These
machine files allow the user to override parameters for the build
machine, but also for the target machine when cross-compiling. Part of
the machine file is a section that allows the user to override the
location where binaries are to be found in the target system. The
following machine file would for example override the path of the POSIX
shell:
[binaries]
sh = '/usr/xpg4/bin/sh'
It can be handed over to Meson via `meson setup --cross-file`.
We do not handle this correctly right now though because we don't know
to distinguish binaries for the build and target hosts at all. Address
this by explicitly passing the `native:` parameter to `find_program()`:
- When set to `true`, we get binaries discovered on the build host.
- When set to `false`, we get either the path specified in the
machine file. Or, if no machine file exists or it doesn't specify
the binary path, then we fall back to the binary discovered on the
build host.
As mentioned, only a handful of binaries are not native: only the system
shell, Python and Perl need to be treated specially here.
Reported-by: Peter Seiderer <ps.report@gmx.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the "netrc" credential helper and git-subtree(1) from "contrib/"
carry a couple of tests with them. These tests get wired up in Meson
unconditionally even in the case where `-Dtests=false`. As those tests
depend on the `test_enviroment` variable, which only gets defined in
case `-Dtests=true`, the result is an error:
```
$ meson setup -Dtests=false -Dcontrib=subtree build
[...]
contrib/subtree/meson.build:15:27: ERROR: Unknown variable "test_environment".
```
Fix the issue by not defining these tests at all in case the "tests"
option is set to `false`.
Reported-by: Sam James <sam@gentoo.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 19d8fe7da65 (Makefile: extract script to generate gitweb.js,
2024-12-06) we have extracted the logic to build "gitweb.js" into a
separate script. As part of that the rules that builds the script
has gained a new dependency on that script.
This refactoring is broken though because we use "$^" to determine
the set of JavaScript files that need to be concatenated, and this
implicit variable now also contains the build script itself. As a
result, the build script ends up ni the generated "gitweb.js" file,
which is wrong.
Fix the issue by filtering out non-JavaScript files.
Based-on-patch-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "curl" option controls whether or not a couple of features that
depend on curl shall be included. Most importantly, these features
include the HTTP remote helpers, which are rather quintessential for a
well-functioning Git installation. So while the dependency can in theory
be dropped, most users wouldn't consider the resulting installation to
be fully functional.
The "curl" option is defined as a feature, which means that it can be
"enabled", "disabled" or "auto", which has the effect that the feature
will be enabled if the dependency itself has been found. While most of
the other features have "auto" as default value, the "curl" option is
set to "enabled" by default due to it being so important. Consequently,
autoconfiguration of Git will fail by default if the library cannot be
found.
There is a bug though with how we handle the option in case the user
overrides the feature with `meson setup -Dcurl=auto`: while we will try
to find the library in that case, we won't ever use it because we later
on check for `get_option('curl').enabled()` when deciding whether or not
we want to build dependent sources. But `enabled()` only returns true if
the option has the value "enabled", for "auto" it will return false.
Fix the issue by instead checking for `curl.found()`, which is only true
if the library has been found. And as we only try to find the library
when `get_option('curl')` returns "true" or "auto" this is exactly what
we want.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.
get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
pos to make it descriptive, now that it is a function argument.
Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Miscellaneous code clean-ups.
* en/random-cleanups:
merge-ort: remove extraneous word in comment
merge-ort: fix accidental strset<->strintmap
t7615: be more explicit about diff algorithm used
t6423: fix a comment that accidentally reversed two commits
stash: remove merge-recursive.h include
The xdiff code on 32-bit platform misbehaved when an insanely large
context size is given, which has been corrected.
* rs/xdiff-context-length-fix:
xdiff: avoid arithmetic overflow in xdl_get_hunk()
Enable -Wunreachable-code for developer builds.
* jk/use-wunreachable-code-for-devs:
config.mak.dev: enable -Wunreachable-code
git-compat-util: add NOT_CONSTANT macro and use it in atfork_prepare()
run-command: use errno to check for sigfillset() error
A corner-case bug in "git log --follow -B" has been fixed.
* en/diff-rename-follow-fix:
diffcore-rename: fix BUG when break detection and --follow used together
Certain "cruft" objects would have never been refreshed when there
are multiple cruft packs in the repository, which has been
corrected.
* tb/multi-cruft-pack-refresh-fix:
builtin/pack-objects.c: freshen objects from existing cruft packs
In protocol v2 where the refs advertisement is constrained, we try
to tell the server side not to limit the advertisement when there
is no specific need to, which has been the source of confusion and
recent bugs. Revamp the logic to simplify.
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
First step of deprecating and removing merge-recursive.
* en/merge-ort-prepare-to-remove-recursive:
am: switch from merge_recursive_generic() to merge_ort_generic()
merge-ort: fix merge.directoryRenames=false
t3650: document bug when directory renames are turned off
merge-ort: support having merge verbosity be set to 0
merge-ort: allow rename detection to be disabled
merge-ort: add new merge_ort_generic() function
The code paths to check whether a refname X is available (by seeing
if another ref X/Y exists, etc.) have been optimized.
* ps/refname-avail-check-optim:
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
"git fast-export | git fast-import" learns to deal with commit and
tag objects with embedded signatures a bit better.
* cc/signed-fast-export-import:
fast-export, fast-import: add support for signed-commits
fast-export: do not modify memory from get_commit_buffer
git-fast-export.adoc: clarify why 'verbatim' may not be a good idea
fast-export: rename --signed-tags='warn' to 'warn-verbatim'
fast-export: fix missing whitespace after switch
git-fast-import.adoc: add missing LF in the BNF
In p9210-scalar-clone.sh, we test using 'scalar clone' to clone
$GIT_PERF_LARGE_REPO (copied locally as 'to-clone'), which defaults to
the git.git checkout we are running the test from.
When --branch is not specified (as in this test), 'scalar clone' tries
to get the default branch of the remote repository by parsing the output
of 'git ls-remote --symref $URL HEAD', as implemented in
scalar.c:remote_default_branch. When the git.git checkout we are running
the test from is in detached HEAD, this fails and we fall back to using
the name of the currently checked out branch in the newly initialized
repository, which in this case is the value returned earlier in
cmd_clone by repo_default_branch_name.
We then invoke 'git checkout -t origin/$branch', with $branch being the
name we got from remote_default_branch. This invocation fails if
'$branch' does not exist as a branch in the current git.git checkout.
Fix this by creating a local branch in 'to-clone' in the setup test
"enable server-side partial clone", making sure to use '-B' in case a
branch named 'test-branch' already exists.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 5dccd9155f (t/perf: add iteration setup mechanism to perf-lib,
2022-04-04), perf tests need to declare their prerequisites with
'--prereq', after the test title. p7821 was forgotten in that commit,
such that running that test on a machine where the PCRE prereq is not
satisfied aborts the test with:
error: bug in the test script: test_wrapper_ needs 2 positional parameters
Fix this by correcting the two 'test_perf' invocations in that test
suite.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When committing a conflict resolution for a merge containing
1f010d6bdf7 (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
my pre-commit hook failed because "git diff --check" thought there was
a left over conflict marker in "merge-file.adoc". Fix this by setting
the "conflict-marker-size" attribute as we do for all the other
documentation files that contain example conflict markers.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/incremental-midx-part-2:
midx: implement writing incremental MIDX bitmaps
pack-bitmap.c: use `ewah_or_iterator` for type bitmap iterators
pack-bitmap.c: keep track of each layer's type bitmaps
ewah: implement `struct ewah_or_iterator`
pack-bitmap.c: apply pseudo-merge commits with incremental MIDXs
pack-bitmap.c: compute disk-usage with incremental MIDXs
pack-bitmap.c: teach `rev-list --test-bitmap` about incremental MIDXs
pack-bitmap.c: support bitmap pack-reuse with incremental MIDXs
pack-bitmap.c: teach `show_objects_for_type()` about incremental MIDXs
pack-bitmap.c: teach `bitmap_for_commit()` about incremental MIDXs
pack-bitmap.c: open and store incremental bitmap layers
pack-revindex: prepare for incremental MIDX bitmaps
Documentation: describe incremental MIDX bitmaps
Documentation: remove a "future work" item from the MIDX docs
Before accessing an array element at a given index, we should make sure
that the index is within the desired bounds, otherwise it makes little
sense to access the array element in the first place.
In this instance, testing whether `ce->name[common]` is the trailing NUL
byte is technically different from testing whether `common` is within
the bounds of `previous_name`. It is also redundant, as the range-check
guarantees that `previous_name->buf[common]` cannot be NUL and therefore
the condition `ce->name[common] == previous_name->buf[common]` would not
be met if `ce->name[common]` evaluated to NUL.
However, in the interest of reducing the cognitive load to reason about
the correctness of this loop (so that I can focus on interesting
projects again), I'll simply move the range-check to the beginning of
the loop condition and keep the redundant NUL check.
This acquiesces CodeQL's `cpp/offset-use-before-range-check` rule.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In my setup, clang finds `/usr/local/cuda` and hence the output of
`clang -v` ends with this line:
Found CUDA installation: /usr/local/cuda, version
This confuses the `detect-compiler` script because it matches _all_
lines that contain the needle "version" surrounded by spaces. As a
consequence, the `get_family` function returns two lines: "Ubuntu clang"
and above-mentioned line, which the `case` statement does not handle
well and hence reports "unknown compiler family" instead of the expected
set of "clang14", "clang13", ..., "clang1" output.
Let's unconfuse the script by letting it parse the first matching line
and ignore the rest.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compiling Git using `clang`, the `-Wcomma` option can be used to
warn about code using the comma operator (because it is typically
unintentional and wants to use the semicolon instead).
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.
In the `compat/regex/` code, the comma operator is used twice, once to
avoid surrounding two conditional statements with curly brackets, the
other one to increment two counters simultaneously in a `do ... while`
condition.
The first one is replaced with a proper conditional block, surrounded by
curly brackets.
The second one would be harder to replace because the loop contains two
`continue`s. Therefore, the second one is marked as intentional by
casting the value-to-discard to `void`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.
In this instance, the usage is intentional because it allows storing the
value of the current character as `prev_ch` before making the next
character the current one, all of which happens in the loop condition
that lets the loop stop at a closing bracket.
However, it is hard to read.
The chosen alternative to using the comma operator is to move those
assignments from the condition into the loop body; In this particular
case that requires special care because the loop body contains a
`continue` for the case where a character class is found that starts
with `[:` but does not end in `:]` (and the assignments should occur
even when that code path is taken), which needs to be turned into a
`goto`.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. That is why the
`-Wcomma` option of clang was introduced: To identify unintentional uses
of the comma operator.
Intentional uses include situations where one wants to avoid curly
brackets around multiple statements that need to be guarded by a
condition. This is the case here, as the repetitive nature of the
statements is easier to see for a human reader this way. At least in my
opinion.
However, opinions on this differ wildly, take 10 people and you have 10
different preferences.
On the Git mailing list, it seems that the consensus is to use the long
form instead, so let's do just that.
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. While the code in
this patch used the comma operator intentionally (to avoid curly
brackets around two statements, each, that want to be guarded by a
condition), it is better to surround it with curly brackets and to use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. In this instance, it
makes the code harder to read than necessary, too. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. Better use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge-recursive and merge-ort machinery crashed in corner cases
when certain renames are involved.
* en/merge-process-renames-crash-fix:
merge-ort: fix slightly overzealous assertion for rename-to-self
t6423: add a testcase causing a failed assertion in process_renames
A handful of built-in command implementations have been rewritten
to use the repository instance supplied by git.c:run_builtin(), its
caller.
* ua/some-builtins-wo-the-repository:
builtin/checkout-index: stop using `the_repository`
builtin/for-each-ref: stop using `the_repository`
builtin/ls-files: stop using `the_repository`
builtin/pack-refs: stop using `the_repository`
builtin/send-pack: stop using `the_repository`
builtin/verify-commit: stop using `the_repository`
builtin/verify-tag: stop using `the_repository`
config: teach repo_config to allow `repo` to be NULL
The refname exclusion logic in the packed-ref backend has been
broken for some time, which confused upload-pack to advertise
different set of refs. This has been corrected.
* tb/refs-exclude-fixes:
refs.c: stop matching non-directory prefixes in exclude patterns
refs.c: remove empty '--exclude' patterns
"git fsck" becomes more careful when checking the refs.
* sj/ref-consistency-checks-more:
builtin/fsck: add `git refs verify` child process
packed-backend: check whether the "packed-refs" is sorted
packed-backend: add "packed-refs" entry consistency check
packed-backend: check whether the refname contains NUL characters
packed-backend: add "packed-refs" header consistency check
packed-backend: check if header starts with "# pack-refs with: "
packed-backend: check whether the "packed-refs" is regular file
builtin/refs: get worktrees without reading head information
t0602: use subshell to ensure working directory unchanged
Add some tests to make sure that now "REMOTE" can be used as a target
(ie. can be used together with the "@" marker) inside
"mergetool.vimdiff.layout"
Signed-off-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"mergetool.vimdiff.layout" is used to define the vim layout (ie. how
windows, tabs and buffers are physically organized) when resolving
conflicts.
For example, if we set it to this:
"(LOCAL,BASE,REMOTE)/MERGED"
...vim will open and show this layout:
------------------------------------------
| | | |
| LOCAL | BASE | REMOTE |
| | | |
------------------------------------------
| |
| MERGED |
| |
------------------------------------------
By default, whatever ends up been written to the "MERGED" window will
become the file which conflict we are resolving.
However, it is possible to use the "@" symbol to specify a different
one. For example, if we use this slightly different version of the
previously used string:
"(LOCAL,BASE,@REMOTE)/MERGED"
...then the user should proceed to edit the contents of the top right
window (instead of the bottom window) as *that* is what will become the
conflicts free file once vim is closed.
Before this commit, the "@" marker worked for all targets *except* for
"REMOTE". In other words, these worked as expected:
"(@LOCAL,BASE,REMOTE)/MERGED"
"(LOCAL,@BASE,REMOTE)/MERGED"
"(LOCAL,BASE,REMOTE)/@MERGED"
...but this didn't:
"(LOCAL,BASE,@REMOTE)/MERGED"
This commit fixes that.
Reported-by: kawarimidoll <kawarimidoll+git@gmail.com>
Suggested-by: D. Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wiring up coccinelle in the build, depends on running git commands to
get the list of files to operate on. Reasonable, for a feature mainly
used by people developing on git. If building git itself from a tarball
distribution of git's own source code, one likely does not need to run
coccinelle.
But running those git commands failed, and caused the build to error
out, if `spatch` was installed -- because the build assumed that its
presence indicated a desire to use it on this source tree. Instead, we
can expand the conditional to check for both `spatch` and the `.git`
file or directory.
Meson's `opt.require()` method allows us to add a prerequisite for the
feature option. If the prerequisite fails, then the option either:
- converts autodetection to disabled
- emits an informative error if the feature was set to enabled:
```
ERROR: Feature coccinelle cannot be enabled: coccinelle can only be run from a git checkout
```
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The original documentation from 7b5cf8be18 (vimdiff: add tool
documentation, 2022-03-30) mistakenly described the marker as an
asterisk, which is the character "*". The code and examples have always
looked for an arobase ("@").
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Acked-by: Fernando Ramos <greenfoo@u92.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default branch name advice message is displayed when
`repo_default_branch_name()` is invoked and the `init.defaultBranch`
config is not set. In this scenario, the advice message is always shown
even if the `--no-advice` option is used.
Adapt `repo_default_branch_name()` to allow the default branch name
advice message to be disabled with the `--no-advice` option and
corresponding configuration.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 199f44cb2ead (builtin/clone: allow remote helpers to detect repo,
2024-02-27), clones started partially initializing the refdb before
executing the remote helpers by creating a HEAD file and "refs/"
directory. This has resulted in some scenarios where git-clone(1) now
prints the default branch name advice message where it previously did
not.
A side-effect of the HEAD file already existing, is that computation of
the default branch name is handled later in execution. This matters
because prior to 97abaab5f6 (refs: drop `git_default_branch_name()`,
2024-05-17), the default branch value would be computed during its first
execution and cached. Subsequent invocations would simply return the
cached value. Since the next `git_default_branch_name()` call site,
which is invoked through `guess_remote_head()`, is not configured to
suppress the advice message, computing the default branch name results
in the advice message being printed.
Configure `guess_remote_head()` to suppress the advice message,
restoring the previous behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `repo_default_branch_name()` invoked through `guess_remote_head()`
is configured to always display the default branch advice message.
Adapt `guess_remote_head()` to accept flags and convert the `all`
parameter to a flag. Add the `REMOTE_GUESS_HEAD_QUIET` flag to to enable
suppression of advice messages. Call sites are updated accordingly.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In file bulk-checkin.c, three warnings are emitted by
"-Wsign-compare", two of which are caused by trivial loop iterator
type mismatches. For the third case, the type of `rsize` from
ssize_t rsize = size < sizeof(ibuf) ? size : sizeof(ibuf);
can be changed to size_t as both options of the ternary expression are
unsigned and the signedness of the variable isn't really needed
anywhere.
To prevent `read_result != rsize` making a clash, it is to be noted
that `read_result` is checked not to hold negative values. Therefore
casting the variable to size_t is a safe operation and enough to
remove the sign-compare warning.
Fix issues accordingly, and remove `DISABLE_SIGN_COMPARE_WARNINGS` to
enable "-Wsign-compare" for the file.
Signed-off-by: Tuomas Ahola <taahol@utu.fi>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a bug to obtain the peer certificate without verifying it.
Having said that, from my reading of
https://www.openssl.org/docs/man1.1.1/man3/SSL_set_verify.html, it would
appear that Git is saved by the fact that it calls
`SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL)` already early on.
In other words, that `SSL_VERIFY_PEER` combined with the `NULL`
parameter (i.e. no overridden callback) would _already_ verify the peer
certificate. The fact that we later call `SSL_get_peer_certificate()`
is mistaken by CodeQL to mean that that peer certificate still needs to
be verified, but that had already happened at that point.
Nevertheless, it is better to verify the peer certificate explicitly
than to rely on some side effect that is really hard to reason about
(and that took me more than one business day to analyze fully). It also
makes it easier for static analyzers to validate the correctness of the
code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a new sub-sub-command for `test-tool`, simply passing through
the command-line arguments to the `is_path_owned_by_current_user()`
function.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check for dubious ownership has one particular quirk on Windows: if
running as an administrator, files owned by the Administrators _group_
are considered owned by the user.
The rationale for that is: When running in elevated mode, Git creates
files that aren't owned by the individual user but by the Administrators
group.
There is yet another quirk, though: The check I introduced to determine
whether the current user is an administrator uses the
`CheckTokenMembership()` function with the current process token. And
that check only succeeds when running in elevated mode!
Let's be a bit more lenient here and look harder whether the current
user is an administrator. We do this by looking for a so-called "linked
token". That token exists when administrators run in non-elevated mode,
and can be used to create a new process in elevated mode. And feeding
_that_ token to the `CheckTokenMembership()` function succeeds!
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'loose-objects' task of 'git maintenance run' first deletes loose
objects that exit within packfiles and then collects loose objects into
a packfile. This second step uses an implicit limit of fifty thousand
that cannot be modified by users.
Add a new config option that allows this limit to be adjusted or ignored
entirely.
While creating tests for this option, I noticed that actually there was
an off-by-one error due to the strict comparison in the limit check. I
considered making the limit check turn true on equality, but instead I
thought to use INT_MAX as a "no limit" barrier which should mean it's
never possible to hit the limit. Thus, a new decrement to the limit is
provided if the value is positive. (The restriction to positive values
is to avoid underflow if INT_MIN is configured.)
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --no-quiet option for 'git maintenance run' is supposed to indicate
that progress should happen even while ignoring the value of isatty(2).
However, Git implicitly asks child processes to check isatty(2) since
these arguments are not passed through.
The pass through of --no-quiet will be useful in a test in the next
change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, some calls to for-each-ref passed fixed numbers of path
components to strip from refs, assuming that remote names had no slashes
in them. This made completions like:
git push github/dseomn :com<Tab>
Result in:
git push github/dseomn :dseomn/completion-remote-slash
With this patch, it instead results in:
git push github/dseomn :completion-remote-slash
Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A follow-up commit will use this with for-each-ref to strip the right
number of path components from refnames.
Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks_many() clears multiple commits one by one. Move the
code for handling a single commit to clear_commit_marks() and call it
instead of the other way around, to simplify the code.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack-bitmap machinery has learned how to read and interact
with an incremental MIDX bitmap, teach the pack-bitmap-write.c machinery
(and relevant callers from within the MIDX machinery) to write such
bitmaps.
The details for doing so are mostly straightforward. The main changes
are as follows:
- find_object_pos() now makes use of an extra MIDX parameter which is
used to locate the bit positions of objects which are from previous
layers (and thus do not exist in the current layer's pack_order
field).
(Note also that the pack_order field is moved into struct
write_midx_context to further simplify the callers for
write_midx_bitmap()).
- bitmap_writer_build_type_index() first determines how many objects
precede the current bitmap layer and offsets the bits it sets in
each respective type-level bitmap by that amount so they can be OR'd
together.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have initialized arrays for each bitmap layer's type bitmaps
in the previous commit, adjust existing callers to use them in
preparation for multi-layered bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for reading the type-level bitmaps from previous bitmap layers
by maintaining an array for each type, where each element in that type's
array corresponds to one layer's bitmap for that type.
These fields will be used in a later commit to instantiate the 'struct
ewah_or_iterator' for each type.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While individual bitmap layers store different commit, type-level, and
pseudo-merge bitmaps, only the top-most layer is used to compute
reachability traversals.
Many functions which implement the aforementioned traversal rely on
enumerating the results according to the type-level bitmaps, and so
would benefit from a conceptual type-level bitmap that spans multiple
layers.
Implement `struct ewah_or_iterator` which is capable of enumerating
multiple EWAH bitmaps at once, and OR-ing the results together. When
initialized with, for example, all of the commit type bitmaps from each
layer, callers can pretend as if they are enumerating a large type-level
bitmap which contains the commits from *all* bitmap layers.
There are a couple of alternative approaches which were considered:
- Decompress each EWAH bitmap and OR them together, enumerating a
single (non-EWAH) bitmap. This would work, but has the disadvantage
of decompressing a potentially large bitmap, which may not be
necessary if the caller does not wish to read all of it.
- Recursively call bitmap internal functions, reusing the "result" and
"haves" bitmap from the top-most layer. This approach resembles the
original implementation of this feature, but is inefficient in that
it both (a) requires significant refactoring to implement, and (b)
enumerates large sections of later bitmaps which are all zeros (as
they pertain to objects in earlier layers).
(b) is not so bad in and of itself, but can cause significant
slow-downs when combined with expensive loop bodies.
This approach (enumerating an OR'd together version of all of the
type-level bitmaps from each layer) produces a significantly more
straightforward implementation with significantly less refactoring
required in order to make it work.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for using pseudo-merges with incremental MIDX bitmaps by
attempting to apply pseudo-merges from each layer when encountering a
given commit during a walk.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as previous commits, use nth_midxed_pack() instead
of accessing the MIDX's ->packs array directly to support incremental
MIDXs.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for the special `--test-bitmap` mode of `git rev-list`
when using incremental MIDXs.
The bitmap_test_data structure is extended to contain a "base" pointer
that mirrors the structure of the bitmap chain that it is being used to
test.
When we find a commit to test, we first chase down the ->base pointer to
find the appropriate bitmap_test_data for the bitmap layer that the
given commit is contained within, and then perform the test on that
bitmap.
In order to implement this, light modifications are made to
bitmap_for_commit() to reimplement it in terms of a new function,
find_bitmap_for_commit(), which fills out a pointer which indicates the
bitmap layer which contains the given commit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as previous commits in the first phase of
incremental MIDXs, enumerate not just the packs in the current
incremental MIDX layer, but previous ones as well.
Likewise, in reuse_partial_packfile_from_bitmap(), when reusing only a
single pack from a MIDX, use the oldest layer's preferred pack as it is
likely to contain the largest number of reusable sections.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we may ask for a pack_id that is in an earlier MIDX layer relative
to the one corresponding to our bitmap, use nth_midxed_pack() instead of
accessing the ->packs array directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack-bitmap machinery uses `bitmap_for_commit()` to locate the
EWAH-compressed bitmap corresponding to some given commit object.
Teach this function about incremental MIDX bitmaps by teaching it to
recur on earlier bitmap layers when it fails to find a given commit in
the current layer.
The changes to do so are as follows:
- Avoid initializing hash_pos at its declaration, since
bitmap_for_commit() is now a recursive function and may receive a
NULL bitmap_index pointer as its first argument.
- In cases where we would previously return NULL (to indicate that a
lookup failed and the given bitmap_index does not contain an entry
corresponding to the given commit), recursively call the function on
the previous bitmap layer.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare the pack-bitmap machinery to work with incremental MIDXs by
adding a new "base" field to keep track of the bitmap index associated
with the previous MIDX layer.
The changes in this commit are mostly boilerplate to open the correct
bitmap(s), add them to the chain of bitmap layers along the "base"
pointer, ensure that the correct packs and their reverse indexes are
loaded across MIDX layers, etc.
While we're at it, keep track of a base_nr field to indicate how many
bitmap layers (including the current bitmap) exist. This will be used in
a future commit to allocate an array of 'struct ewah_bitmap' pointers to
collect all of the respective type bitmaps among all layers to
initialize a multi-EWAH iterator.
Subsequent commits will teach the functions within the pack-bitmap
machinery how to interact with these new fields.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare the reverse index machinery to handle object lookups in an
incremental MIDX bitmap. These changes are broken out across a few
functions:
- load_midx_revindex() learns to use the appropriate MIDX filename
depending on whether the given 'struct multi_pack_index *' is
incremental or not.
- pack_pos_to_midx() and midx_to_pack_pos() now both take in a global
object position in the MIDX pseudo-pack order, and find the
earliest containing MIDX (similar to midx.c::midx_for_object().
- midx_pack_order_cmp() adjusts its call to pack_pos_to_midx() by the
number of objects in the base (since 'vb - midx->revindx_data' is
relative to the containing MIDX, and pack_pos_to_midx() expects a
global position).
Likewise, this function adjusts its output by adding
m->num_objects_in_base to return a global position out through the
`*pos` pointer.
Together, these changes are sufficient to use the multi-pack index's
reverse index format for incremental multi-pack reachability bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare to implement support for reachability bitmaps for the new
incremental multi-pack index (MIDX) feature over the following commits.
This commit begins by first describing the relevant format and usage
details for incremental MIDX bitmaps.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the items listed as "future work" in the MIDX's technical
documentation is to extend the format to allow MIDXs to be written
incrementally across multiple layers.
This was suggested all the way back in ceab693d1f (multi-pack-index: add
design document, 2018-07-12), and implemented in b9497848df (Merge
branch 'tb/incremental-midx-part-1', 2024-08-19). Let's remove it
accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our CI systems we can observe that t0610 fails rather frequently.
This testcase races a bunch of git-update-ref(1) processes with one
another which are all trying to update a unique reference, where we
expect that all processes succeed and end up updating the reftable
stack. The error message in this case looks like the following:
fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error
Instrumenting the code with a couple of calls to `BUG()` in relevant
sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover
that this error is caused when calling `flock_acquire()`, which is a
thin wrapper around our lockfile API. Curiously, the error code we get
in such cases is `EACCESS`, indicating that we are not allowed to access
the file.
The root cause of this is an oddity of `CreateFileW()`, which is what
`_wopen()` uses internally. Quoting its documentation [1]:
If you call CreateFile on a file that is pending deletion as a
result of a previous call to DeleteFile, the function fails. The
operating system delays file deletion until all handles to the file
are closed. GetLastError returns ERROR_ACCESS_DENIED.
This behaviour is triggered quite often in the above testcase because
all the processes race with one another trying to acquire the lock for
the "tables.list" file. This is due to how locking works in the reftable
library when compacting a stack:
1. Lock the "tables.list" file and reads its contents.
2. Decide which tables to compact.
3. Lock each of the individual tables that we are about to compact.
4. Unlock the "tables.list" file.
5. Compact the individual tables into one large table.
6. Re-lock the "tables.list" file.
7. Write the new list of tables into it.
8. Commit the "tables.list" file.
The important step is (4): we don't commit the file directly by renaming
it into place, but instead we delete the lockfile so that concurrent
processes can continue to append to the reftable stack while we compact
the tables. And because we use `DeleteFileW()` to do so, we may now race
with another process that wants to acquire that lockfile. So if we are
unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected
`ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to
handle and thus it will bail out without retrying to acquire the lock.
In theory, the issue is not limited to the reftable library and can be
triggered by every other user of the lockfile subsystem, as well. My gut
feeling tells me it's rather unlikely to surface elsewhere though.
Fix the issue by translating the error to `EEXIST`. This makes the
lockfile subsystem handle the error correctly: in case a timeout is set
it will now retry acquiring the lockfile until the timeout has expired.
With this, t0610 is now always passing on my machine whereas it was
previously failing in around 20-30% of all test runs.
[1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our compat library we have both "msvc.c" and "mingw.c". The former is
mostly a thin wrapper around the latter as it directly includes it, but
it has a couple of extra headers that aren't included in "mingw.c" and
is expected to be used with the Visual Studio compiler toolchain.
While our Makefile knows to pick up the correct file depending on
whether or not the Visual Studio toolchain is used, we don't do the same
with Meson. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the reference transaction commit phase, the transaction is
set to a closed state regardless of whether it was successful of not.
Attempting to abort a closed transaction via `ref_transaction_abort()`
results in a `BUG()`.
In c92abe71df (builtin/fetch: fix leaking transaction with `--atomic`,
2024-08-22), logic to free a transaction after the commit phase is moved
to the centralized exit path. In cases where the transaction commit
failed, this results in a closed transaction being aborted and signaling
a bug.
Free the transaction and set it to NULL when the commit fails. This
allows the exit path to correctly handle the error without attempting to
abort the transaction.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit changed the behavior of repack's '--max-cruft-size'
to specify a cruft pack-specific override for '--max-pack-size'.
Introduce a new flag, '--combine-cruft-below-size' which is a
replacement for the old behavior of '--max-cruft-size'. This new flag
does explicitly what it says: it combines together cruft packs which are
smaller than a given threshold, and leaves alone ones which are
larger.
This accomplishes the original intent of '--max-cruft-size', which was
to avoid repacking cruft packs larger than the given threshold.
The new behavior is slightly different. Instead of building up small
packs together until the threshold is met, '--combine-cruft-below-size'
packs up *all* cruft packs smaller than the threshold. This means that
we may make a pack much larger than the given threshold (e.g., if you
aggregate 5 packs which are each 99 MiB in size with a threshold of 100
MiB).
But that's OK: the point isn't to restrict the size of the cruft packs
we generate, it's to avoid working with ones that have already grown too
large. If repositories still want to limit the size of the generated
cruft pack(s), they may use '--max-cruft-size'.
There's some minor test fallout as a result of the slight differences in
behavior between the old meaning of '--max-cruft-size' and the behavior
of '--combine-cruft-below-size'. In the test which is now called
"--combine-cruft-below-size combines packs", we need to use the new flag
over the old one to exercise that test's intended behavior. The
remainder of the changes there are to improve the clarity of the
comments.
Suggested-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), we exposed new functionality that
allowed repositories to specify the behavior of when we should combine
multiple cruft packs together.
This feature was designed to ensure that we never repacked cruft packs
which were larger than the given threshold in order to provide tighter
I/O bounds for repositories that have many unreachable objects. In
essence, specifying '--max-cruft-size=N' instructed 'repack' to
aggregate cruft packs together (in order of ascending size) until the
combine size grows past 'N', and then make a new cruft pack whose
contents includes the packs we rolled up.
But this isn't quite how it works in practice. Suppose for example that
we have two cruft packs which are each 100MiB in size. One might expect
specifying "--max-cruft-size=200M" would combine these two packs
together, and then avoid repacking them until a pruning GC takes place.
In reality, 'repack' would try and aggregate these together, but writing
a pack that is strictly smaller than 200 MiB (since pack-objects'
"--max-pack-size" provides a strict bound for packs containing more than
one object).
So instead we'll write out a pack that is, say, 199 MiB in size, and
then another 1 MiB pack containing the balance. If we later repack the
repository without adding any new unreachable objects, we'll repeat the
same exercise again, making the same 199 MiB and 1 MiB packs each time.
This happens because of a poor choice to bolt the '--max-cruft-size'
functionality onto pack-objects' '--max-pack-size', forcing us to
generate packs which are always smaller than the provided threshold and
thus subject to repacking.
The following commit will introduce a new flag that implements something
similar to the behavior above. Let's prepare for that by making repack's
'--max-cruft-size' flag behave as an cruft pack-specific override for
'--max-pack-size'.
Do so by temporarily repurposing the 'collapse_small_cruft_packs()'
function to instead generate a cruft pack using the same instructions as
if we didn't specify any maximum pack size. The calling code looks
something like:
if (args->max_pack_size && !cruft_expiration) {
collapse_small_cruft_packs(in, args->max_pack_size, existing);
} else {
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "-%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "-%s.pack\n", item->string);
}
This patch makes collapse_small_cruft_packs() behave identically to the
'else' arm of the conditional above. This repurposing of
'collapse_small_cruft_packs()' is intentional, since it will set us up
nicely to introduce the new behavior in the following commit.
Naturally, there is some test fallout in the test which exercises the
old meaning of '--max-cruft-size'. Mark that test as failing for now to
be dealt with in the following commit. Likewise, add a new test which
explicitly tests the behavior of '--max-cruft-size' to place a hard
limit on the size of any generated cruft pack(s).
Note that this is a breaking change, as it alters the user-visible
behavior of '--max-cruft-size'. But I'm OK changing this behavior in
this instance, since the behavior wasn't accurate to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit moved a handful of tests from a different script into
t7704, including one that relies on generating random blobs.
Incidentally, the original home of this test defined its own helper
"write_blob" for doing so, which is identical in function to our
"generate_random_blob" (and is slightly inferior to the latter, which
cleans up after itself).
Rewrite the test that uses "write_blob" to no longer do so and then
remove the function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that a number of new tests have landed in t7704, make sure that they
all make sense and are testing the things they say they are.
Things are mostly OK, but a handful of tests needed tweaks. Those tweaks
are as follows:
- Use the terms "too large" or "too small" in tests that exercise the
'--max-cruft-size' behavior. This has historically been treated as a
threshold beneath which to combine cruft packs, but that will change
in a subsequent commit. Prepare for that by using a more generic
term.
- Remove references to "--max-cruft-size" in the freshening tests.
These tests provide coverage of our ability to record updated mtimes
for objects already in cruft packs whose mtimes are upserted from
various sources (loose objects, finding that object in a new pack,
another cruft pack, etc.).
These have nothing to do with the '--max-cruft-size' feature, and in
fact none of the tests even *use* '--max-cruft-size'. Name them
appropriately to make it clear that these tests exercise freshening
behavior, not '--max-cruft-size' behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cruft pack feature has two primary test scripts which exercise
various parts of it, which are:
- t5329-pack-objects-cruft.sh
- t7704-repack-cruft.sh
The former is designed to test low-level pack generation mechanics at
the 'git pack-objects --cruft'-level, which is plumbing. The latter, on
the other hand, is designed to test the user-facing behavior through
'git repack --cruft', which is porcelain (under the "ancillary
manipulators" sub-section).
At some point a handful of tests which should have been added to the
latter script were instead written to the former. This isn't a huge
deal, but rectifying it is straightforward. Move a handful of
'repack'-related tests out of t5329 and into their rightful home in
t7704.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--missing={print,print-info}` option for git-rev-list(1) prints
missing objects found while performing the object walk in the form:
$ git rev-list --missing=print-info <rev>
?<oid> [SP <token>=<value>]... LF
Add support for printing missing objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --missing=print-info <rev>
<oid> NUL missing=yes NUL [<token>=<value> NUL]...
In this mode, values containing special characters or spaces are printed
as-is without being escaped or quoted. Instead of prefixing the missing
OID with '?', a separate `missing=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--boundary` option for git-rev-list(1) prints boundary objects
found while performing the object walk in the form:
$ git rev-list --boundary <rev>
-<oid> LF
Add support for printing boundary objects in a NUL-delimited format when
the `-z` option is enabled.
$ git rev-list -z --boundary <rev>
<oid> NUL boundary=yes NUL
In this mode, instead of prefixing the boundary OID with '-', a separate
`boundary=yes` token/value pair is appended.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking objects, git-rev-list(1) prints each object entry on a
separate line. Some options, such as `--objects`, may print additional
information about tree and blob object on the same line in the form:
$ git rev-list --objects <rev>
<tree/blob oid> SP [<path>] LF
Note that in this form the SP is appended regardless of whether the tree
or blob object has path information available. Paths containing a
newline are also truncated at the newline.
Introduce the `-z` option for git-rev-list(1) which reformats the output
to use NUL-delimiters between objects and associated info in the
following form:
$ git rev-list -z --objects <rev>
<oid> NUL [path=<path> NUL]
In this form, the start of each record is signaled by an OID entry that
is all hexidecimal and does not contain any '='. Additional path info
from `--objects` is appended to the record as a token/value pair
`path=<path>` as-is without any truncation.
For now, the `--objects` flag is the only options that can be used in
combination with `-z`. In a subsequent commit, NUL-delimited support for
other options is added. Other options that do not make sense when used
in combination with `-z` are rejected.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before invoking `setup_revisions()`, the `--missing` and
`--exclude-promisor-objects` options are parsed early. In a subsequent
commit, another option is added that must be parsed early.
Refactor the code to parse both options in a single early pass.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `show_object_with_name()` function only has a single call site.
Inline call to `show_object_with_name()` in `show_object()` so the
explicit function can be cleaned up and live closer to where it is used.
While at it, factor out the code that prints the OID and newline for
both objects with and without a name. In a subsequent commit,
`show_object()` is modified to support printing object information in a
NUL-delimited format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the compiler/linker cannot verify that an assert() invocation is
free of side effects for us (e.g. because the assertion includes some
kind of function call), replace the use of assert() with ASSERT().
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a big no-no to have side-effects in an assertion, because if the
assert() is compiled out, you don't get that side-effect, leading to the
code behaving differently. That can be a large headache to debug.
We have roughly 566 assert() calls in our codebase (my grep might have
picked up things that aren't actually assert() calls, but most appeared
to be). All but 9 of them can be determined by gcc to be free of side
effects with a clever redefine of assert() provided by Bruno De Fraine
(from
https://stackoverflow.com/questions/10593492/catching-assert-with-side-effects),
who upon request has graciously placed his two-liner into the public
domain without warranty of any kind. The current 9 assert() calls
flagged by this clever redefinition of assert() appear to me to be free
of side effects as well, but are too complicated for a compiler/linker
to figure that since each assertion involves some kind of function call.
Add a CI job which will find and report these possibly problematic
assertions, and have the job suggest to the user that they replace these
with ASSERT() calls.
Example output from running:
```
ERROR: The compiler could not verify the following assert()
calls are free of side-effects. Please replace with
ASSERT() calls.
/home/newren/floss/git/diffcore-rename.c:1409
assert(!dir_rename_count || strmap_empty(dir_rename_count));
/home/newren/floss/git/merge-ort.c:1645
assert(renames->deferred[side].trivial_merges_okay &&
!strset_contains(&renames->deferred[side].target_dirs,
path));
/home/newren/floss/git/merge-ort.c:794
assert(omittable_hint ==
(!starts_with(type_short_descriptions[type], "CONFLICT") &&
!starts_with(type_short_descriptions[type], "ERROR")) ||
type == CONFLICT_DIR_RENAME_SUGGESTED);
/home/newren/floss/git/merge-recursive.c:1200
assert(!merge_remote_util(commit));
/home/newren/floss/git/object-file.c:2709
assert(would_convert_to_git_filter_fd(istate, path));
/home/newren/floss/git/parallel-checkout.c:280
assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
/home/newren/floss/git/scalar.c:244
assert(have_fsmonitor_support());
/home/newren/floss/git/scalar.c:254
assert(have_fsmonitor_support());
/home/newren/floss/git/sequencer.c:4968
assert(!(opts->signoff || opts->no_commit ||
opts->record_origin || should_edit(opts) ||
opts->committer_date_is_author_date ||
opts->ignore_date));
```
Note that if there are possibly problematic assertions, not necessarily
all of them will be shown in a single run, because the compiler errors
may include something like "ld: ... more undefined references to
`not_supposed_to_survive' follow" instead of listing each individually.
But in such cases, once you clean up a few that are shown in your first
run, subsequent runs will show (some of) the ones that remain, allowing
you to iteratively remove them all.
Helped-by: Bruno De Fraine <defraine@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a ASSERT() macro which is similar to assert(), but will not be
compiled out when NDEBUG is defined, and is thus safe to use even if its
argument has side-effects.
We will use this new macro in a subsequent commit to convert a few
existing assert() invocations to ASSERT(). In particular, we'll
convert the handful of invocations which cannot be proven to be free of
side effects with a simple compiler/linker hack.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, write_object_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.
Update the write_object_record() to inspect the error code returned by
block_writer_add() and flush and reinitialize the writer iff the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.
If the flush and reinitialization still fail with
REFTABLE_ENTRY_TOO_BIG_ERROR, reset the record's offset length to zero
before a final attempt.
All call sites now handle various error codes returned by
block_writer_add().
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, writer_add_record() would flush the current block and retry
appending the record whenever block_writer_add() returned any nonzero
error. This forced an assumption that every failure meant the block was
full, even when errors such as memory allocation or I/O failures occurred.
Update the writer_add_record() to inspect the error code returned by
block_writer_add() and only flush and reinitialize the writer when the
error is REFTABLE_ENTRY_TOO_BIG_ERROR. For any other error, immediately
propagate it.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, functions block_writer_add() and related functions returned
-1 when the record did not fit, forcing the caller to assume that any
failure meant the entry was too big. Replace these generic -1 returns
with defined error codes.
This prepares the codebase for finer-grained error handling so that
callers can distinguish between a block-full condition and other errors.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment added in 7252d9a036 (pseudo-merge: implement support for
finding existing merges, 2024-05-23) misspells 'bitmap' as 'bitamp'.
Correct that so that we no longer have any stray "bitamps" lurking
throughout the tree:
$ git grep -ci bitamp | wc -l
0
Noticed-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For similar reasons as in the previous refactoring of `refspec_init()`
into `refspec_init_fetch()` and `refspec_init_push()`, apply the same
refactoring to `refspec_item_init()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two callers of this function, which ensures that a dispatched
call to refspec_item_init() does not fail.
In the following commit, we're going to add fetch/push-specific variants
of refspec_item_init(), which will turn one function into two. To avoid
introducing yet another pair of new functions (such as
refspec_item_init_push_or_die() and refspec_item_init_fetch_or_die()),
let's remove the thin wrapper entirely.
This duplicates a single line of code among two callers, but thins the
refspec.h API by one function, and prevents introducing two more in the
following commit.
Note that we still have a trailing Boolean argument in the function
`refspec_item_init()`. The following commit will address this.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid having a Boolean argument in the refspec_init() function,
replace it with two variants:
- `refspec_init_fetch()`
- `refspec_init_push()`
to codify the meaning of that Boolean into the function's name itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 6d4c057859 (refspec: introduce struct refspec, 2018-05-16), we
have macros called REFSPEC_FETCH and REFSPEC_PUSH. This confusingly
suggests that we might introduce other modes in the future, which, while
possible, is highly unlikely.
But these values are treated as a Boolean, and stored in a struct field
called 'fetch'. So the following:
if (refspec->fetch == REFSPEC_FETCH) { ... }
, and
if (refspec->fetch) { ... }
are equivalent. Let's avoid renaming the Boolean values "true" and
"false" here and remove the two REFSPEC_ macros mentioned above.
Since this value is truly a Boolean and will only ever take on a value
of 0 or 1, we can declare it as a single bit unsigned field. In
practice this won't shrink the size of 'struct refspec', but it more
clearly indicates the intent.
Note that this introduces some awkwardness like:
refspec_item_init_or_die(&spec, refspec, 1);
, where it's unclear what the final "1" does. This will be addressed in
the following commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
curl supports a few options to control when and how often it should
instruct the OS to send TCP keepalives, like KEEPIDLE, KEEPINTVL, and
KEEPCNT. Until this point, there hasn't been a way for users to change
what values are used for these options, forcing them to rely on curl's
defaults.
But we do unconditionally enable TCP keepalives without giving users an
ability to tweak any fine-grained parameters. Ordinarily this isn't a
problem, particularly for users that have fast-enough connections,
and/or are talking to a server that has generous or nonexistent
thresholds for killing a connection it hasn't heard from in a while.
But it can present a problem when one or both of those assumptions fail.
For instance, I can reliably get an in-progress clone to be killed from
the remote end when cloning from some forges while using trickle to
limit my clone's bandwidth.
For those users and others who wish to more finely tune the OS's
keepalive behavior, expose configuration and environment variables which
allow setting curl's KEEPIDLE, KEEPINTVL, and KEEPCNT options.
Note that while KEEPIDLE and KEEPINTVL were added in curl 7.25.0,
KEEPCNT was added much more recently in curl 8.9.0. Per f7c094060c
(git-curl-compat: remove check for curl 7.25.0, 2024-10-23), both
KEEPIDLE and KEEPINTVL are set unconditionally. But since we may be
compiled with a curl that isn't as new as 8.9.0, only set KEEPCNT when
we have CURLOPT_TCP_KEEPCNT to begin with.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the end of `get_curl_handle()` we call `set_curl_keepalive()` to
enable TCP keepalive probes on our CURL handle. `set_curl_keepalive()`
dates back to 47ce115370 (http: use curl's tcp keepalive if available,
2013-10-14), which conditionally compiled different variants of
`set_curl_keepalive()` depending on what version of curl we were
compiled with[^1].
As of f7c094060c (git-curl-compat: remove check for curl 7.25.0,
2024-10-23), we no longer conditionally compile `set_curl_keepalive()`
since we no longer support pre-7.25.0 versions of curl. But the version
of that function that we kept is really just a thin wrapper around
setting the TCP_KEEPALIVE option, so there's no reason to keep it in its
own function.
Inline the definition of `set_curl_keepalive()` to within
`get_curl_handle()` so that the setup of our CURL handle is
self-contained.
[1]: The details are spelled out in 47ce115370, but the gist is curl
7.25.0 and newer use CURLOPT_TCP_KEEPALIVE, older versions use
CURLOPT_SOCKOPTFUNCTION with a custom callback, and older versions
that predate even that option do nothing.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7059cd99fc (http_init(): Fix config file parsing, 2009-03-09), http.c
gained a new "set_from_env()" function as a convenience function around
conditionally assigning an environment variable to some variable if and
only if the environment variable was set to begin with.
But prior to 7059cd99fc, there were two spots which need to first
strtol() whatever is set in the environment before assigning it to a
long pointer. Both instances stored the result of getenv() in a
temporary variable, and conditionally strtol() it depending on whether
or not getenv() returned NULL.
Replace those two instances with a new cousin of 'set_from_env()' called
'set_long_from_env()', which does what its name suggests. This allows us
to remove the temporary variables and clean up some minor code
duplication while also adding more robust error handling.
More importantly, however, it prepares us for a future commit which will
introduce more instances of assigning an environment variable to a long.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing 'http.lowSpeedLimit' and 'http.lowSpeedTime', we explicitly
cast the result of 'git_config_int()' to a long before assignment. This
cast has been in place since all the way back in 58e60dd203 (Add support
for pushing to a remote repository using HTTP/DAV, 2005-11-02).
But that cast has always been unnecessary, since long is guaranteed to
be at least as wide as int. Let's drop the cast accordingly.
Noticed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The CI setups of GitLab and GitHub use a common dependency management
script 'ci/install-dependencies.sh'. The script install the necessary
packages based on a combination of the "$distro" and "$jobname" env
variables.
The "$distro" variable is derived from the "CI_JOB_IMAGE" env variable
set by the CI configs. In the GitHub CI config, some of the jobs are
missing this variable. For the 'Documentation' job which depends on
'meson' being installed, this raises an error since the 'meson'
dependency is never installed.
Fix this by adding the 'CI_JOB_IMAGE' variable to all missing jobs. We
don't add it the windows jobs, since they manager their dependency as
part of the CI config and no further dependency management is needed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which automatically
formats placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine applies synopsis rules to
these spans.
Possible values for some variables, that were mentioned in the description
prose, are now made into enumerated list.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the modern formatting of the manpages, the options and commands are now
backticked in their definition lists. This patch updates the generation of
the completion list to take into account this new format.
The script `generate-configlist.sh` is updated to get rid of extraneous
commands and fit everything in a single sed script.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0578f1e66a ("global: adapt callers to use generic hash context helpers")
accidentally removed `->init_fn`, which is required for OpenSSL 3+ SHA1.
This fixes the following error on fetch:
fatal: fetch-pack: invalid index-pack output
Signed-off-by: Jensen Huang <hmz007@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the "[remote "nick"] fetch = ..." configuration variables
have the nickname in the second part, the nicknames are case
sensitive, unlike the first and the third component (i.e.
"remote.origin.fetch" and "Remote.origin.FETCH" are the same thing,
but "remote.Origin.fetch" and "remote.origin.fetch" are different).
Let's follow the way Git works in general and compare the remote
names case sensitively when processing advertised remotes.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the 'KnownUrl' case, in should_accept_remote(), let's check that
`remote_url` is not NULL before we use strcmp() to compare it with
the local URL. This could avoid crashes if a server starts to not
advertise any URL in the future.
If `remote_url` is NULL, we should reject the URL. Let's also warn in
this case because we warn otherwise when a remote is rejected to try
to help diagnose things at the end of the function.
And while we are checking that remote_url is not NULL and warning if
it is, it makes sense to also help diagnose the case where remote_url
is empty.
Also while at it, let's spell "URL" with uppercase letters in all the
warnings.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using strvec_push() to push `NULL` into a 'strvec' results in a
segfault, because `xstrdup(NULL)` crashes.
So when an URL is missing from the config, let's not push the remote
name and URL into the 'strvec's.
While at it, let's also not push them in case the URL is empty. It's
just not worth the trouble and it's consistent with how Git otherwise
treats missing and empty URLs in the same way.
Note that in case of missing or empty URL, Git uses the remote name to
fetch, which can work if the remote is on the same filesystem. So
configurations where the client, server and remote are all on the same
filesystem may need URLs to be configured even if they are the same as
the remote names. But this is a rare case, and the work around is easy
enough.
We leave improving the strvec API and/or xstrdup() for a future
separate effort.
While at it, let's also use git_config_get_string_tmp() instead of
git_config_get_string() to simplify memory management.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If `test_when_finished "rm -rf client"` is run after we clone, it
will not run if the clone failed, so the "client" directory might
not be removed at the end of the test.
`git clone` does try to remove the directory when it fails, but
let's be safe and try to protect against possibly weird clone
failures by moving `test_when_finished "rm -rf client"` before
the clone. It just makes more sense this way around.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are going to consider updating the refs/remotes/*/HEAD symref,
we have to ask the remote side where its HEAD points. But if we know
that the feature is disabled by config, we don't need to bother!
This saves a little bit of work and network communication for the
server. And even a little bit of effort on the client, as our local
set_head() function did a bit of work matching the remote HEAD before
realizing that we're not going to do anything with it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new followRemoteHEAD feature is triggered for almost every fetch,
causing us to ask the server about the remote "HEAD" and to consider
updating our local tracking HEAD symref. This patch limits the feature
only to the case when we are fetching a remote using its configured
refspecs (typically into its refs/remotes/ hierarchy). There are two
reasons for this.
One is efficiency. E.g., the fixes in 6c915c3f85 (fetch: do not ask for
HEAD unnecessarily, 2024-12-06) and 20010b8c20 (fetch: avoid ls-refs
only to ask for HEAD symref update, 2025-03-08) were aimed at reducing
the work we do when we would not be able to update HEAD anyway. But they
do not quite cover all cases. The remaining one is:
git fetch origin refs/heads/foo:refs/remotes/origin/foo
which _sometimes_ can update HEAD, but usually not. And that leads us to
the second point, which is being simple and explainable.
The code for updating the tracking HEAD symref requires both that we
learned which ref the remote HEAD points at, and that the server
advertised that ref to us. But because the v2 protocol narrows the
server's advertisement, the command above would not typically update
HEAD at all, unless it happened to point to the "foo" branch. Or even
weirder, it probably _would_ update if the server is very old and
supports only the v0 protocol, which always gives a full advertisement.
This creates confusing behavior for the user: sometimes we may try to
update HEAD and sometimes not, depending on vague rules.
One option here would be to loosen the update code to accept the remote
HEAD even if the server did not advertise that ref. I think that could
work, but it may also lead to interesting corner cases (e.g., creating a
dangling symref locally, even though the branch is not unborn on the
server, if we happen not to have fetched it).
So let's instead simplify the rules: we'll only consider updating the
tracking HEAD symref when we're doing a full fetch of the remote's
configured refs. This is easy to implement; we can just set a flag at
the moment we realize we're using the configured refspecs. And we can
drop the special case code added by 6c915c3f85 and 20010b8c20, since
this covers those cases. The existing tests from those commits still
pass.
In t5505, an incidental call to "git fetch <remote> <refspec>" updated
HEAD, which caused us to adjust the test in 3f763ddf28 (fetch: set
remote/HEAD if it does not exist, 2024-11-22). We can now adjust that
back to how it was before the feature was added.
Even though t5505 is incidentally testing our new desired behavior,
we'll add an explicit test in t5510 to make sure it is covered.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/fetch-ref-prefix-cleanup:
fetch: use ref prefix list to skip ls-refs
fetch: avoid ls-refs only to ask for HEAD symref update
fetch: stop protecting additions to ref-prefix list
fetch: ask server to advertise HEAD for config-less fetch
refspec_ref_prefixes(): clean up refspec_item logic
t5516: beef up exact-oid ref prefixes test
t5516: drop NEEDSWORK about v2 reachability behavior
t5516: prefer "oid" to "sha1" in some test titles
t5702: fix typo in test name
When BreakingChanges.txt was added in 57ec9254eb9 (docs: introduce
document to announce breaking changes, 2024-06-14) there was no
corresponding change to the Makefile to build it. Fix that by adding it
to the TECH_DOCS target.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate the paragraphs in the description of `--exclude` with a `+`
rather than an empty line to indent the whole description rather than
just the first paragraph.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch from merge-recursive to merge-ort. Adjust the following
testcases due to the switch:
* t4151: This test left an untracked file in the way of the merge.
merge-recursive could only sometimes tell when untracked files were
in the way, and by the time it discovers others, it has already made
too many changes to back out of the merge. So, instead of writing the
results to e.g. 'file1' it would instead write them to
'file1~branch1'. This is confusing for users, because they might not
notice 'file1~branch1' and accidentally add and commit 'file1'.
In contrast, merge-ort correctly notices the file in the way before
making any changes and aborts. Since this test didn't care about the
file in the way, just remove it before calling git-am.
* t4255: Usage of merge-ort allows us to change two known failures into
successes.
* t6427: As noted a few commits ago, the choice of conflict label for
diff3 markers for the ancestor commit was previously handled by
merge-recursive.c rather than by callers. Since that has now changed,
`git am` needs to specify that label. Although the previous conflict
label ("constructed merge base") was already fairly somewhat slanted
towards `git am`, let's use wording more along the lines of the
related command-line flag from `git apply` and function involved to
tie it more closely to `git am`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two issues here.
First, when merge.directoryRenames is set to false, there are a few code
paths that should be turned off. I missed one; collect_renames() was
still doing some directory rename detection logic unconditionally. It
ended up not having much effect because
get_provisional_directory_renames() was skipped earlier and not setting
up renames->dir_renames, but the code should still be skipped.
Second, the larger issue is that sometimes we get a cached_pair rename
from a previous commit being replayed mapping A->B, but in a subsequent
commit but collect_merge_info() doesn't even recurse into the
directory containing B because there are no source pairings for that
rename that are relevant; we can merge that commit fine without knowing
the rename. But since the cached renames are added to the normal
renames, when we go to process it and find that B is not part of
opt->priv->paths, we hit the assertion error
process_renames: Assertion `newinfo && ~newinfo->merged.clean` failed.
I think we could fix this at the beginning of detect_regular_renames() by
pruning from cached_pairs any entry whose destination isn't in
opt->priv->paths, but it's suboptimal in that we'd kind of like the
cached_pair to be restored afterwards so that it can help the subsequent
commit, but more importantly since it sits at the intersection of
the caching renames optimization and the relevant renames optimization,
and the trivial directory resolution optimization, and I don't currently
have Documentation/technical/remembering-renames.txt fully paged in, I'm
not sure if that's a full solution or a bandaid for the current
testcase. However, since the remembering renames optimization was the
weakest of the set, and the optimization is far less important when
directory rename detection is off (as that implies far fewer potential
renames), let's just use a bigger hammer to ensure this special case is
fixed: turn off the rename caching. We do the same thing already when
we encounter rename/rename(1to1) cases (as per `git grep -3
disabling.the.optimization`, though it uses a slightly different
triggering mechanism since it's trying to affect the next time that
merge_check_renames_reusable() is called), and I think it makes sense
to do the same here.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a bug in the way renames are cached that rears its head when
`merge.directoryRenames` is set to false; it results in the following
message:
merge-ort.c:3002: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.
Aborted
It is quite a curious bug: the same test case will succeed, without any
assertion, if instead run with `merge.directoryRenames=true`.
Further, the assertion does not manifest while replaying the first
commit, it manifests while replaying the _second_ commit of the commit
range. But it does _not_ manifest when the second commit is replayed
individually.
This would indicate that there is an incomplete rename cache left-over
from the first replayed commit which is being reused for the second
commit, and if directory rename detection is enabled, the missing paths
are somehow regenerated.
Incidentally, the same bug can by triggered by modifying t6429 to switch
from merge.directoryRenames=true to merge.directoryRenames=false.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
[en: tweaked the commit message slightly, including adjusting the
line number of the assertion to the latest version, and the much
later discovery that a simple t6429 tweak would also display the
issue.]
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various callers such as am & checkout set the merge verbosity to 0 to
avoid having conflict messages printed. While this could be achieved by
avoiding the wrappers from merge-ort-wrappers and instead passing 0 for
display_update_msgs to merge_switch_to_result(), for simplicity of
converting callers simply allow them to also achieve this with the
merge-ort-wrappers by setting verbosity to 0.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merge-ort was written, I did not at first allow rename detection to
be disabled, because I suspected that most folks disabling rename
detection were doing so solely for performance reasons. Since I put a
lot of working into providing dramatic speedups for rename detection
performance as used by the merge machinery, I wanted to know if there
were still real world repositories where rename detection was
problematic from a performance perspective. We have had years now to
collect such information, and while we never received one, waiting
longer with the option disabled seems unlikely to help surface such
issues at this point. Also, there has been at least one request to
allow rename detection to be disabled for behavioral rather than
performance reasons (see the thread including
https://lore.kernel.org/git/CABPp-BG-Nx6SCxxkGXn_Fwd2wseifMFND8eddvWxiZVZk0zRaA@mail.gmail.com/
), so let's start heeding the config and command line settings.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge-recursive.[ch] have three entry points:
* merge_trees()
* merge_recursive()
* merge_recursive_generic()
merge-ort*.[ch] only has equivalents for the first two. Add an
equivalent for the final entry point, so we can switch callers to
use it and remove merge-recursive.[ch].
While porting it over, finally fix the issue with the label for the
ancestor (used when merge.conflictStyle=diff3 as a conflict label).
merge-recursive.c has traditionally not allowed callers to set that
label, but I have found that problematic for years.
(Side note: This function was initially part of the merge-ort rewrite,
but reviewers questioned the ancestor label funnyness which I was
never really happy with anyway. It resulted in me jettisoning it and
hoping at the time that I would eventually be able to force the existing
callers to use some other API. That worked with `git stash`, as per
874cf2a60444 (stash: apply stash using 'merge_ort_nonrecursive()',
2022-05-10), but this API is the most reasonable one for `git am` and
`git merge-recursive`, if we can just allow them some freedom over the
ancestor label.)
The merge_recursive_generic() function did not know whether it was being
invoked by `git stash`, `git merge-recursive`, or `git am`, and the
choice of meaningful ancestor label, when there is a unique ancestor,
varies for these different callers:
* git am: ancestor is a constructed "fake ancestor" that user knows
nothing about and has no access to. (And is different than
the normal thing we mean by a "virtual merge base" which is
the merging of merge bases.)
* git merge-recursive: ancestor might be a tree, but at least it
was one specified by the user (if they invoked
merge-recursive directly)
* git stash: ancestor was the commit serving as the stash base
Thus, using a label like "constructed merge base" (as
merge_recursive_generic() does) presupposes that `git am` is the only
caller; it is incorrect for other callers. This label has thrown me off
more than once. Allow the caller to override when there is a unique
merge base.
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The graph for `--ancestry-path=H D..M` should contain commit C.
Signed-off-by: Han Jiang <jhcarl0814@gmail.com>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This note was added to the restore command docs in 46e91b663b
(checkout: split part of it to new command 'restore', 2019-04-25),
but it is now inaccurate. The underlying builtin `add -i` implementation,
made default in 0527ccb1b5 (add -i: default to the built-in implementation,
2021-11-30), supports pathspecs, so `git restore -p <pathspec>...` has
worked for all users since then. I bisected to verify this was the commit
that added support.
Signed-off-by: Adam Johnson <me@adamj.eu>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having the compiler point out unreachable code can help avoid bugs, like
the one discussed in:
https://lore.kernel.org/git/20250307195057.GA3675279@coredump.intra.peff.net/
In that case it was found by Coverity, but finding it earlier saves
everybody time and effort.
We can use -Wunreachable-code to get some help from the compiler here.
Interestingly, this is a noop in gcc. It was a real warning up until gcc
4.x, when it was removed for being too flaky, but they left the
command-line option to avoid breaking users. See:
https://stackoverflow.com/questions/17249934/why-does-gcc-not-warn-for-unreachable-code
However, clang does implement this option, and it finds the case
mentioned above (and no other cases within the code base). And since we
run clang in several of our CI jobs, that's enough to get an early
warning of breakage.
We could enable it only for clang, but since gcc is happy to ignore it,
it's simpler to just turn it on for all developer builds.
Signed-off-by: Jeff King <peff@peff.net>
[jc: squashed meson.build change sent by Patrick]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our hope is that the number of code paths that falsely trigger
warnings with the -Wunreachable-code compilation option are small,
and they can be worked around case-by-case basis, like we just did
in the previous commit. If we need such a workaround a bit more
often, however, we may benefit from a more generic and descriptive
facility that helps document the cases we need such workarounds.
Side note: if we need the workaround all over the place, it
simply means -Wunreachable-code is not a good tool for us to
save engineering effort to catch mistakes. We are still
exploring if it helps us, so let's assume that it is not the
case.
Introduce NOT_CONSTANT() macro, with which, the developer can tell
the compiler:
Do not optimize this expression out, because, despite whatever
you are told by the system headers, this expression should *not*
be treated as a constant.
and use it as a replacement for the workaround we used that was
somewhat specific to the sigfillset case. If the compiler already
knows that the call to sigfillset() cannot fail on a particular
platform it is compiling for and declares that the if() condition
would not hold, it is plausible that the next version of the
compiler may learn that sigfillset() that never fails would not
touch errno and decide that in this sequence:
errno = 0;
sigfillset(&all)
if (errno)
die_errno("sigfillset");
the if() statement will never trigger. Marking that the value
returned by sigfillset() cannot be a constant would document our
intention better and would not break with such a new version of
compiler that is even more "clever". With the marco, the above
sequence can be rewritten:
if (NOT_CONSTANT(sigfillset(&all)))
die_errno("sigfillset");
which looks almost like other innocuous annotations we have,
e.g. UNUSED.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While 'git-reflog(1)' currently allows users to expire reflogs and
delete individual entries, it lacks functionality to completely remove
reflogs for specific references. This becomes problematic in
repositories where reflogs are not needed but continue to accumulate
entries despite setting 'core.logAllRefUpdates=false'.
Add a new 'drop' subcommand to git-reflog that allows users to delete
the entire reflog for a specified reference. Include an '--all' flag to
enable dropping all reflogs from all worktrees and an addon flag
'--single-worktree', to only drop all reflogs from the current worktree.
While here, remove an extraneous newline in the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git reflog expire' prints the error message '<ref> points nowhere!'
when used with a non-existent ref. This message is a bit confusing and
vague. Modify the message to be more clear and direct.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since enabling -Wunreachable-code, builds with clang on macOS now fail,
complaining that the die_errno() call in:
if (sigfillset(&all))
die_errno("sigfillset");
is unreachable. On that platform the manpage documents that sigfillset()
always returns success, and presumably the implementation is a macro or
inline function that does so in a way that is transparent to the
compiler.
But we should continue to check on other platforms, since POSIX says it
may return an error.
We could solve this with a compile-time knob to split the two cases
(assuming success on macOS and checking for the error elsewhere). But we
can also work around it more directly by relying on errno to check the
outcome (since POSIX dictates that errno will be set on error). And that
works around the compiler's cleverness, since it doesn't know the
semantics of errno (though I suppose if sigfillset() is simple enough,
it could perhaps realize that no writes to errno are possible; however
this does seem to work in practice).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both strset_for_each_entry and strintmap_for_each_entry are macros that
evaluate to the same thing, so they are technically interchangeable.
However, the intent is that we use the one matching the variable type we
are passing. Unfortunately, I somehow mistakenly got one of these wrong
in 7bee6c100431 (merge-ort: avoid recursing into directories when we
don't need to, 2021-07-16) -- possibly related to the fact that
relevant_sources was initially a strset and later refactored into a
strintmap. Correct which macro we use.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t7615 is entirely about testing the differences about different
diff algorithms, but it doesn't specify any diff algorithm when it
is testing myers. Given that we have discussed potentially switching
defaults (https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/), it
makes sense in tests that are about different diff algorithms to be
explicitly about which one is intended to be used in each test. Add
that specificity.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment describing testcase 13b of t6423 somehow mixed up commits
A and B in one paragraph. Fix the references.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
stash was modified to use merge_ort_nonrecursive() instead of
merge_recursive_generic() back in commit 874cf2a60444 (stash: apply
stash using 'merge_ort_nonrecursive()', 2022-05-10). That makes the
inclusion of merge-recursive.h unnecessary. In preparation for the
removal of merge-recursive.h, remove the unnecessary include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `perl` variable in meson.build is assigned to a program lookup,
which may have the value "not-found object" if configuring with
`-Dperl=disabled`.
There is already a list of other cases where we do need a perl command,
even when not building perl bindings. Building documentation should be
one of those cases, but was missing from the list. Add it.
Fixes:
```
$ meson setup builddir/ -Ddocs=man -Dperl=disabled -Dtests=false
[...]
Documentation/meson.build:308:22: ERROR: Tried to use not-found external program in "command"
```
Bug: https://bugs.gentoo.org/949247
Signed-off-by: Eli Schwartz <eschwartz@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This amends 1ae85ff6d (git-gui: strip comments and consecutive empty
lines from commit messages, 2024-08-13) to deal with custom comment
characters/strings.
The magic commentString value "auto" is not handled, because the option
makes no sense to me - it does not support comments in templates and
hook output, and it seems far-fetched that someone would introduce
comments during editing the message.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Prior to commit 9db2ac56168e (diffcore-rename: accelerate rename_dst
setup, 2020-12-11), the function add_rename_dst() resulted in quadratic
runtime since each call inserted the new entry into the array in sorted
order. The reason for the sorted order requirement was so that
locate_rename_dst(), used when break detection is turned on, could find
the appropriate entry in logarithmic time via bisection on string
comparisons. (It's better to be quadratic in moving pointers than
quadratic in string comparisons, so this made some sense.) However,
since break detection always sticks the broken pairs adjacent to each
other, that commit decided to simply append entries to rename_dst, and
record the mapping of (filename) -> (index within rename_dst) via a
strintmap. Doing this relied on the fact that when adding the source of
a broken pair via register_rename_src(), that the next item we'd process
was the other half of the same broken pair and would be added to
rename_dst via add_rename_dst(). This assumption was fine under break
detection alone, but the combination of break detection and
single_follow violated that assumption because of this code:
else if (options->single_follow &&
strcmp(options->single_follow, p->two->path))
continue; /* not interested */
which would end up skipping calling add_rename_dst() below that point.
Since I knew I was assuming that the dst pair of a break would always be
added right after the src pair of a break, I added a new BUG() directive
as part of that commit later on at time of use that would check my
assumptions held. That BUG() didn't trip for nearly 4 years...which
sadly meant I had long since forgotten the related details. Anyway...
When the dst half of a broken pair is skipped like this, it means that
not only could my recorded index be invalid (just past the end of the
array), it could also point to some unrelated dst that just happened to
be the next one added to the array. So, to fix this, we need to add a
little more safety around the checks for the recorded break_idx.
It turns out that making a testcase to trigger this is quite the
challenge. I actually added two testscases:
* One testcase which uses --follow incorrectly (it uses its single
pathspec to specifying something other than a single filename), and
which triggers the same bug reported-by Olaf. This triggers a
special case within locate_rename_dst() where idx evaluates to 0
and rename_dst is NULL, meaning that our return value of
&rename_dst[idx] happens to evaluate to NULL as well. This
addressing of an index into a NULL array hints at deeper problems,
which are raised in the next testcase...
* A second testcase which when run under valgrind shows that the code
actually depends upon unintialized memory, in particular the entry
just after the end of the rename_dst array.
In short, when the two rare options -B and --follow are used together,
fix the accidental find of the wrong dst entry (which would often be
uninitialized memory just past the end of the array, but also could
have just been a dst for an unrelated path if no dst was recorded for
the expected path). Do so by adding a little more care around checking
the recorded indices in break_idx.
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdl_get_hunk() calculates the maximum number of common lines between two
changes that would fit into the same hunk for the given context options.
It involves doubling and addition and thus can overflow if the terms are
huge.
The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
the type of the corresponding context and interhunkcontext in struct
diff_options is int. On many platforms longs are bigger that ints,
which prevents the overflow. On Windows they have the same range and
the overflow manifests as hunks that are split erroneously and lines
being repeated between them.
Fix the overflow by checking and not going beyond LONG_MAX. This allows
specifying a huge context line count and getting all lines of a changed
files in a single hunk, as expected.
Reported-by: Jason Cho <jason11choca@proton.me>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once an object is written into a cruft pack, we can only freshen it by
writing a new loose or packed copy of that object with a more recent
mtime.
Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
a repository at any given time. So freshening unreachable objects was
straightforward when already rewriting the cruft pack (and its *.mtimes
file).
But 61568efa95 changes things: 'pack-objects' now supports writing
multiple cruft packs when invoked with `--cruft` and the
`--max-pack-size` flag. Cruft packs are rewritten until they reach some
size threshold, at which point they are considered "frozen", and will
only be modified in a pruning GC, or if the threshold itself is
adjusted.
Prior to this patch, however, this process breaks down when we attempt
to freshen an object packed in an earlier cruft pack, and that cruft
pack is larger than the threshold and thus will survive the repack.
When this is the case, it is impossible to freshen objects in cruft
pack(s) when those cruft packs are larger than the threshold. This is
because we would avoid writing them in the new cruft pack entirely, for
a couple of reasons.
1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
over the packs we're going to retain (which are marked as kept
in-core by 'read_cruft_objects()').
This means that we will avoid enumerating additional packed copies
of objects found in any cruft packs which are larger than the given
size threshold. Thus there is no opportunity to call
'create_object_entry()' whatsoever.
2. We likewise will discard the loose copy (if one exists) of any
unreachable object packed in a cruft pack that is larger than the
threshold. Here our call path is 'add_unreachable_loose_objects()',
which uses the 'add_loose_object()' callback.
That function will eventually land us in 'want_object_in_pack()'
(via 'add_cruft_object_entry()'), and we'll discard the object as it
appears in one of the packs which we marked as kept in-core.
This means in effect that it is impossible to freshen an unreachable
object once it appears in a cruft pack larger than the given threshold.
Instead, we should pack an additional copy of an unreachable object we
want to freshen even if it appears in a cruft pack, provided that the
cruft copy has an mtime which is before the mtime of the copy we are
trying to pack/freshen. This is sub-optimal in the sense that it
requires keeping an additional copy of unreachable objects upon
freshening, but we don't have a better alternative without the ability
to make in-place modifications to existing *.mtimes files.
In order to implement this, we have to adjust the behavior of
'want_found_object()'. When 'pack-objects' is told that we're *not*
going to retain any cruft packs (i.e. the set of packs marked as kept
in-core does not contain a cruft pack), the behavior is unchanged.
But when there *is* at least one cruft pack that we're holding onto, it
is no longer sufficient to reject a copy of an object found in that
cruft pack for that reason alone. In this case, we only want to reject a
candidate object when copies of that object either:
- exists in a non-cruft pack that we are retaining, regardless of that
pack's mtime, or
- exists in a cruft pack with an mtime at least as recent as the copy
we are debating whether or not to pack, in which case freshening
would be redundant.
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
objects when we have a copy in an existing cruft pack with at least as
recent an mtime as our candidate (in which case "freshening" would be
redundant).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git version --build-options" stopped showing zlib version by
mistake due to recent refactoring, which has been corrected.
* tc/zlib-ng-fix:
help: print zlib-ng version number
help: include git-zlib.h to print zlib version
* ps/refname-avail-check-optim: (43 commits)
refs: reuse iterators when determining refname availability
refs/iterator: implement seeking for files iterators
refs/iterator: implement seeking for packed-ref iterators
refs/iterator: implement seeking for ref-cache iterators
refs/iterator: implement seeking for reftable iterators
refs/iterator: implement seeking for merged iterators
refs/iterator: provide infrastructure to re-seek iterators
refs/iterator: separate lifecycle from iteration
refs: stop re-verifying common prefixes for availability
refs/files: batch refname availability checks for initial transactions
refs/files: batch refname availability checks for normal transactions
refs/reftable: batch refname availability checks
refs: introduce function to batch refname availability checks
builtin/update-ref: skip ambiguity checks when parsing object IDs
object-name: allow skipping ambiguity checks in `get_oid()` family
object-name: introduce `repo_get_oid_with_flags()`
Git 2.49-rc0
The fourteenth batch
mailmap: fix check-mailmap with full mailmap line
The thirteenth batch
...
When verifying whether refnames are available we have to verify whether
any reference exists that is nested under the current reference. E.g.
given a reference "refs/heads/foo", we must make sure that there is no
other reference "refs/heads/foo/*".
This check is performed using a ref iterator with the prefix set to the
nested reference namespace. Until now it used to not be possible to
reseek iterators, so we always had to reallocate the iterator for every
single reference we're about to check. This keeps us from reusing state
that the iterator may have and that may make it work more efficiently.
Refactor the logic to reseek iterators. This leads to a sizeable speedup
with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 39.8 ms ± 0.9 ms [User: 29.7 ms, System: 9.8 ms]
Range (min … max): 38.4 ms … 42.0 ms 62 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 31.9 ms ± 1.1 ms [User: 27.0 ms, System: 4.5 ms]
Range (min … max): 29.8 ms … 34.3 ms 74 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
The "files" backend doesn't really show a huge impact:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 392.3 ms ± 7.1 ms [User: 59.7 ms, System: 328.8 ms]
Range (min … max): 384.6 ms … 404.5 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 387.7 ms ± 7.4 ms [User: 54.6 ms, System: 329.6 ms]
Range (min … max): 377.0 ms … 397.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
This is mostly because it is way slower to begin with because it has to
create a separate file for each new reference, so the milliseconds we
shave off by reseeking the iterator doesn't really translate into a
significant relative improvement.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement seeking for "files" iterators. As we simply use a ref-cache
iterator under the hood the implementation is straight-forward. Note
that we do not implement seeking on reflog iterators, same as with the
"reftable" backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement seeking of `packed-ref` iterators. The implementation is again
straight forward, except that we cannot continue to use the prefix
iterator as we would otherwise not be able to reseek the iterator
anymore in case one first asks for an empty and then for a non-empty
prefix. Instead, we open-code the logic to in `advance()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement seeking of ref-cache iterators. This is done by splitting most
of the logic to seek iterators out of `cache_ref_iterator_begin()` and
putting it into `cache_ref_iterator_seek()` so that we can reuse the
logic.
Note that we cannot use the optimization anymore where we return an
empty ref iterator when there aren't any references, as otherwise it
wouldn't be possible to reseek the iterator to a different prefix that
may exist. This shouldn't be much of a performance concern though as we
now start to bail out early in case `advance()` sees that there are no
more directories to be searched.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement seeking of reftable iterators. As the low-level reftable
iterators already support seeking this change is straight-forward. Two
notes though:
- We do not support seeking on reflog iterators. It is unclear what
seeking would even look like in this context, as you typically would
want to seek to a specific entry in the reflog for a specific ref.
There is currently no use case for this, but if one arises in the
future, we can still implement seeking at that later point.
- We start to check whether `reftable_stack_init_ref_iterator()` is
successful.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement seeking on merged iterators. The implementation is rather
straight forward, with the only exception that we must not deallocate
the underlying iterators once they have been exhausted.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reftable iterators need to be scrapped after they have either been
exhausted or aren't useful to the caller anymore, and it is explicitly
not possible to reuse them for iterations. But enabling for reuse of
iterators may allow us to tune them by reusing internal state of an
iterator. The reftable iterators for example can already be reused
internally, but we're not able to expose this to any users outside of
the reftable backend.
Introduce a new `.seek` function in the ref iterator vtable that allows
callers to seek an iterator multiple times. It is expected to be
functionally the same as calling `refs_ref_iterator_begin()` with a
different (or the same) prefix.
Note that it is not possible to adjust parameters other than the seeked
prefix for now, so exclude patterns, trimmed prefixes and flags will
remain unchanged. We do not have a usecase for changing these parameters
right now, but if we ever find one we can adapt accordingly.
Implement the callback for trivial cases. The other iterators will be
implemented in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref and reflog iterators have their lifecycle attached to iteration:
once the iterator reaches its end, it is automatically released and the
caller doesn't have to care about that anymore. When the iterator should
be released before it has been exhausted, callers must explicitly abort
the iterator via `ref_iterator_abort()`.
This lifecycle is somewhat unusual in the Git codebase and creates two
problems:
- Callsites need to be very careful about when exactly they call
`ref_iterator_abort()`, as calling the function is only valid when
the iterator itself still is. This leads to somewhat awkward calling
patterns in some situations.
- It is impossible to reuse iterators and re-seek them to a different
prefix. This feature isn't supported by any iterator implementation
except for the reftable iterators anyway, but if it was implemented
it would allow us to optimize cases where we need to search for
specific references repeatedly by reusing internal state.
Detangle the lifecycle from iteration so that we don't deallocate the
iterator anymore once it is exhausted. Instead, callers are now expected
to always call a newly introduce `ref_iterator_free()` function that
deallocates the iterator and its internal state.
Note that the `dir_iterator` is somewhat special because it does not
implement the `ref_iterator` interface, but is only used to implement
other iterators. Consequently, we have to provide `dir_iterator_free()`
instead of `dir_iterator_release()` as the allocated structure itself is
managed by the `dir_iterator` interfaces, as well, and not freed by
`ref_iterator_free()` like in all the other cases.
While at it, drop the return value of `ref_iterator_abort()`, which
wasn't really required by any of the iterator implementations anyway.
Furthermore, stop calling `base_ref_iterator_free()` in any of the
backends, but instead call it in `ref_iterator_free()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.
When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it would,
backends would notice at a later point in time.
Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 63.1 ms ± 1.8 ms [User: 41.0 ms, System: 21.6 ms]
Range (min … max): 60.6 ms … 69.5 ms 38 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 40.0 ms ± 1.3 ms [User: 29.3 ms, System: 10.3 ms]
Range (min … max): 38.1 ms … 47.3 ms 61 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
For the "files" backend we see an improvement, but a much smaller one:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 395.8 ms ± 5.3 ms [User: 63.6 ms, System: 330.5 ms]
Range (min … max): 387.0 ms … 404.6 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 386.0 ms ± 4.0 ms [User: 51.5 ms, System: 332.8 ms]
Range (min … max): 380.8 ms … 392.6 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:
Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 836.6 ms ± 5.6 ms [User: 645.2 ms, System: 185.2 ms]
Range (min … max): 829.6 ms … 845.9 ms 10 runs
Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 759.8 ms ± 5.1 ms [User: 574.9 ms, System: 178.9 ms]
Range (min … max): 753.1 ms … 768.8 ms 10 runs
Summary
migrate reftable:files (refcount = 1000000, revision = HEAD) ran
1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
And vice versa:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 870.7 ms ± 5.7 ms [User: 735.2 ms, System: 127.4 ms]
Range (min … max): 861.6 ms … 883.2 ms 10 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 799.1 ms ± 8.5 ms [User: 661.1 ms, System: 130.2 ms]
Range (min … max): 787.5 ms … 812.6 ms 10 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing less string list lookups.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "files" backend explicitly carves out special logic for its initial
transaction so that it can avoid writing out every single reference as
a loose reference. While the assumption is that there shouldn't be any
preexisting references, we still have to verify that none of the newly
written references will conflict with any other new reference in the
same transaction.
Refactor the initial transaction to use batched refname availability
checks. This does not yet have an effect on performance as we still call
`refs_verify_refname_available()` in a loop. But this will change in
subsequent commits and then impact performance when cloning a repository
with many references or when migrating references to the "files" format.
This will improve performance when cloning a repository with many
references or when migrating references from any format to the "files"
format once the availability checks have learned to optimize checks for
many references in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as the "reftable" backend that we have adapted in the preceding
commit to use batched refname availability checks we can also do so for
the "files" backend. Things are a bit more intricate here though, as we
call `refs_verify_refname_available()` in a set of different contexts:
1. `lock_raw_ref()` when it hits either EEXISTS or EISDIR when creating
a new reference, mostly to create a nice, user-readable error
message. This is nothing we have to care about too much, as we only
hit this code path at most once when we hit a conflict.
2. `lock_raw_ref()` when it _could_ create the lockfile to check
whether it is conflicting with any packed refs. In the general case,
this code path will be hit once for every (successful) reference
update.
3. `lock_ref_oid_basic()`, but it is only executed when copying or
renaming references or when expiring reflogs. It will thus not be
called in contexts where we have many references queued up.
4. `refs_refname_ref_available()`, but again only when copying or
renaming references. It is thus not interesting due to the same
reason as the previous case.
5. `files_transaction_finish_initial()`, which is only executed when
creating a new repository or migrating references.
So out of these, only (2) and (5) are viable candidates to use the
batched checks.
Adapt `lock_raw_ref()` accordingly by queueing up reference names that
need to be checked for availability and then checking them after we have
processed all updates. This check is done before we (optionally) lock
the `packed-refs` file, which is somewhat flawed because it means that
the `packed-refs` could still change after the availability check and
thus create an undetected conflict. But unconditionally locking the file
would change semantics that users are likely to rely on, so we keep the
current locking sequence intact, even if it's suboptmial.
The refactoring of `files_transaction_finish_initial()` will be done in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the "reftable" backend to batch the availability check for
refnames. This does not yet have an effect on performance as
`refs_verify_refnames_available()` effectively still performs the
availability check for each refname individually. But this will be
optimized in subsequent commits, where we learn to optimize some parts
of the logic when checking multiple refnames for availability.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `refs_verify_refname_available()` functions checks whether a
reference update can be committed or whether it would conflict with
either a prefix or suffix thereof. This function needs to be called once
per reference that one wants to check, which requires us to redo a
couple of checks every time the function is called.
Introduce a new function `refs_verify_refnames_available()` that does
the same, but for a list of references. For now, the new function uses
the exact same implementation, except that we loop through all refnames
provided by the caller. This will be tuned in subsequent commits.
The existing `refs_verify_refname_available()` function is reimplemented
on top of the new function. As such, the diff is best viewed with the
`--ignore-space-change option`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the commands in git-update-ref(1) accept an old and/or new
object ID to update a specific reference to. These object IDs get parsed
via `repo_get_oid()`, which not only handles plain object IDs, but also
those that have a suffix like "~" or "^2". More surprisingly though, it
even knows to resolve arbitrary revisions, despite the fact that its
manpage does not mention this fact even once.
One consequence of this is that we also check for ambiguous references:
when parsing a full object ID where the DWIM mechanism would also cause
us to resolve it as a branch, we'd end up printing a warning. While this
check makes sense to have in general, it is arguably less useful in the
context of git-update-ref(1). This is due to multiple reasons:
- The manpage is explicitly structured around object IDs. So if we see
a fully blown object ID, the intent should be quite clear in
general.
- The command is part of our plumbing layer and not a tool that users
would generally use in interactive workflows. As such, the warning
will likely not be visible to anybody in the first place.
- Users can and should use the fully-qualified refname in case there
is any potential for ambiguity. And given that this command is part
of our plumbing layer, one should always try to be as defensive as
possible and use fully-qualified refnames.
Furthermore, this check can be quite expensive when updating lots of
references via `--stdin`, because we try to read multiple references per
object ID that we parse according to the DWIM rules. This effect can be
seen both with the "files" and "reftable" backend.
The issue is not unique to git-update-ref(1), but was also an issue in
git-cat-file(1), where it was addressed by disabling the ambiguity check
in 25fba78d36b (cat-file: disable object/refname ambiguity check for
batch mode, 2013-07-12).
Disable the warning in git-update-ref(1), which provides a significant
speedup with both backends. The user-visible outcome is unchanged even
when ambiguity exists, except that we don't show the warning anymore.
The following benchmark creates 10000 new references with a 100000
preexisting refs with the "files" backend:
Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 467.3 ms ± 5.1 ms [User: 100.0 ms, System: 365.1 ms]
Range (min … max): 461.9 ms … 479.3 ms 10 runs
Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 394.1 ms ± 5.8 ms [User: 63.3 ms, System: 327.6 ms]
Range (min … max): 384.9 ms … 405.7 ms 10 runs
Summary
update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
1.19 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
And with the "reftable" backend:
Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Time (mean ± σ): 146.9 ms ± 2.2 ms [User: 90.4 ms, System: 56.0 ms]
Range (min … max): 142.7 ms … 150.8 ms 19 runs
Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
Time (mean ± σ): 63.2 ms ± 1.1 ms [User: 41.0 ms, System: 21.8 ms]
Range (min … max): 61.1 ms … 66.6 ms 41 runs
Summary
update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
2.32 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
Note that the absolute improvement with both backends is roughly in the
same ballpark, but the relative improvement for the "reftable" backend
is more significant because writing the new table to disk is faster in
the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading an object ID via `get_oid_basic()` or any of its related
functions we perform a check whether the object ID is ambiguous, which
can be the case when a reference with the same name exists. While the
check is generally helpful, there are cases where it only adds to the
runtime overhead without providing much of a benefit.
Add a new flag that allows us to disable the check. The flag will be
used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `repo_get_oid_with_flags()`. This function
behaves the same as `repo_get_oid()`, except that it takes an extra
`flags` parameter that it ends up passing to `get_oid_with_context()`.
This function will be used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "documentation" CI job performs a couple of tests against our
documentation. Part of these tests is to check whether documentation
builds at all and whether it spits out the expected set of files. We
don't yet have such a test for Meson, which means that we wouldn't
notice at all if building the documentation were to break. As a result,
breakages as fixed by 87eccc3a81d (meson: fix building technical and
howto docs, 2025-03-02) are easy to go unnoticed.
Address this test gap by starting to build both manpages and HTML sites
as part of the CI job.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When breaking changes are enabled we continue to install documentation
of the git-pack-redundant(1) command even though it is completely
disabled and thus inaccessible. Improve this by only installing the
documentation in case breaking changes aren't enabled.
Based-on-patch-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We continue to compile the git-pack-redundant(1) builtin with Meson when
breaking changes are enabled even though we ultimately don't expose this
command at all. This is mostly harmless, but given that the intent of
the build option is to be as close as possible to the state where the
breaking change has been fully implemented this isn't optimal either.
Improve the situation by not compiling the builtin when breaking changes
are enabled.
Based-on-patch-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While Meson already supports the `-Dbreaking_changes=true` option, it
only wires up the build option that propagates into the tests. The build
option is only used for our tests to enable the `WITH_BREAKING_CHANGES`
prerequisite though, and does not influence the code that is actually
being built.
The omission went unnoticed because we only have tests right now that
get disabled when breaking changes are enabled, but not the other way
round. In other words, we don't have any tests that verify that breaking
changes behave as expected.
Fix the build issue by setting the `WITH_BREAKING_CHANGES` preprocessor
macro when breaking changes are enabled. Note that the `libgit_c_args`
array is defined after the current spot where we handle the option, so
to not have multiple sites where we handle it we instead move it after
the array has been defined.
Based-on-patch-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of Git 3.0, remove the hidden synonym for "--annotate-stdin"
for real. As this does not change the fact that it used to be
called "--stdin" in older version of Git, keep that passage in the
documentation for "--annotate-stdin".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is absolutely no reason why a pattern given to grep to find
'warning: --stdin is deprecated' must be quoted within a pair of
single quotes, or the pattern to look for the literal string as ERE.
Quote the test body with a pair of single quotes like everybody
else, and quote the needle string in a pair of double quotes. Also
use test_grep instead of "grep -E".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of tests invoke "git" on the upstream side of a pipe,
hiding its exit status. Correct them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier c5bc9a7f (Makefile: wire up build option for deprecated
features, 2025-01-22) made an unfortunate decision to introduce the
WITHOUT_BREAKING_CHANGES prerequisite to perform tests that ensure
the historical behaviour that may be different from what we will
have in the future. It would inevitably invite double-negation when
we need to add tests to ensure the behaviour we want to have in the
future.
Introduce WITH_BREAKING_CHANGES prerequisite and replace the
existing uses of WITHOUT_BREAKING_CHANGES prerequisite. To catch
any future topics that add more uses of WITHOUT_BREAKING_CHANGES,
mark it as a removed prerequisite.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct repository;` forward declaration appears twice in `dir.h`:
once at line 10 and again at line 46. This duplication is unnecessary
and likely unintentional.
Removing the second declaration has no impact on compilation, as verified
by a clean build.
Signed-off-by: Abhijeetsingh Meena <abhijeet040403@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow test_lazy_prereq script to signal a programming error by
exiting with status 125 (like how bisect scripts do). This is used
to signal a deprecated-and-then-removed prerequisite that should
never be used in tests anymore.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The t/README file talked about test_set_prereq but lacked
explanation on test_lazy_prereq, which is a more modern way to
define prerequisites.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export has a --signed-tags= option that controls how to handle tag
signatures. However, there is no equivalent for commit signatures; it
just silently strips the signature out of the commit (analogously to
--signed-tags=strip).
While signatures are generally problematic for fast-export/fast-import
(because hashes are likely to change), if they're going to support tag
signatures, there's no reason to not also support commit signatures.
So, implement a --signed-commits= option that mirrors the --signed-tags=
option.
On the fast-export side, try to be as much like signed-tags as possible,
in both implementation and in user-interface. This will change the
default behavior to '--signed-commits=abort' from what is now
'--signed-commits=strip'. In order to provide an escape hatch for users
of third-party tools that call fast-export and do not yet know of the
--signed-commits= option, add an environment variable
'FAST_EXPORT_SIGNED_COMMITS_NOABORT=1' that changes the default to
'--signed-commits=warn-strip'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-export's helper function find_encoding() takes a `const char *`, but
modifies that memory despite the `const`. Ultimately, this memory came
from get_commit_buffer(), and you're not supposed to modify the memory
that you get from get_commit_buffer().
So, get rid of find_encoding() in favor of commit.h:find_commit_header(),
which gives back a string length, rather than mutating the memory to
insert a '\0' terminator.
Because find_commit_header() detects the "\n\n" string that separates the
headers and the commit message, move the call to be above the
`message = strstr(..., "\n\n")` call. This helps readability, and allows
for the value of `encoding` to be used for a better value of "..." so that
the same memory doesn't need to be checked twice. Introduce a
`commit_buffer_cursor` variable to avoid writing an awkward
`encoding ? encoding + encoding_len : committer_end` expression.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --signed-tags= option takes one of five arguments specifying how to
handle signed tags during export. Among these arguments, 'strip' is to
'warn-strip' as 'verbatim' is to 'warn' (the unmentioned argument is
'abort', which stops the fast-export process entirely). That is,
signatures are either stripped or copied verbatim while exporting, with
or without a warning.
Match the pattern and rename 'warn' to 'warn-verbatim' to make it clear
that it instructs fast-export to copy signatures verbatim.
To maintain backwards compatibility, 'warn' is still recognized as
deprecated synonym of 'warn-verbatim'.
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Documentation/CodingGuidelines" says that there should be whitespaces
around operators like 'if', 'switch', 'for', etc.
Let's fix this in "builtin/fast-export.c".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Luke Shumaker <lukeshu@datawire.io>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple instances where ints have been initialized with
values of unsigned ints, and where negative values don't mean anything.
When such ints are compared with unsigned ints, it causes sign comparison
warnings.
Also, some of these are used just as stand-ins for their initial
values, never being modified, thus obscuring the specific conditions
under which certain operations happen.
Replace int with unsigned int for 2 variables, and replace the
intermediate variables with their initial values for 2 other variables.
Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `null_oid()` function returns the object ID that only consists of
zeroes. Naturally, this ID also depends on the hash algorithm used, as
the number of zeroes is different between SHA1 and SHA256. Consequently,
the function returns the hash-algorithm-specific null object ID.
This is currently done by depending on `the_hash_algo`, which implicitly
makes us depend on `the_repository`. Refactor the function to instead
pass in the hash algorithm for which we want to retrieve the null object
ID. Adapt callsites accordingly by passing in `the_repository`, thus
bubbling up the dependency on that global variable by one layer.
There are a couple of trivial exceptions for subsystems that already got
rid of `the_repository`. These subsystems instead use the repository
that is available via the calling context:
- "builtin/grep.c"
- "grep.c"
- "refs/debug.c"
There are also two non-trivial exceptions:
- "diff-no-index.c": Here we know that we may not have a repository
initialized at all, so we cannot rely on `the_repository`. Instead,
we adapt `diff_no_index()` to get a `struct git_hash_algo` as
parameter. The only caller is located in "builtin/diff.c", where we
know to call `repo_set_hash_algo()` in case we're running outside of
a Git repository. Consequently, it is fine to continue passing
`the_repository->hash_algo` even in this case.
- "builtin/ls-files.c": There is an in-flight patch series that drops
`USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic
conflict because we use `null_oid()` in `show_submodule()`. The
value is passed to `repo_submodule_init()`, which may use the object
ID to resolve a tree-ish in the superproject from which we want to
read the submodule config. As such, the object ID should refer to an
object in the superproject, and consequently we need to use its hash
algorithm.
This means that we could in theory just not bother about this edge
case at all and just use `the_repository` in "diff-no-index.c". But
doing so would feel misdesigned.
Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in
"hash.c".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of trivial "-Wsign-compare" warnings in "hash.c". Fix
them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have a "hash.h" header, the actual implementation of the
subsystem is hosted by "object-file.c". This makes it harder than
necessary to find the actual implementation of the hash subsystem and
intermingles the different concerns with one another.
Split out the implementation of hash algorithms into a new, separate
"hash.c" file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "delta-islands.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`.
Refactor the code to stop using `the_repository`. In most cases this is
trivial because we already had a repository available in the calling
context, with the only exception being `propagate_island_marks()`. Adapt
it so that the repository gets passed in via a parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "object-file-convert.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`. All of these callsites are transitively called
from `convert_object_file()`, which indeed has no repo as input.
Refactor the function so that it receives a repository as a parameter
and pass it through to all internal functions to get rid of the
dependency. Remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "pack-bitmap-write.c" where we use the
global `the_repository` variable, either explicitly or implicitly by
using `the_hash_algo`.
Refactor the code so that the `struct bitmap_writer` stores the
repository it is getting initialized with. Like this, we can adapt
callsites that use `the_repository` to instead use the repository
provided by the writer.
Remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "pack-revindex.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`. In all of those cases we already have a repository
available in the calling context though.
Refactor the code to instead use the caller-provided repository and
remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "pack-check.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`. In all of those cases we already have a repository
available in the calling context though.
Refactor the code to instead use the caller-provided repository and
remove the `USE_THE_REPOSITORY_VARIABLE` define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "core.bigFileThreshold" setting is stored in a global variable and
populated via `git_default_core_config()`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.
Refactor the code so that we instead store the value in `struct
repo_settings`, where the value is computed as-needed and cached.
Note that this change requires us to adapt one test in t1050 that
verifies that we die when parsing an invalid "core.bigFileThreshold"
value. The exercised Git command doesn't use the value at all, and thus
it won't hit the new code path that parses the value. This is addressed
by using git-hash-object(1) instead, which does read the value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions in "pack-write.c" that implicitly depend
on `the_repository` or `the_hash_algo`. Remove this dependency by
injecting the repository via a parameter and adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of functions exposed by "object.c" that implicitly
depend on `the_repository`. Remove this dependency by injecting the
repository via a parameter. Adapt callers accordingly by simply using
`the_repository`, except in cases where the subsystem is already free of
the repository. In that case, we instead pass the repository provided by
the caller's context.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple sites in "csum-file.c" where we use the global
`the_repository` variable, either explicitly or implicitly by using
`the_hash_algo`.
Refactor the code to stop using `the_repository` by adapting functions
to receive required data as parameters. Adapt callsites accordingly by
either using `the_repository->hash_algo`, or by using a context-provided
hash algorithm in case the subsystem already got rid of its dependency
on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-fetch we have an optimization to avoid issuing an ls-refs command
to the server if we don't care about the value of any refs (e.g.,
because we are fetching exact object ids), saving a round-trip to the
server. This comes from e70a3030e7 (fetch: do not list refs if fetching
only hashes, 2018-09-27).
It uses an explicit flag "must_list_refs" to decide when we need to do
so. That was needed back then, because the list of ref-prefixes was not
always complete. If it was empty, it did not necessarily mean that we
were not interested in any refs). But that is no longer the case; an
empty list of prefixes means that we truly do not care about any refs.
And so rather than an explicit flag, we can just check whether we are
interested in any ref prefixes. This simplifies the code slightly, as
there is now a single source of truth for the decision.
It also fixes a bug in / optimizes a very unlikely case, which is:
git fetch $remote ^foo $oid
I.e., a negative refspec combined with an exact oid fetch. This is
somewhat nonsense, in that there are no positive refspecs mentioning
refs to countermand with the negative one. But we should be able to do
this without issuing an ls-refs command (excluding "foo" from the empty
set will obviously still be the empty set).
However, the current code does not do so. The negative refspec is not
counted as a noop in un-setting the must_list_refs flag (hardly the
fault of e70a3030e7, as negative refspecs did not appear until much
later). But by using the prefix list as a source of truth, this
naturally just works; the negative refspec does not add a prefix to ask
about, and hence does not trigger the ls-refs call.
This is esoteric enough that I didn't bother adding a test. The real
value here is in the code simplification.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we fetch from a configured remote, we may try to update the local
refs/remotes/<origin>/HEAD, and so we ask the server to advertise its
HEAD to us.
But if we aren't otherwise asking about any refs at all, then we know
this HEAD update can never happen! To consider a new value for HEAD,
the set_head() function uses guess_remote_head(). And even if it sees an
explicit symref value for HEAD, it will only report that as a match if
we also saw that remote ref advertised, and it mapped to a local
tracking ref via get_fetch_map().
In other words, a fetch like this:
git fetch origin $exact_oid:refs/heads/foo
can never update HEAD, because we will never have fetched (nor even see
the advertisement for) the ref that HEAD points to.
Currently the command above will still call ls-refs to ask about the
HEAD, even though it is pointless. This patch teaches it to skip the
ls-refs call entirely in this case, which avoids a round-trip to the
server.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the ref-prefix feature of protocol v2, a client which sends
no prefixes at all will get the full advertisement. And so the code in
git-fetch was historically loose about setting up that list based on our
refspecs. There were cases where we needed to know about some refs, so
we just didn't add anything to the ref-prefix list.
And hence further code, like that for tag-following and updating
origin/HEAD, had to be careful about adding to an empty list. E.g., see
the bug fixed by bd52d9a058 (fetch: fix following tags when fetching
specific OID, 2025-03-07).
But the previous commit removed the last such case, and now we know an
empty ref-prefix list (at least inside git-fetch's do_fetch() function)
means that we really don't need to see any refs. So we can drop those
extra conditionals.
This simplifies the code a little. But it also means that some cases can
now use ref prefixes when they would not otherwise. As the test shows,
fetching an exact oid into a local ref can now avoid enumerating all of
the refs. The refspec itself doesn't need to know about any remote refs,
and the tag auto-following can just ask about refs/tags/.
The same is true for asking about HEAD to update the local origin/HEAD.
I didn't add a test for that yet, though, as we can optimize it even
further.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we're not given any refspecs (either on the command line or via
config) and we have no branch merge config, then we fetch the remote
HEAD into our local FETCH_HEAD. In that case we do not send any
ref-prefix option to the server at all, and we see the full
advertisement.
But this is sub-optimal. We only care about HEAD, so we can just ask
for that, and ignore all of the other refs.
The new test demonstrates a case where we see fewer refs (in this case
only one less, but in theory we could be ignoring millions of them).
This also removes the only case where we care about seeing some refs
from the other side, but don't add anything to the ref_prefixes list.
Cleaning this up means one less maintenance burden. Before this patch,
any code which wanted to add to the list had to make sure the list was
not empty, since an empty list meant "ask for everything". Now it really
means "we are not interested in any refs".
This should let us optimize a few more cases in subsequent patches.
Note that we'll add "HEAD" to the list of prefixes, and later code for
updating "refs/remotes/<remote>/HEAD" may likewise do so. In theory this
could cause duplicates in the list, but in practice these can't both
trigger. We hit our new case only if there are no refspecs, and the
"<remote>/HEAD" feature is enabled only when we are fetching from a
remote with configured refspecs. We could be defensive with a flag, but
it didn't seem worth it to me (the absolute worse case is a useless
redundant ref-prefix line sent to the server).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of refspec_ref_prefixes() is to look over the set of refspecs
and set up an appropriate list of "ref-prefix" strings to send to the
server.
The logic for handling individual refspec_items has some confusing bits.
The final part of our if/else cascade checks this:
else if (item->src && !item->exact_sha1)
prefix = item->src;
But we know that "item->exact_sha1" can never be true, because earlier
we did:
if (item->exact_sha1 || item->negative)
continue;
This is due to 6c301adb0a (fetch: do not pass ref-prefixes for fetch by
exact SHA1, 2018-05-31), which added the continue. So it is tempting to
remove the extra exact_sha1 at the end of the cascade, leaving the one
at the top of the loop.
But I don't think that's quite right. The full cascade is:
if (rs->fetch == REFSPEC_FETCH)
prefix = item->src;
else if (item->dst)
prefix = item->dst;
else if (item->src && !item->exact_sha1)
prefix = item->src;
which all comes from 6373cb598e (refspec: consolidate ref-prefix
generation logic, 2018-05-16). That first "if" is supposed to handle
fetches, where we care about the source name, since that is coming from
the server. And the rest should be for pushes, where we care about the
destination, since that's the name the server will use. And we get that
either explicitly from "dst" (for something like "foo:bar") or
implicitly from the source (a refspec like "foo" is treated as
"foo:foo").
But how should exact_sha1 interact with those? For a fetch, exact_sha1
always means we do not care about sending a name to the server (there is
no server refname at all). But pushing an exact sha1 should still care
about the destination on the server! It is only if we have to fall back
to the implicit source that we need to care if it is a real ref (though
arguably such a push does not even make sense; where would the server
store it?).
So I think that 6c301adb0a "broke" the push case by always skipping
exact_sha1 items, even though a push should only care about the
destination.
Of course this is all completely academic. We have still not implemented
a v2 push protocol, so even though we do call this function for pushes,
we'd never actually send these ref-prefix lines.
However, given the effort I spent to figure out what was going on here,
and the overlapping exact_sha1 checks, I'd like to rewrite this to
preemptively fix the bug, and hopefully make it less confusing.
This splits the "if" at the top-level into fetch vs push, and then each
handles exact_sha1 appropriately itself. The check for negative refspecs
remains outside of either (there is no protocol support for them, so we
never send them to the server, but rather use them only to reduce the
advertisement we receive).
The resulting behavior should be identical for fetches, but hopefully
sets us up better for a potential future v2 push.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 6c301adb0a (fetch: do not pass ref-prefixes for fetch by exact
SHA1, 2018-05-31) added a test that fetching an exact oid with the v2
protocol works. Originally it failed without the code change from that
commit, because fetch failed with "no matching remote head".
That changed in 0177565148 (transport: do not list refs if possible,
2018-09-27), which made fetch more forgiving of this case.
But that now meant the test passes even without its fix! So let's also
have it check the packet listing to make sure we did not ask for the
bogus prefix (ultimately this is less important than whether the command
fails, since it's just an optimization, but we should make sure not to
regress it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When this test was added in 6c301adb0a (fetch: do not pass ref-prefixes
for fetch by exact SHA1, 2018-05-31), there was still some uncertainty
about the v2 protocol's looser behavior with serving objects that are
not directly pointed at by a ref.
At this point that behavior is well established, and I do not think we
would ever change v2 to match the v0 behavior (and if we did,
remembering to update this test is the least of our concerns).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These old tests refer to object ids as "sha1". These days we prefer
the more algorithm-agnostic "oid".
There are a few more tests that mention sha1 in the title and also use
it in variables throughout the test. I've left them for now, as changing
them is more involved (and they're linked to the allowTipSHA1InWant
config, which as a v0-only thing actually is always sha1).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation is using the historical mode for titles, which is a
setext-style (i.e., two-line) section title.
The issue with this mode is that starting block delimiters (e.g.,
`----`) can be confused with a section title when they are exactly the
same length as the preceding line. In the original documentation, this
is taken care of for English by the writer, but it is not the case for
translations where these delimiters are hidden. A translator can
generate a line that is exactly the same length as the following block
delimiter, which leads to this line being considered as a title.
To safeguard against this issue, add a blank line before and after
block delimiters where block is at root level, else add a "+" line
before block delimiters to link it to the preceding paragraph.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit bc26f7690a (clone: make it possible to specify --tags,
2025-02-06) added a new paragraph in the middle of this list item. By
adding an empty line rather than using a list continuation, we broke the
list continuation, with the new paragraph ending up funnily indented.
Restore the chain of list continuations.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Update for 2.49.0.
- Fix numerous typos found by spelling checker.
- Fix more straight quotes.
- Harmonize translation of "blob" (to "blob", not "blobb").
- Harmonize translation of "reflog" (to "referenslogg").
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/checkout-index.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_checkout_index()` function with `repo`
set to NULL and then early in the function, `show_usage_with_options_if_asked()`
call will give the options help and exit.
Pass an instance of "struct index_state" available in the calling
context to both `checkout_all()` and `checkout_file()` to remove their
dependency on the global `the_repository` variable.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/for-each-ref.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_for_each_ref()` function with `repo`
set to NULL and then early in the function, `parse_options()` call will
give the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/ls-files.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_ls_files()` function with `repo` set
to NULL and then early in the function, `show_usage_with_options_if_asked()`
call will give the options help and exit.
Pass the repository available in the calling context to both
`expand_objectsize()` and `show_ru_info()` to remove their
dependency on the global `the_repository` variable.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/pack-refs.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_pack_refs()` function with `repo` set
to NULL and then early in the function, `parse_options()` call will give
the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/send-pack.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_send_pack()` function with `repo` set
to NULL and then early in the function, `parse_options()` call will give
the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/verify-commit.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_verify_commit()` function with `repo`
set to NULL and then early in the function, `parse_options()` call will
give the options help and exit.
Pass the repository available in the calling context to `verify_commit()`
to remove it's dependency on the global `the_repository` variable.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/verify-tag.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_verify_tag()` function with `repo` set
to NULL and then early in the function, `parse_options()` call will give
the options help and exit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `repo` value can be NULL if a builtin command is run outside
any repository. The current implementation of `repo_config()` will
fail if `repo` is NULL.
If the `repo` is NULL the `repo_config()` can ignore the repository
configuration but it should read the other configuration sources like
the system-side configuration instead of failing.
Teach the `repo_config()` to allow `repo` to be NULL by calling the
`read_very_early_config()` which read config but only enumerate system
and global settings.
This will be useful in the following commits.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 3f763ddf28 (fetch: set remote/HEAD if it does not exist, 2024-11-22),
unconditionally adds "HEAD" to the list of ref prefixes we send to the
server.
This breaks a core assumption that the list of prefixes we send to the
server is complete. We must either send all prefixes we care about, or
none at all (in the latter case the server then advertises everything).
The tag following code is careful to only add "refs/tags/" to the list
of prefixes if there are already entries in the prefix list. But because
the new code from 3f763ddf28 runs after the tag code, and because it
unconditionally adds to the prefix list, we may end up with a prefix
list that _should_ have "refs/tags/" in it, but doesn't.
When that is the case, the server does not advertise any tags, and our
auto-following breaks because we never learned about any tags in the
first place.
Fix this by only adding "HEAD" to the ref prefixes when we know that we
are already limiting the advertisement. In either case we'll learn about
HEAD (either through the limited advertisement, or implicitly through a
full advertisement).
Reported-by: Igor Todorovski <itodorov@ca.ibm.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When building against zlib-ng, the header file `zlib.h` is not included,
but `zlib-ng.h` is included instead. It's `zlib.h` that defines
`ZLIB_VERSION` and that macro is used to print out zlib version in
`git-version(1)` with `--build-options`. But when it's not defined, no
version is printed.
`zlib-ng.h` defines another macro: `ZLIBNG_VERSION`. Use that macro to
print the zlib-ng version in `git version --build-options` when it's
set. Otherwise fallback to `ZLIB_VERSION`.
Signed-off-by: Toon Claes <toon@iotcl.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 41f1a8435a (git-compat-util: move include of "compat/zlib.h" into
"git-zlib.h", 2025-01-28) some code was refactored to enable easier
linking against zlib-ng.
This removed `zlib.h` being indirectly included in `help.c`. As this
file uses `ZLIB_VERSION` to print the version number of zlib when
running git-version(1) with `--build-options`, this resulted in a
regression.
Include `git-zlib.h` directly into `help.c` to print zlib version
information. This brings back the zlib version in the output of
`git version --build-options`.
Signed-off-by: Toon Claes <toon@iotcl.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix to help building Git-for-Windows.
* js/win-2.49-build-fixes:
cmake: generalize the handling of the `CLAR_TEST_OBJS` list
meson: fix sorting
ident: stop assuming that `gw_gecos` is writable
Some future breaking changes would remove certain parts of the
default repository, which were still described even when the
documents were built for the future with WITH_BREAKING_CHANGES.
* pw/repo-layout-doc-update:
docs: fix repository-layout when building with breaking changes
merge-ort has a number of sanity checks on the file it is processing in
process_renames(). One of these sanity checks was slightly overzealous
because it indirectly assumed that a renamed file always ended up at a
different path than where it started. That is normally an entirely fair
assumption, but directory rename detection can make things interesting.
As a quick refresher, if one side of history renames directory A/ -> B/,
and the other side of history adds new files to A/, then directory
rename detection notices and suggests moving those new files to B/. A
similar thing is done for paths renamed into A/, causing them to be
transitively renamed into B/. But, if the file originally came from B/,
then this can end up causing a file to be renamed back to itself.
It turns out the rest of the code following this assertion handled the
case fine; the assertion was just an extra sanity check, not a rigid
precondition. Therefore, simply adjust the assertion to pass under this
special case as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If one side of history renames a directory A/ -> B/, and the other side
of history adds new files to A/, then directory rename detection notices
and moves or suggests moving those new files to B/. A similar thing is
done for paths renamed into A/, causing them to be transitively renamed
into B/. But, if the file originally came from B/, then this can end up
causing a file to be renamed back to itself. merge-ort crashes under
this special case, due to a slightly overzealous assertion:
git: merge-ort.c:3051: process_renames: Assertion `source_deleted || oldinfo->filemask & old_sidemask' failed.
Aborted (core dumped)
Add a testcase demonstrating this.
Signed-off-by: Dmitry Goncharov <dgoncharov@users.sf.net>
[en: Instead of adding a new testsuite, place it near similar tests in
t6423, adjusting to match the style of those tests. Tweak the commit
message to not repeat the entire testcase, but just describe the bug.
Also update the line number in the error message.]
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the packed-refs backend, our implementation of '--exclude' (dating
back to 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid
excluded pattern(s), 2023-07-10)) considers, for example:
$ git for-each-ref --exclude=refs/heads/ba
to exclude "refs/heads/bar", "refs/heads/baz", and so on.
The files backend, which does not implement '--exclude' (and relies on
the caller to cull out results that don't match) naturally will
enumerate "refs/heads/bar" and so on.
So in the above example, 'for-each-ref' will try and see if
"refs/heads/ba" matches "refs/heads/bar" (since the files backend simply
enumerated every loose reference), and, realizing that it does not
match, output the reference as expected. (A caller that did want to
exclude "refs/heads/bar" and "refs/heads/baz" might instead run "git
for-each-ref --exclude='refs/heads/ba*'").
This can lead to strange behavior, like seeing a different set of
references advertised via 'upload-pack' depending on what set of
references were loose versus packed.
So there is a subtle bug with '--exclude' which is that in the
packed-refs backend we will consider "refs/heads/bar" to be a pattern
match against "refs/heads/ba" when we shouldn't. Likewise, the reftable
backend (which in this case is bug-compatible with the packed backend)
exhibits the same broken behavior.
There are a few ways to fix this. One is to tighten the rules in
cmp_record_to_refname(), which is used to determine the start/end-points
of the jump list used by the packed backend. In this new "strict" mode,
the comparison function would handle the case where we've reached the
end of the pattern by introducing a new check like so:
while (1) {
if (*r1 == '\n')
return *r2 ? -1 : 0;
if (!*r2)
if (strict && *r1 != '/') /* <- here */
return 1;
return start ? 1 : -1;
if (*r1 != *r2)
return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
r1++;
r2++;
}
(eliding out the rest of cmp_record_to_refname()). Equivalently, we
could teach refs/packed-backend::populate_excluded_jump_list() to append
a trailing '/' if one does not already exist, forcing an exclude pattern
like "refs/heads/ba" to only match "refs/heads/ba/abc" and so forth.
But since the same problem exists in reftable, we can fix both at once
by performing this pre-processing step one layer up in refs.c at the
common entrypoint for the two, which is 'refs_ref_iterator_begin()'.
Since that solution is both the simplest and only requires modification
in one spot, let's normalize exclude patterns so that they end with a
trailing slash. This causes us to unify the behavior between all three
backends.
There is some minor test fallout in the "overlapping excluded regions"
test, which happens to use 'refs/ba' as an exclude pattern, and expects
references under the "refs/heads/bar/*" and "refs/heads/baz/*"
hierarchies to be excluded from the results.
But that test fallout is expected, because the test was codifying the
buggy behavior to begin with, and should have never been written that
way. Split that into its own test (since the range is no longer
overlapping under the stricter interpretation of --exclude patterns
presented here). Create a new test which does have overlapping
regions by using a refs/heads/bar/4/... hierarchy and excluding both
"refs/heads/bar" and "refs/heads/bar/4".
Reported-by: SURA <surak8806@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 59c35fac54 (refs/packed-backend.c: implement jump lists to avoid
excluded pattern(s), 2023-07-10), the packed-refs backend learned how to
construct "jump lists" to avoid enumerating sections of the packed-refs
file that we know the caller is going to throw out anyway.
This process works by finding the start- and end-points (that is, where
in the packed-refs file corresponds to the range we're going to ignore)
for each exclude pattern, then constructing a jump list based on that.
At enumeration time we'll consult the jump list to skip past everything
in the range(s) found in the previous step, saving time when excluding a
large portion of references.
But when there is a --exclude pattern which is just the empty string,
the behavior is a little funky. When we try and exclude the empty
string, the matched range covers the entire packed-refs file, meaning
that we won't output any packed references. But the empty pattern
doesn't actually match any references to begin with! For example, on my
copy of git.git I can do:
$ git for-each-ref '' | wc -l
0
So "git for-each-ref --exclude=''" shouldn't actually remove anything
from the output, and ought to be equivalent to "git for-each-ref". But
it's not, and in fact:
$ git for-each-ref | wc -l
2229
$ git for-each-ref --exclude='' | wc -l
480
But why does the '--exclude' version output only some of the references
in the repository? Here's a hint:
$ find .git/refs -type f | wc -l
480
Indeed, because the files backend doesn't implement[^1] the same jump
list concept as the packed backend we get the correct result for the
loose references, but none of the packed references.
Since the empty string exclude pattern doesn't match anything, we can
discard them before the packed-refs backend has a chance to even see it
(and likewise for reftable, which also implements a similar concept
since 1869525066 (refs/reftable: wire up support for exclude patterns,
2024-09-16)).
This approach (copying only some of the patterns into a strvec at the
refs.c layer) may seem heavy-handed, but it's setting us up to fix
another bug in the following commit where the fix will involve modifying
the incoming patterns.
[^1]: As noted in 59c35fac54. We technically could avoid opening and
enumerating the contents of, for e.g., "$GIT_DIR/refs/heads/foo/" if
we knew that we were excluding anything under the 'refs/heads/foo'
hierarchy. But the --exclude stuff is all best-effort anyway, since
the caller is expected to cull out any results that they don't want.
Noticed-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A late-comer to the v2.49.0 party, `sk/unit-test-oid`, added yet another
array item to `CLAR_TEST_OBJS`, causing the `win+VS build` job to fail
with symptoms like this one:
unit-tests-lib.lib(u-oid-array.obj) : error LNK2019: unresolved
external symbol cl_parse_any_oid referenced in function fill_array
This is a similar scenario to the one that forced me to write
8afda42fce60 (cmake: generalize the handling of the `UNIT_TEST_OBJS`
list, 2024-09-18): The hard-coded echo of `CLAR_TEST_OBJS` in
`CMakeLists.txt` that recapitulates faithfully what was already
hard-coded in `Makefile` would either have to be updated whack-a-mole
style, or generalized.
Just like I chose the latter option for `UNIT_TEST_OBJS`, I now do the
same for `CLAR_TEST_OBJS`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 904339edbd80 (Introduce support for the Meson build system,
2024-12-06) the `meson.build` file was introduced, adding also a
Windows-specific list of source files. This list was obviously meant to
be sorted alphabetically, but there is one mistake. Let's fix that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 590e081dea7c (ident: add NO_GECOS_IN_PWENT for systems without
pw_gecos in struct passwd, 2011-05-19), code was introduced to iterate
over the `gw_gecos` field; The loop variable is of type `char *`, which
assumes that `gw_gecos` is writable.
However, it is not necessarily writable (and it is a bad idea to have it
writable in the first place), so let's switch the loop variable type to
`const char *`.
This is not a new problem, but what is new is the Meson build. While it
does not trigger in CI builds, imitating the commands of
`ci/run-build-and-tests.sh` in a regular Git for Windows SDK (`meson
setup build . --fatal-meson-warnings --warnlevel 2 --werror --wrap-mode
nofallback -Dfuzzers=true` followed by `meson compile -C build --`
results in this beautiful error:
"cc" [...] -o libgit.a.p/ident.c.obj "-c" ../ident.c
../ident.c: In function 'copy_gecos':
../ident.c:68:18: error: assignment discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
68 | for (src = get_gecos(w); *src && *src != ','; src++) {
| ^
cc1.exe: all warnings being treated as errors
Now, why does this not trigger in CI? The answer is as simple as it is
puzzling: The `win+Meson` job completely side-steps Git for Windows'
development environment, opting instead to use the GCC that is on the
`PATH` in GitHub-hosted `windows-latest` runners. That GCC is pinned to
v12.2.0 and targets the UCRT (unlikely to change any time soon, see
https://github.com/actions/runner-images/blob/win25/20250303.1/images/windows/toolsets/toolset-2022.json#L132-L141).
That is in stark contrast to Git for Windows, which uses GCC v14.2.0 and
targets MSVCRT. Git for Windows' `Makefile`-based build also obviously
uses different compiler flags, otherwise this compile error would have
had plenty of opportunity in almost 14 years to surface.
In other words, contrary to my expectations, the `win+Meson` job is
ill-equipped to replace the `win build` job because it exercises a
completely different tool version/compiler flags vector than what Git
for Windows needs.
Nevertheless, there is currently this huge push, including breaking
changes after -rc1 and all, for switching to Meson. Therefore, we need
to make it work, somehow, even in Git for Windows' SDK, hence this
patch, at this point in time.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson-based build procedure forgot to build some docs, which has
been corrected.
* pw/build-meson-technical-and-howto-docs:
meson: fix building technical and howto docs
The editorconfig file is updated to tell us that bash scripts are
similar to general Bourne shell scripts.
* dm/editorconfig-bash-is-like-sh:
editorconfig: add .bash extension
Convert a few unit tests to the clar framework.
* sk/unit-test-oid:
t/unit-tests: convert oidtree test to use clar test framework
t/unit-tests: convert oidmap test to use clar test framework
t/unit-tests: convert oid-array test to use clar test framework
t/unit-tests: implement clar specific oid helper functions
The path.[ch] API takes an explicit repository parameter passed
throughout the callchain, instead of relying on the_repository
singleton instance.
* ps/path-sans-the-repository:
path: adjust last remaining users of `the_repository`
environment: move access to "core.sharedRepository" into repo settings
environment: move access to "core.hooksPath" into repo settings
repo-settings: introduce function to clear struct
path: drop `git_path()` in favor of `repo_git_path()`
rerere: let `rerere_path()` write paths into a caller-provided buffer
path: drop `git_common_path()` in favor of `repo_common_path()`
worktree: return allocated string from `get_worktree_git_dir()`
path: drop `git_path_buf()` in favor of `repo_git_path_replace()`
path: drop `git_pathdup()` in favor of `repo_git_path()`
path: drop unused `strbuf_git_path()` function
path: refactor `repo_submodule_path()` family of functions
submodule: refactor `submodule_to_gitdir()` to accept a repo
path: refactor `repo_worktree_path()` family of functions
path: refactor `repo_git_path()` family of functions
path: refactor `repo_common_path()` family of functions
Since commit 8ccc75c2452 (remote: announce removal of "branches/" and
"remotes/", 2025-01-22) enabling WITH_BREAKING_CHANGES when building git
removes support for reading branches from ".git/branches" and remotes
from ".git/remotes". However those locations are still documented in
gitrepository-layout.adoc even though the build does not support them.
Rectify this by adding a new document attribute "with-breaking-changes"
and use it to make the inclusion of those sections of the documentation
conditional. Note that the name of the attribute does not match the test
prerequisite WITHOUT_BREAKING_CHANGES added in c5bc9a7f94a (Makefile:
wire up build option for deprecated features, 2025-01-22). This is to
avoid the awkward double negative ifndef::without_breaking_changes for
documentation that should be included when WITH_BREAKING_CHANGES is
enabled. The test prerequisite will be renamed to match the
documentation attribute in a future patch series.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Verify that if the path exists then it is a file using test_path_is_file().
Signed-off-by: Mahendra Dani <danimahendra0904@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt urlmatch-normalization test file to use clar testing framework by
using clar assertions where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt trailer test file to use clar testing framework by using clar
assertions where necessary. Split test into individual test functions
for clarity and maintainability. Each test case now has its own
function, making it easier to isolate failures and improve test
readability.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If block_source_read_block() or parse_footer() fail, we leak the "name"
member of struct reftable_reader in reftable_reader_new(). Release it.
Reported by: H Z <shiyuyuranzh@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update a few more instances of Documentation/*.txt files which have been
renamed to *.adoc.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ec14d4ecb5 (builtin.h: take over documentation from
api-builtin.txt, 2017-08-02) deleted api-builtin.txt and moved the
contents into builtin.h. Most of the references were fixed in
d85e9448dd (new-command.txt: update reference to builtin docs,
2023-02-04), but one remained. Fix it.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The top-level .gitattributes file contains entries for the Documentation
tree. Documentation/.gitattributes has not been touched since it was
added in 14f9e128d3 (Define the project whitespace policy, 2008-02-10).
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All Documentation files now end in .adoc. Update the entries for
git-merge.adoc, gitk.adoc, and user-manual.adoc to properly set the
conflict-marker-size attribute.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After 1f010d6bdf (doc: use .adoc extension for AsciiDoc files,
2025-01-20), we no longer matched any files in this test. The result is
that we did not test for mismatches in the documentation and --help
output.
Adjust the test to look at the renamed *.adoc files.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point behind a compile-time switch is to ensure that we have a
mechanism to hide myriad of backward incompatible changes that may
be prepared and accumulated over time, yet make them available for
testing any time during the development toward the big version
boundary. Add a few words to stress that point.
Since the document was first written, we have added the CI job that
the document anticipated us to have. Rephrase to state the current
status.
The discussion in [*1*] made us abandon the "feature.git3" based
runtime switching of behaviour and instead adopt the compile-time
switching mechanism, but a stray sentence about runtime switching
still remained in the final text by mistake. Remove it.
[Reference]
*1* https://lore.kernel.org/git/xmqqldzel6ug.fsf@gitster.g/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's add a design doc about how we could improve handling liarge blobs
using "Large Object Promisors" (LOPs). It's a set of features with the
goal of using special dedicated promisor remotes to store large blobs,
and having them accessed directly by main remotes and clients.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename processing in the recursive merge backend has seen a micro
optimization.
* ms/merge-recursive-string-list-micro-optimization:
merge-recursive: optimize time complexity for process_renames
What happens to submodules during merge has been documented in a
bit more detail.
* lo/doc-merge-submodule-update:
merge-strategies.adoc: detail submodule merge
Assorted fixes and improvements to the build procedure based on
meson.
* ps/build-meson-fixes-0130:
gitlab-ci: restrict maximum number of link jobs on Windows
meson: consistently use custom program paths to resolve programs
meson: fix overwritten `git` variable
meson: prevent finding sed(1) in a loop
meson: improve handling of `sane_tool_path` option
meson: improve PATH handling
meson: drop separate version library
meson: stop linking libcurl into all executables
meson: introduce `libgit_curl` dependency
meson: simplify use of the common-main library
meson: inline the static 'git' library
meson: fix OpenSSL fallback when not explicitly required
meson: fix exec path with enabled runtime prefix
The use of "paste" command for aggregating the test results have
been corrected.
* dk/test-aggregate-results-paste-fix:
t/aggregate-results: fix paste(1) invocation
Both files in the command below appear to be indented with tabs, and I'd
expect .bash files to have roughly the same style as .sh files.
$ find . -name \*.bash
./contrib/completion/git-completion.bash
./ci/check-directional-formatting.bash
Signed-off-by: David Mandelberg <david@mandelberg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When our asciidoc files were renamed from "*.txt" to "*.adoc" in
1f010d6bdf7 (doc: use .adoc extension for AsciiDoc files, 2025-01-20)
the "meson.build" file in "Documentation" was updated but the
"meson.build" files in the "technical" and "howto" subdirectories were
not. This causes the meson build to fail when configured with
-Ddocs=html. Fix this by updating the relevant "meson.build" files.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffs queued from git-diff-pairs(1) are flushed when stdin is
closed. To enable greater flexibility, allow control over when the diff
queue is flushed by writing a single NUL byte on stdin between input
file pairs. Diff output between flushes is separated by a single NUL
byte.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Through git-diff(1), a single diff can be generated from a pair of blob
revisions directly. Unfortunately, there is not a mechanism to compute
batches of specific file pair diffs in a single process. Such a feature
is particularly useful on the server-side where diffing between a large
set of changes is not feasible all at once due to timeout concerns.
To facilitate this, introduce git-diff-pairs(1) which acts as a backend
passing its NUL-terminated raw diff format input from stdin through diff
machinery to produce various forms of output such as patch or raw.
The raw format was originally designed as an interchange format and
represents the contents of the diff_queued_diff list making it possible
to break the diff pipeline into separate stages. For example,
git-diff-tree(1) can be used as a frontend to compute file pairs to
queue and feed its raw output to git-diff-pairs(1) to compute patches.
With this, batches of diffs can be progressively generated without
having to recompute renames or retrieve object context. Something like
the following:
git diff-tree -r -z -M $old $new |
git diff-pairs -p -z
should generate the same output as `git diff-tree -p -M`. Furthermore,
each line of raw diff formatted input can also be individually fed to a
separate git-diff-pairs(1) process and still produce the same output.
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, `diffcore_std()` resolves the statuses for queued diff file
pairs by calling `diff_resolve_rename_copy()`. If status information is
already manually set, invoking `diffcore_std()` may change the status
value.
Introduce the `skip_resolving_statuses` diff option that prevents
`diffcore_std()` from resolving file pair statuses when enabled.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `diff_addremove()` and `diff_change()` functions set up and queue
diffs, but do not return the `diff_filepair` added to the queue. In a
subsequent commit, modifications to `diff_filepair` need to occur in
certain cases after being queued.
Since the existing `diff_addremove()` and `diff_change()` are also used
for callbacks in `diff_options` as types `add_remove_fn_t` and
`change_fn_t`, modifying the existing function signatures requires
further changes. The diff options for pruning use `file_add_remove()`
and `file_change()` where file pairs do not even get queued. Thus,
separate functions are implemented instead.
Split out the queuing operations into `diff_queue_addremove()` and
`diff_queue_change()` which also return a handle to the queued
`diff_filepair`. Both `diff_addremove()` and `diff_change()` are
reimplemented as thin wrappers around the new functions.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We renamed from .txt to .adoc all the asciidoc source files and
necessary includes. We also need to adjust the build-docdep tool to
work on files whose suffix is .adoc when computing the documentation
dependencies.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20).
Do the same for contrib/subtree.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20).
Do the same for contrib/contacts.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .txt extensions were changed to .adoc in 1f010d6bdf (doc: use .adoc
extension for AsciiDoc files, 2025-01-20). This left broken links in
the generated howto-index.html.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding refactorings we now only have a couple of implicit
users of `the_repository` left in the "path" subsystem, all of which
depend on global state via `calc_shared_perm()`. Make the dependency on
`the_repository` explicit by passing the repo as a parameter instead and
adjust callers accordingly.
Note that this change bubbles up into a couple of subsystems that were
previously declared as free from `the_repository`. Instead of marking
all of them as `the_repository`-dependent again, we instead use the
repository that is available in the calling context. There are three
exceptions though with "copy.c", "pack-write.c" and "tempfile.c".
Adjusting these would require us to adapt callsites all over the place,
so this is left for a future iteration.
Mark "path.c" as free from `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, we track "core.sharedRepository"
via a pair of global variables. Move them into `struct repo_settings` so
that we can instead track them per-repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "core.hooksPath" setting is stored in a global variable and
populated via the `git_default_core_config`. This may cause issues in
the case where one is handling multiple different repositories in a
single process with different values for that config key, as we may or
may not see the correct value in that case. Furthermore, global state
blocks our path towards libification.
Refactor the code so that we instead store the value in `struct
repo_settings`. The value is computed as-needed and cached. The result
should be functionally the same as there aren't ever any code paths
where we'd execute hooks outside the context of a repository.
Note that this requires us to change the passed-in repository in the
`repo_git_path()` family of functions to be non-constant, as we call
`adjust_git_path()` there.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't provide a way to clear a `struct repo_settings`, and instead
open-code this in `repo_clear()`. This is mixing up concerns and means
that developers have to touch multiple files whenever they add a new
field to the structure in case the associated resources need to be
released.
Provide a new `repo_settings_clear()` function to improve this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_path()` in favor of the `repo_git_path()` family of
functions, which makes the implicit dependency on `the_repository` go
away.
Note that `git_path()` returned a string allocated via `get_pathname()`,
which uses a rotating set of statically allocated buffers. Consequently,
callers didn't have to free the returned string. The same isn't true for
`repo_common_path()`, so we also have to add logic to free the returned
strings.
This refactoring also allows us to remove `repo_common_pathv()` as well
as `get_pathname()` from the public interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with `get_worktree_git_dir()` a couple of commits ago, the
`rerere_path()` function returns paths that need not be free'd by the
caller because `git_path()` internally uses `get_pathname()`.
Refactor the function to instead accept a caller-provided buffer that
the path will be written into, passing on ownership to the caller. This
refactoring prepares us for the removal of `git_path()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Removal of ".git/branches" and ".git/remotes" support in the
BreakingChanges document has been further clarified.
* jc/3.0-branches-remotes-update:
BreakingChanges: clarify branches/ and remotes/
The netrc support (via the cURL library) for the HTTP transport has
been re-enabled.
* bc/http-push-auth-netrc-fix:
http: allow using netrc for WebDAV-based HTTP protocol
"git rebase -i" failed to allow rewording an empty commit that has
been fast-forwarded.
* pw/rebase-i-ff-empty-commit:
rebase -i: reword empty commit after fast-forward
"git refs migrate" can optionally be told not to migrate the reflog.
* kn/ref-migrate-skip-reflog:
builtin/refs: add '--no-reflog' flag to drop reflogs
The value of "uname -s" is by default sent over the wire as a part
of the "version" capability.
* ua/os-version-capability:
agent: advertise OS name via agent capability
t5701: add setup test to remove side-effect dependency
version: extend get_uname_info() to hide system details
version: refactor get_uname_info()
version: refactor redact_non_printables()
version: replace manual ASCII checks with isprint() for clarity
At now, we have already implemented the ref consistency checks for both
"files-backend" and "packed-backend". Although we would check some
redundant things, it won't cause trouble. So, let's integrate it into
the "git-fsck(1)" command to get feedback from the users. And also by
calling "git refs verify" in "git-fsck(1)", we make sure that the new
added checks don't break.
Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git refs verify" command. In order to provide
the user interface create a progress which makes the total task be 1.
It's hard to know how many loose refs we will check now. We might
improve this later.
Then, introduce the option to allow the user to disable checking ref
database consistency. Put this function in the very first execution
sequence of "git-fsck(1)" due to that we don't want the existing code of
"git-fsck(1)" which would implicitly check the consistency of refs to
die the program.
Last, update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is a "sorted" trait in the header of the "packed-refs" file,
it means that each entry is sorted increasingly by comparing the
refname. We should add checks to verify whether the "packed-refs" is
sorted in this case.
Update the "packed_fsck_ref_header" to know whether there is a "sorted"
trail in the header. It may seem that we could record all refnames
during the parsing process and then compare later. However, this is not
a good design due to the following reasons:
1. Because we need to store the state across the whole checking
lifetime, we would consume a lot of memory if there are many entries
in the "packed-refs" file.
2. We cannot reuse the existing compare function "cmp_packed_ref_records"
which cause repetition.
Because "cmp_packed_ref_records" needs an extra parameter "struct
snaphost", extract the common part into a new function
"cmp_packed_ref_records" to reuse this function to compare.
Then, create a new function "packed_fsck_ref_sorted" to parse the file
again and user the new fsck message "packedRefUnsorted(ERROR)" to report
to the user if the file is not sorted.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"packed-backend.c::next_record" will parse the ref entry to check the
consistency. This function has already checked the following things:
1. Parse the main line of the ref entry to inspect whether the oid is
not correct. Then, check whether the next character is oid. Then
check the refname.
2. If the next line starts with '^', it would continue to parse the
peeled oid and check whether the last character is '\n'.
As we decide to implement the ref consistency check for "packed-refs",
let's port these two checks and update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"packed-backend.c::next_record" will use "check_refname_format" to check
the consistency of the refname. If it is not OK, the program will die.
However, it is reported in [1], we cannot catch some corruption. But we
already have the code path and we must miss out something.
We use the following code to get the refname:
strbuf_add(&iter->refname_buf, p, eol - p);
iter->base.refname = iter->refname_buf.buf
In the above code, `p` is the start pointer of the refname and `eol` is
the next newline pointer. We calculate the length of the refname by
subtracting the two pointers. Then we add the memory range between `p`
and `eol` to get the refname.
However, if there are some NUL characters in the memory range between `p`
and `eol`, we will see the refname as a valid ref name as long as the
memory range between `p` and first occurred NUL character is valid.
In order to catch above corruption, create a new function
"refname_contains_nul" by searching the first NUL character. If it is
not at the end of the string, there must be some NUL characters in the
refname.
Use this function in "next_record" function to die the program if
"refname_contains_nul" returns true.
[1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de/
Reported-by: R. Diez <rdiez-temp3@rd10.de>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c::create_snapshot", if there is a header (the line
which starts with '#'), we will check whether the line starts with "#
pack-refs with: ". However, we need to consider other situations and
discuss whether we need to add checks.
1. If the header does not exist, we should not report an error to the
user. This is because in older Git version, we never write header in
the "packed-refs" file. Also, we do allow no header in "packed-refs"
in runtime.
2. If the header content does not start with "# packed-ref with: ", we
should report an error just like what "create_snapshot" does. So,
create a new fsck message "badPackedRefHeader(ERROR)" for this.
3. If the header content is not the same as the constant string
"PACKED_REFS_HEADER". This is expected because we make it extensible
intentionally and runtime "create_snapshot" won't complain about
unknown traits. In order to align with the runtime behavior. There is
no need to report.
As we have analyzed, we only need to check the case 2 in the above. In
order to do this, use "open_nofollow" function to get the file
descriptor and then read the "packed-refs" file via "strbuf_read". Like
what "create_snapshot" and other functions do, we could split the line
by finding the next newline in the buffer. When we cannot find a
newline, we could report an error.
So, create a function "packed_fsck_ref_next_line" to find the next
newline and if there is no such newline, use
"packedRefEntryNotTerminated(ERROR)" to report an error to the user.
Then, parse the first line to apply the checks. Update the test to
exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We always write a space after "# pack-refs with:" but we don't align
with this rule in the "create_snapshot" method where we would check
whether header starts with "# pack-refs with:". It might seem that we
should undoubtedly tighten this rule, however, we don't have any
technical documentation about this and there is a possibility that we
would break the compatibility for other third-party libraries.
By investigating influential third-party libraries, we could conclude
how these libraries handle the header of "packed-refs" file:
1. libgit2 is fine and always writes the space. It also expects the
whitespace to exist.
2. JGit does not expect th header to have a trailing space, but expects
the "peeled" capability to have a leading space, which is mostly
equivalent because that capability is typically the first one we
write. It always writes the space.
3. gitoxide expects the space t exist and writes it.
4. go-git doesn't create the header by default.
As many third-party libraries expect a single space after "# pack-refs
with:", if we forget to write the space after the colon,
"create_snapshot" won't catch this. And we would break other
re-implementations. So, we'd better tighten the rule by checking whether
the header starts with "# pack-refs with: ".
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although "git-fsck(1)" and "packed-backend.c" will check some
consistency and correctness of "packed-refs" file, they never check the
filetype of the "packed-refs". Let's verify that the "packed-refs" has
the expected filetype, confirming it is created by "git pack-refs"
command.
We could use "open_nofollow" wrapper to open the raw "packed-refs" file.
If the returned "fd" value is less than 0, we could check whether the
"errno" is "ELOOP" to report an error to the user. And then we use
"fstat" to check whether the "packed-refs" file is a regular file.
Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to
the user if "packed-refs" is not a regular file.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c", there are some functions such as "create_snapshot"
and "next_record" which would check the correctness of the content of
the "packed-ref" file. When anything is bad, the program will die.
It may seem that we have nothing relevant to above feature, because we
are going to read and parse the raw "packed-ref" file without creating
the snapshot and using the ref iterator to check the consistency.
However, when using "get_worktrees" in "builtin/refs", we would parse
the "HEAD" information. If the referent of the "HEAD" is inside the
"packed-ref", we will call "create_snapshot" function to parse the
"packed-ref" to get the information. No matter whether the entry of
"HEAD" in "packed-ref" is correct, "create_snapshot" would call
"verify_buffer_safe" to check whether there is a newline in the last
line of the file. If not, the program will die.
Although this behavior has no harm for the program, it will
short-circuit the program. When the users execute "git refs verify" or
"git fsck", we should avoid reading the head information, which may
execute the read operation in packed backend with stricter checks to die
the program. Instead, we should continue to check other parts of the
"packed-refs" file completely.
Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
worktrees, 2023-12-29), we have introduced a function
"get_worktrees_internal" which allows us to get worktrees without
reading head information.
Create a new exposed function "get_worktrees_without_reading_head", then
replace the "get_worktrees" in "builtin/refs" with the new created
function.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For every test, we would execute the command "cd repo" in the first but
we never execute the command "cd .." to restore the working directory.
However, it's either not a good idea use above way. Because if any test
fails between "cd repo" and "cd ..", the "cd .." will never be reached.
And we cannot correctly restore the working directory.
Let's use subshell to ensure that the current working directory could be
restored to the correct path.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have recently noticed that the "msvc-meson" test job in GitLab CI
succeeds even if there are failures. This is somewhat puzzling because
we use exactly the same command as we do on GitHub Actions, and there
the jobs fail as exected.
As it turns out, this is another weirdness of the GitLab CI hosted
runner for Windows [1]: by default, even successful commands will not
make the job fail. Interestingly though, this depends on what exactly
the command is that you're running -- the MinGW-based job for example
works alright and does fail as expected.
The root cause here seems to be specific behaviour of PowerShell. The
invocation of `ForEach-Object` does not bubble up any errors in case the
invocation of `meson test` fails, and thus we don't notice the error.
This is specific to executing the command in a loop: other build steps
where we execute commands directly fail as expected.
This is because the specific version of PowerShell that we use in the
runner does not know about `PSNativeCommandUseErrorActionPreference`
yet, which controls whether native commands like "meson.exe" honor the
`ErrorActionPreference` variable. The preference has been introduced
with PowerShell 7.3 and is default-enabled since PowerShell 7.4, but
GitLab's hosted runners still seem to use PowerShell 5.1. Consequently,
when tests fail, we won't bubble up the error at all from the loop and
thus the job doesn't fail. This isn't an issue in other cases though
where we execute native commands directly, as the GitLab runner knows to
check the last error code after every command.
The same thing doesn't seem to be an issue on GitHub Actions, most
likely because it uses PowerShell 7.4. Curiously, the preference for
`PSNativeCommandUseErrorActionPreference` is disabled there, but the
jobs fail as expected regardless of that. It's puzzling, but I do not
have enough PowerShell expertise to give a definitive answer as to why
it works there.
In any case, Meson 1.8 will likely get support for slicing tests [1], so
we can eventually get rid of the whole PowerShell script. For now, work
around the issue by explicitly exiting out of the loop with a non-zero
error code if we see that Meson has failed.
[1]: https://github.com/mesonbuild/meson/pull/14092
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hosted Windows runners on GitLab.com only have 7.5GB of RAM. Given
that "link.exe" provided by Microsoft Visual Studio is multi-threaded by
itself already and thus quite memory hungry this can quickly lead to
memory starvation, out-of-memory situations and thus failed CI jobs.
Fix the issue by limiting the number of concurrent linker jobs. The same
issue hasn't been observed on GitHub Actions yet, probably because it
got more than twice the amount of RAM with 16GB.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calls to `find_program()` in our documentation don't use our custom
program path. This variable gets populated on Windows with the location
of Git for Windows so that we can use it to provide our build tools.
Consequently, we may not be able to find all necessary binaries on
Windows.
Adapt the calls to use the program path to fix this. While at it, drop
`required: true` arguments, which are the default anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're assigning the `git` variable in three places:
- In "meson.build" to store the external Git executable.
- In "meson.build" to store the compiled Git executable.
- In "Documentation/meson.build" to store the external Git executable,
a second time.
The last case is only needed because we overwrite the original variable
with the built version. Rename the variable used for the built Git
executable so that we don't have to resolve the external Git executable
multiple times.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're searching for the sed(1) executable in a loop, which will make us
try to find it multiple times. Starting with the preceding commit we
already declare a variable for that program in the top-level build file.
Use it so that we only need to search for the program once.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `sane_tool_path` option can be used to override the PATH variable
from which the build process, tests and ultimately Git will end up
picking programs from. It is currently lacking though because we only
use it to populate the PATH environment variable for executed scripts
and for the `BROKEN_PATH_FIX` mechanism, but we don't use it to find
programs used in the build process itself.
Fix this issue by treating it similar to the Windows-specific paths,
which will make us use it both to find programs and to populate the PATH
environment variable.
To help with this fix, change the type of the option to be an array of
paths, which makes the handling a bit easier for us. It's also the
correct thing to do as the input indeed is a list of paths.
Furthermore, the option now overrides the default behaviour on Windows,
which si to pick up tools from Git for Windows. This is done so that it
becomes easier to override that default behaviour in case it's not
desired.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When locating programs required for the build we give some special
treatment to Windows systems so that we know to also look up tools
provided by a Git for Windows installation. This ensures that the build
doesn't have any prerequisites other than Microsoft Visual Studio, Meson
and Git for Windows.
Consequently, some of the programs returned by `find_program()` may not
be found via PATH, but via these extra directories. But while Meson can
use these tools directly without any special treatment, any scripts that
we execute may not be able to find those programs. To help them we thus
prepend the directories of a subset of the found programs to PATH.
This doesn't make much sense though: we don't need to prepend PATH for
any program that was found via PATH, but we really only need to do so
for programs located via the extraneous Windows-specific paths. So
instead of prepending all programs paths, we really only need to prepend
the Windows-specific paths.
Adapt the code accordingly by only prepeding Windows-specific paths to
PATH, which both simplifies the code and clarifies intent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When building `libgit.a` we link it against a `libgit_version.a` library
that contains the version information that we inject at build time. The
intent of this is to avoid rebuilding all of `libgit.a` whenever the
version changes. But that wouldn't happen in the first place, as we know
to just rebuild the files that depend on the generated "version-def.h"
file.
This is an artifact of an earlier version of the Meson build infra that
didn't ultimately land. We didn't yet have "version-def.h", and instead
injected the version via preprocessor directives. And here we would have
rebuilt all of `libgit.a` indeed in case the version changes, because
the preprocessor directive applied to all files.
Stop building the separate library and instead add "version-def.h" to
the list of source files directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set up libcurl via the `libgit_dependencies` variable, which gets
propagated into every user of the `libgit` dependency. This is not
necessary though, as most of our executables aren't even supposed to
link against libcurl.
Fix this by only propagating include directories as a libgit dependency
and propagating the full curl dependency via `libgit_curl`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a set of common source files that we use for those executables
that link against libcurl. The setup is somewhat repetitive though.
Simplify it by declaring a `libgit_curl` dependency that bundles all of
it together.
Note that we don't include curl itself as a dependency. This is because
we already pull it in transitively via the libgit dependency, which is
unfortunate because libgit itself shouldn't actually link against curl
in the first place. This will get fixed in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "common-main.c" file is used by multiple executables. In order to
make it easy to set it up we have created a separate library that these
executables can link against. All of these executables also want to link
against `libgit.a` though, which makes it necessary to specify both of
these as dependencies for every executable.
Simplify this a bit by declaring the library as a source dependency:
instead of creating a static library, we now instead compile the common
set of files into each executable separately.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When setting up `libgit.a` we first create the static library itself,
and then declare it as part of a dependency such that compile arguments,
include directories and transitive dependencies get propagated to the
users of that library. As such, the static library isn't expected to be
used by anything but the declared dependency.
Inline the static library so that we don't even use a separate variable
for it. This avoids any kind of confusion that may arise and clarifies
how the library is supposed to be used.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When OpenSSL isn't provided by the system we know to fall back to the
subproject wrapper. This is especially helpful on Windows systems, where
you typically don't have OpenSSL available, in order to reduce the
number of required dependencies.
The fallback is broken though when the OpenSSL backend is set to 'auto'
as we end up calling `dependency('openssl', required: false)` in that
case, which implicitly disables falling back to the wrapper.
Fix the issue by re-allowing the fallback in case either OpenSSL is
required or in case the backend is set to 'auto'. While at it, fix
reporting of the backend in case the user asked us to pick no HTTPS
backend at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the runtime prefix option is enabled, Git is built such that it
knows to locate its binaries relative to the directory a binary is being
executed from. This requires us to figure out relative paths, which is
handled in `system_prefix()` by trying to strip a couple of well-known
paths.
One of these paths, GIT_EXEC_PATH, is expected to be absolute when
runtime prefixes are enabled, but relative otherwise. And while our
Makefile gets this correctly, in Meson we always wire up the absolute
path, which may result in us not being able to find binaries.
Fix this by conditionally injecting the paths depending on whether or
not the `runtime_prefix` option is enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Submodule merges are, in general, similar to other merges based on oid
three-way-merge. When a conflict happens, however, Git has two special
cases (introduced in 68d03e4a6e44) on handling the conflict before
yielding it to the user. From the merge-ort and merge-recursive sources:
- "Case #1: a is contained in b or vice versa": both strategies try to
perform a fast-forward in the submodules if the commit referred by the
conflicted submodule is descendant of another;
- "Case #2: There are one or more merges that contain a and b in the
submodule. If there is only one, then present it as a suggestion to the
user, but leave it marked unmerged so the user needs to confirm the
resolution."
Add a small paragraph on merge-strategies.adoc describing this behavior.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we have created an empty .git/branches/ hierarchy until fairly
recently, these directories may be found in modern repositories, but
it is highly unlikely that they are being used.
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upgrade the minimum Perl version enforced by meson-based build to
match what Makefile-based build uses.
* po/meson-perl-fix:
meson: fix Perl version check for Meson versions before 1.7.0
meson: bump minimum required Perl version to 5.26.0
Correct the default target in Documentation/Makefile, and
future-proof all Makefiles from similar breakages by declaring the
default target (which happens to be "all") upfront.
* ad/set-default-target-in-makefiles:
Makefile: set default goals in makefiles
"git merge-tree --stdin" has been improved (including a workaround
for a deadlock).
* pw/merge-tree-stdin-deadlock-fix:
merge-tree: fix link formatting in html docs
merge-tree: improve docs for --stdin
merge-tree: only use basic merge config
merge-tree: remove redundant code
merge-tree --stdin: flush stdout to avoid deadlock
The documentation of "git commit" and "git rebase" now refer to
commit titles as such, not "subject".
* mh/doc-commit-title-not-subject:
doc: use 'title' consistently
The -G/-S options to the "diff" family of commands caused us to hit
a BUG() when they get no values; they have been corrected.
* bc/diff-reject-empty-arg-to-pickaxe:
diff: don't crash with empty argument to -G or -S
Noises from "-Wsign-compare" in the borrowed xdiff code has been
squelched.
* da/xdiff-w-sign-compare-workaround:
xdiff: avoid signed vs. unsigned comparisons in xutils.c
xdiff: avoid signed vs. unsigned comparisons in xpatience.c
xdiff: avoid signed vs. unsigned comparisons in xhistogram.c
xdiff: avoid signed vs. unsigned comparisons in xemit.c
xdiff: avoid signed vs. unsigned comparisons in xdiffi.c
xdiff: move sign comparison warning guard into each file
Adapt oidtree test script to clar framework by using clar assertions
where necessary. `cl_parse_any_oid()` ensures the hash algorithm is set
before parsing. This prevents issues from an uninitialized or invalid
hash algorithm.
Introduce 'test_oidtree__initialize` handles the to set up of the global
oidtree variable and `test_oidtree__cleanup` frees the oidtree when all
tests are completed.
With this change, `check_each` stops at the first error encountered,
making it easier to address it.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt oidmap test script to clar framework by using clar assertions
where necessary. `cl_parse_any_oid()` ensures the hash algorithm is set
before parsing. This prevents issues from an uninitialized or invalid
hash algorithm.
Introduce 'test_oidmap__initialize` handles the to set up of the global
oidmap map with predefined key-value pairs, and `test_oidmap__cleanup`
frees the oidmap and its entries when all tests are completed.
The test loops through all entries to detect multiple errors. With this
change, it stops at the first error encountered, making it easier to
address it.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt oid-array test script to clar framework by using clar assertions
where necessary. Remove descriptions from macros to reduce
redundancy, and move test input arrays to global scope for reuse across
multiple test functions. Introduce `test_oid_array__initialize()` to
explicitly initialize the hash algorithm.
These changes streamline the test suite, making individual tests
self-contained and reducing redundant code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`get_oid_arbitrary_hex()` and `init_hash_algo()` are both required for
oid-related tests to run without errors. In the current implementation,
both functions are defined and declared in the
`t/unit-tests/lib-oid.{c,h}` which is utilized by oid-related tests in
the homegrown unit tests structure.
Adapt functions in lib-oid.{c,h} to use clar. Both these functions
become available for oid-related test files implemented using the clar
testing framework, which requires them. This will be used by subsequent
commits.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a pattern like:
if (error1)
...handle error 1...
else if (error2)
...handle error 2...
else
...return buf...
...free buf and return NULL...
This is a little subtle because it is the return in the success block
that lets us skip the common error handling. Rewrite this instead to
free the buffer in each error path, marking it as NULL, and then all
code paths can use the common return.
This should make the logic a bit easier to follow. It does mean
duplicating the buf cleanup for errors, but it's a single line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inflating a loose object is considered successful only if we got
Z_STREAM_END and there were no more bytes. We check both of those
conditions and return success, but then have to check them a second time
to decide which error message to produce.
I.e., we do something like this:
if (!error_1 && !error_2)
...return success...
if (error_1)
...handle error1...
else if (error_2)
...handle error2...
...common error handling...
This repetition was the source of a small bug fixed in an earlier commit
(our Z_STREAM_END check was not the same in the two conditionals).
Instead we can chain them all into a single if/else cascade, which
avoids repeating ourselves:
if (error_1)
...handle error1...
else if (error_2)
...handle error2....
else
...return success...
...common error handling...
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unpack_loose_rest() function has funny ownership semantics: we pass
in a z_stream opened by the caller, but then only _sometimes_ close it.
This oddity has developed over time. When the function was originally
split out in 5180cacc20 (Split up unpack_sha1_file() some more,
2005-06-02), it always called inflateEnd() to clean up the stream
(though nowadays it is a git_zstream and we call git_inflate_end()).
But in 7efbff7531 (unpack_sha1_file(): detect corrupt loose object
files., 2007-03-05) we added error code paths which don't close the
stream. This makes some sense, as we'd still look at parts of the stream
struct to decide which error to show (though I am not sure in practice
if inflateEnd() even touches those fields).
This subtlety makes it hard to know when the caller has to clean up the
stream and when it does not. That led to the leak fixed by aa9ef614dc
(object-file: fix memory leak when reading corrupted headers,
2024-08-14).
Let's instead always leave the stream intact, forcing the caller to
clean it up. You might think that would create more work for the
callers, but it actually ends up simplifying them, since they can put
the call to git_inflate_end() in the common cleanup code path.
Two things to note, though:
- The check_stream_oid() function is used as a replacement for
unpack_loose_rest() in read_loose_object() to read blobs. It
inherited the same funny semantics, and we should fix it here, too
(to keep the cleanup in read_loose_object() consistent).
- In read_loose_object() we need a second "out" label, as we can jump
to the existing label before opening the stream at all (and since
the struct is opaque, there is no way to if it was initialized or
not, so we must not call git_inflate_end() in that case).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpacking the actual content of a loose object file, we insist both
that the status code we got is Z_STREAM_END, and that we consumed all
bytes.
If we didn't, we'll return an error, but the specific error message we
produce depends on which of the two error conditions we saw. So we'll
check both a second time to decide which error to produce. But this
second time, our status code check is loose: it checks for a negative
status value.
This can get confused by zlib codes which are not negative, such as
Z_NEED_DICT. In this case we'd erroneously print nothing at all, when we
should say "corrupt loose object".
Instead, this second check should check explicitly against Z_STREAM_END.
Note that Z_OK is "0", so the existing code also produced no message for
Z_OK. But it's impossible to see that status, since we only break out of
the inflate loop when we stop seeing Z_OK (so a stream which has more
bytes than its object header claims would eventually yield Z_BUF_ERROR).
There's no test here, as it would require a loose object whose zlib
stream returns Z_NEED_DICT in the middle of the object content. I think
that is probably possible, but even our Z_NEED_DICT test in t1006 does
not trigger this, because we hit that error while reading the header. I
found this bug while reviewing all callers of git_inflate() for bugs
similar to the one we saw in unpack_loose_header(). This was the only
other case that did a numeric comparison rather than explicitly checking
for Z_STREAM_END.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpacking a loose header, we try to inflate the first 32 bytes.
We'd expect either Z_OK (we filled up the output buffer, but there are
more bytes in the object) or Z_STREAM_END (this is a tiny object whose
header and content fit in the buffer).
We check for that with "if (status < Z_OK)", making the assumption that
all of the errors we'd see have negative values (as Z_OK itself is "0",
and Z_STREAM_END is "1").
But there's at least one case this misses: Z_NEED_DICT is "2". This
isn't something we'd ever expect to see, but if we do see it, we should
consider it an error (since we have no dictionary to load).
Instead, the current code interprets Z_NEED_DICT as success and looks
for the object header's terminating NUL in the bytes we've read. This
will generaly be zero bytes if the dictionary is mentioned at the start
of the stream. So we'll fail to find it and complain "the header is too
long" (ULHR_LONG). But really, the problem is that the object is
malformed, and we should return ULHR_BAD.
This is a minor bug, as we consider both cases to be an error. But it
does mean we print the wrong error message. The test case added in the
previous patch triggers this code, so we can just confirm the error
message we see here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a case where malformed object input can cause us to hit a
BUG() call in the git-zlib.c code.
The zlib format allows the use of preset dictionaries to reduce the size
of deflated data. The checksum of the dictionary is computed by the
deflate code and goes into the stream. On the inflating side, zlib sees
the dictionary checksum and returns Z_NEED_DICT, asking the caller to
provide the dictionary data via inflateSetDictionary().
This should never happen in Git, because we never provide a dictionary
for deflating (and if we get a stream that mentions a dictionary, we
have no idea how to provide it). So normally Z_NEED_DICT is a hard error
for us. But something interesting happens if we _do_ happen to see it
(e.g., because of a corrupt or malicious input).
In git_inflate() as we loop over calls to zlib's inflate(), we translate
between our large-integer git_zstream values and zlib's native z_stream
types, copying in and out with zlib_pre_call() and zlib_post_call(). In
zlib_post_call() we have a few sanity checks, including one that checks
that the number of bytes consumed by zlib (as measured by it moving the
"next_in" pointer) is equal to the movement of its "total_in" count.
But these do not correspond when we see Z_NEED_DICT! Zlib consumes the
bytes from the input buffer but it does not increment total_in. And so
we hit the BUG("total_in mismatch") call.
There are a few options here:
- We could ditch that BUG() check. It is making too many assumptions
about how zlib updates these values. But it does have value in most
cases as a sanity check on the values we're copying.
- We could skip the zlib_post_call() entirely when we see Z_NEED_DICT.
We know that it's hard error for us, so we should just send the
status up the stack and let the caller bail.
The downside is that if we ever did want to support dictionaries,
we couldn't (the git_zstream will be out of sync, since we never
copied its values back from the z_stream).
- We could continue to call zlib_post_call(), but skip just that BUG()
check if the status is Z_NEED_DICT. This keeps git_inflate() as a
thin wrapper around inflate(), and would let us later support
dictionaries for some calls if we wanted to.
This patch uses the third approach. It seems like the least-surprising
thing to keep git_inflate() a close to inflate() as possible. And while
it makes the diff a bit larger (since we have to pass the status down to
to the zlib_post_call() function), it's a static local function, and
every caller by definition will have just made a zlib call (and so will
have a status integer).
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading a loose object, we first try to expand the first 32 bytes
to read the type+size header. This is enough for any of the normal Git
types. But since 46f034483e (sha1_file: support reading from a loose
object of unknown type, 2015-05-03), the caller can also ask us to parse
any unknown names, which can be much longer. In this case we keep
inflating until we find the NUL at the end of the header, or hit
Z_STREAM_END.
But what if zlib can't make forward progress? For example, if the loose
object file is truncated, we'll have no more data to feed it. It will
return Z_BUF_ERROR, and we'll just loop infinitely, calling
git_inflate() over and over but never seeing new bytes nor an
end-of-stream marker.
We can fix this by only looping when we think we can make forward
progress. This will always be Z_OK in this case. In other code we might
also be able to continue on Z_BUF_ERROR, but:
- We will never see Z_BUF_ERROR because the output buffer is full; we
always feed a fresh 32-byte buffer on each call to git_inflate().
- We may see Z_BUF_ERROR if we run out of input. But since we've fed
the whole mmap'd buffer to zlib, if it runs out of input there is
nothing more we can do.
So if we don't see Z_OK (and didn't see the end-of-header NUL, otherwise
we'd have broken out of the loop), then we should stop looping and
return an error.
The test case shows an example where the input is truncated (which gives
us the input Z_BUF_ERROR case above).
Although we do operate on objects we might get from an untrusted remote,
I don't think the security implications of this bug are too great. It
can only trigger if both of these are true:
- You're reading a loose object whose on-disk representation was
written by an attacker. So fetching an object (or receiving a push)
are mostly OK, because even with unpack-objects it is our local,
trusted code that writes out the object file. The exception may be
fetching from an untrusted local repo, or using dumb-http, which
copies objects verbatim. But...
- The only code path which triggers the inflate loop is cat-file's
--allow-unknown-type option. This is unlikely to be called at all
outside of debugging. But I also suspect that objects with
non-standard types (or that are truncated) would not survive the
usual fetch/receive checks in the first place.
So I think it would be quite hard to trick somebody into running the
infinite loop, and we can just fix the bug.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a caller asks us to read the whole loose object header value into a
strbuf (e.g., via "cat-file --allow-unknown-type"), we'll keep reading
until we see a NUL byte marking the end of the header.
If we hit Z_STREAM_END before seeing the NUL, we obviously have to stop.
But we return ULHR_TOO_LONG, which doesn't make any sense. The "too
long" return code is used in the normal, 32-byte limited mode to
indicate that we stopped looking. There is no such thing as "too long"
here, as we'd keep reading forever until we see the end of stream or the
NUL.
Instead, we should return ULHR_BAD. The loose object has no NUL marking
the end of header, so it is malformed. The behavior difference is
slight; in either case we'd consider the object unreadable and refuse to
go further. The only difference is the specific error message we
produce.
There's no test case here, as we'd need to generate a valid zlib stream
without a NUL. That's not something Git will do without writing new
custom code. And in the next patch we'll fix another bug in this area
which will make this easier to do (and we will test it then).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using OBJECT_INFO_ALLOW_UNKNOWN_TYPE to unpack a header that
doesn't fit into our initial 32-byte buffer, we loop over calls
git_inflate(), feeding it our buffer to the "next_out" pointer each
time. As the code is written, we reset next_out after each inflate call
(and after reading the output), ready for the next loop.
This isn't wrong, but there are a few advantages to setting up
"next_out" right before each inflate call, rather than after:
1. It drops a few duplicated lines of code.
2. It makes it obvious that we always feed a fresh buffer on each call
(and thus can never see Z_BUF_ERROR due to due to a lack of output
space).
3. After we exit the loop, we'll leave stream->next_out pointing to
the end of the fetched data (this is how zlib callers find out how
much data is in the buffer). This doesn't matter in practice, since
nobody looks at it again. But it's probably the least-surprising
thing to do, as it matches how next_out is left when the whole
thing fits in the initial 32-byte buffer (and we don't enter the
loop at all).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After unpack_loose_header() returns, it will have inflated not only the
object header, but possibly some bytes of the object content. When we
call unpack_loose_rest() to extract the actual content, it finds those
extra bytes by skipping past the header's terminating NUL in the buffer.
Like this:
int bytes = strlen(buffer) + 1;
n = stream->total_out - bytes;
...
memcpy(buf, (char *) buffer + bytes, n);
This won't work with the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag, as there
we allow a header of arbitrary size. We put into a strbuf, but feed only
the final 32-byte chunk we read to unpack_loose_rest(). In that case
stream->total_out may unexpectedly large, and thus our "n" will be
large, causing an out-of-bounds read (we do check it against our
allocated buffer size, which prevents an out-of-bounds write).
Probably this could be made to work by feeding the strbuf to
unpack_loose_rest(), along with adjusting some types (e.g., "bytes"
would need to be a size_t, since it is no longer operating on a 32-byte
buffer).
But I don't think it's possible to actually trigger this in practice.
The only caller who passes ALLOW_UNKNOWN_TYPE is cat-file, which only
allows it with the "-t" and "-s" options (neither of which access the
content). There is one way you can _almost_ trigger it: the oid compat
routines (i.e., accessing sha1 via sha256 names and vice versa) will
convert objects on the fly (which requires access to the content) using
the same flags that were passed in. So in theory this:
t='some very large type field that causes an extra inflate call'
sha1_oid=$(git hash-object -w -t "$t" file)
sha256_oid=$(git rev-parse --output-object-format=sha256 $sha1_oid)
git cat-file --allow-unknown-type -s $sha256_oid
would try to access the content. But it doesn't work, because using
compat objects requires an entry in the .git/objects/loose-object-idx
file, and we don't generate such an entry for non-standard types (see
the "compat" section of write_object_file_literally()).
If we use "t=blob" instead, then it does access the compat object, but
it doesn't trigger the problem (because "blob" is a standard short type
name, and it fits in the initial 32-byte buffer).
So given that this is almost a memory error bug, I think it's worth
addressing. But because we can't actually trigger the situation, I'm
hesitant to try a fix that we can't run. Instead let's document the
restriction and protect ourselves from the out-of-bounds read by adding
a BUG() check.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running `make test`, when missing prereqs the following is emitted:
make aggregate-results
usage: paste [-s] [-d delimiters] file ...
fixed 1
success 30066
failed 0
broken 218
total 31274
POSIX says that `paste` requires a file operand; stdin was clearly
intended by 49da404070 (test-lib: show missing prereq summary,
2021-11-20). Use it.
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks_1() clears the marks of the first parent and its
first parent and so on, and saves the higher numbered parents in a list
for later. There is no benefit in keeping that list growing with each
handled commit. Clear it after each run to reduce peak memory usage.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For an extended period of time, we've enabled libcurl's netrc
functionality, which will read credentials from the netrc file if none
are provided. Unfortunately, we have also not documented this fact or
written any tests for it, but people have come to rely on it.
In 610cbc1dfb ("http: allow authenticating proactively", 2024-07-10), we
accidentally broke the ability of users to use the netrc file for the
WebDAV-based HTTP protocol. Notably, it works on the initial request
but does not work on subsequent requests, which causes failures because
that version of the protocol will necessarily make multiple requests.
This happens because curl_empty_auth_enabled never returns -1, only 0 or
1, and so if http.proactiveAuth is not enabled, the username and
password are always set to empty credentials, which prevents libcurl's
fallback to netrc from working. However, in other cases, the server
continues to get a 401 response and the credential helper is invoked,
which is the normal behavior, so this was not noticed earlier.
To fix this, change the condition to check for enabling empty auth and
also not having proactive auth enabled, which should result in the
username and password not being set to a single colon in the typical
case, and thus the netrc file being used.
Reported-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I recently had reported to me a crash from a coworker using the recently
added sendemail mailmap support:
3724814 Segmentation fault (core dumped) git check-mailmap "bugs@company.xx"
This appears to happen because of the NULL pointer name passed into
map_user(). Fix this by passing "" instead of NULL so that we have a
valid pointer.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two configuration variables about SSL authentication material that
weren't mentioned in the documentations are now mentioned.
* ac/doc-http-ssl-type-config:
docs: indicate http.sslCertType and sslKeyType
Since 183ea3ea (Merge branch 'ps/mingw-rename', 2024-11-13),
a new technique is used on Windows to rename files, where supported.
The first step of this technique is to open the file with
`CreateFileW`. At that time, `FILE_ATTRIBUTE_NORMAL` was passed as
the value of the `dwFlagsAndAttributes` argument. In b30404df [2], this
was improved by passing `FILE_FLAG_BACKUP_SEMANTICS`, to support
directories as well as regular files.
However, neither value of `dwFlagsAndAttributes` is sufficient to open
a symbolic link with the correct semantics to rename it. Symlinks on
Windows are reparse points. Attempting to open a reparse point with
`CreateFileW` dereferences the reparse point and opens the target
instead, unless `FILE_FLAG_OPEN_REPARSE_POINT` is included in
`dwFlagsAndAttributes`. This is documented for that flag and in the
"Symbolic Link Behavior" section of the `CreateFileW` docs [3].
This produces a regression where attempting to rename a symlink on
Windows renames its target to the intended new name and location of the
symlink. For example, if `symlink` points to `file`, then running
git mv symlink symlink-renamed
leaves `symlink` in place and unchanged, but renames `file` to
`symlink-renamed` [4].
This regression is detectable by existing tests in `t7001-mv.sh`, but
the tests must be run by a Windows user with the ability to create
symlinks, and the `ln -s` command used to create the initial symlink
must also be able to create a real symlink (such as by setting the
`MSYS` environment variable to `winsymlinks:nativestrict`). Then
these two tests fail if the regression is present, and pass otherwise:
38 - git mv should overwrite file with a symlink
39 - check moved symlink
Let's fix this, so that renaming a symlink again renames the symlink
itself and leaves the target unchanged, by passing
FILE_FLAG_BACKUP_SEMANTICS | FILE_FLAG_OPEN_REPARSE_POINT
as the `dwFlagsAndAttributes` argument. This is sufficient (and safe)
because including `FILE_FLAG_OPEN_REPARSE_POINT` causes no harm even
when used to open a file or directory that is not a reparse point. In
that case, as noted in [3], this flag is simply ignored.
[1]: 183ea3eabf
[2]: b30404dfc0
[3]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew
[4]: https://github.com/git-for-windows/git/issues/5436
Signed-off-by: Eliah Kagan <eliah.kagan@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git refs migrate" subcommand converts the backend used for ref
storage. It always migrates reflog data as well as refs. Introduce an
option to exclude reflogs from migration, allowing them to be discarded
when they are unnecessary.
This is particularly useful in server-side repositories, where reflogs
are typically not expected. However, some repositories may still have
them due to historical reasons, such as bugs, misconfigurations, or
administrative decisions to enable reflogs for debugging. In such
repositories, it would be optimal to drop reflogs during the migration.
To address this, introduce the '--no-reflog' flag, which prevents reflog
migration. When this flag is used, reflogs from the original reference
backend are migrated. Since only the new reference backend remains in
the repository, all previous reflogs are permanently discarded.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up credential helpers in our CI runs so that we can rest assured
that they compile and (if tests are available) function correctly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The musl-based Meson job is supposed to explicitly specify the UTF-8
locale used for testing, which has been introduced with 84bb5eeace7 (ci:
switch linux-musl to use Meson, 2025-01-28). That commit had two issues
though:
- We continue to refer to "linux-musl", even though the job has been
renamed in the same commit to "linux-musl-meson".
- We use the wrong option name to specify the locale. This was not
noticed though due to the first issue.
Fix both of these issues by fixing both the job and option naems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: introduce support for the Meson build system
gitk: extract script to build executable
gitk: make the "list references" default window width wider
gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6
gitk: Use an external icon file on Windows
gitk: Unicode file name support
gitk(Windows): avoid inadvertently calling executables in the worktree
* 'pks-meson-support' of https://github.com/pks-t/gitk:
gitk: introduce support for the Meson build system
gitk: extract script to build executable
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
* 'g4w-gitk' of https://github.com/dscho/gitk:
gitk: make the "list references" default window width wider
gitk: fix arrow keys in input fields with Tcl/Tk >= 8.6
gitk: Use an external icon file on Windows
gitk: Unicode file name support
gitk(Windows): avoid inadvertently calling executables in the worktree
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Upstream Git has introduced support for the Meson build system.
Introduce support for Meson into gitk, as well, so that Git can easily
build its vendored copy of Gitk via a `subproject()` directive. The
instructions can be set up as follows:
$ meson setup build
$ meson compile -C build
$ meson install -C build
Specific options, like for example where Gitk shall be installed to, can
be specified at setup time via `-D`. Available options can be discovered
by running `meson configure` either in the source or build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
As some issues that can happen with a Git client can be operating system
specific, it can be useful for a server to know which OS a client is
using. In the same way it can be useful for a client to know which OS
a server is using.
Our current agent capability is in the form of "package/version" (e.g.,
"git/1.8.3.1"). Let's extend it to include the operating system name (os)
i.e in the form "package/version-os" (e.g., "git/1.8.3.1-Linux").
Including OS details in the agent capability simplifies implementation,
maintains backward compatibility, avoids introducing a new capability,
encourages adoption across Git-compatible software, and enhances
debugging by providing complete environment information without affecting
functionality. The operating system name is retrieved using the 'sysname'
field of the `uname(2)` system call or its equivalent.
However, there are differences between `uname(1)` (command-line utility)
and `uname(2)` (system call) outputs on Windows. These discrepancies
complicate testing on Windows platforms. For example:
- `uname(1)` output: MINGW64_NT-10.0-20348.3.4.10-87d57229.x86_64\
.2024-02-14.20:17.UTC.x86_64
- `uname(2)` output: Windows.10.0.20348
On Windows, uname(2) is not actually system-supplied but is instead
already faked up by Git itself. We could have overcome the test issue
on Windows by implementing a new `uname` subcommand in `test-tool`
using uname(2), but except uname(2), which would be tested against
itself, there would be nothing platform specific, so it's just simpler
to disable the tests on Windows.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Command `perl --version` says, e.g., “This is perl 5, version 26,
subversion 0 (v5.26.0)”, which older versions of Meson interpret as
version 26.
This will be fixed in Meson 1.7.0, but at the time of writing that isn’t
yet released.
If we run `perl -V:version` we get the unambiguous response
“version='5.26.0';”, but we need at least Meson 1.5.0 to be able to do that.
Note that Perl are seriously considering dropping the leading 5 entirely
in the near future (https://perl.github.io/PPCs/ppc0025-perl-version/),
but that shouldn’t affect us.
Signed-off-by: Peter Oliver <git@mavit.org.uk>
Co-authored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 702d8c1f3b (Require Perl 5.26.0, 2024-10-23) dropped support
for Perl versions older than 5.26.0. The Meson build system, which
has been developed in parallel to that commit, hasn't been bumped
accordingly and thus still requires Perl 5.8.1 or newer.
Fix this by requiring Perl 5.26.0 or newer with Meson.
Signed-off-by: Peter Oliver <git@mavit.org.uk>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git difftool" code clean-up.
* da/difftool-sans-the-repository:
difftool: eliminate use of USE_THE_REPOSITORY_VARIABLE
difftool: eliminate use of the_repository
difftool: eliminate use of global variables
"git rev-list --missing=" learned to accept "print-info" that gives
known details expected of the missing objects, like path and type.
* jt/rev-list-missing-print-info:
rev-list: extend print-info to print missing object type
rev-list: add print-info action to print missing object path
"git push --atomic --porcelain" used to ignore failures from the
other side, losing the error status from the child process, which
has been corrected.
* ps/send-pack-unhide-error-in-atomic-push:
send-pack: gracefully close the connection for atomic push
t5543: atomic push reports exit code failure
send-pack: new return code "ERROR_SEND_PACK_BAD_REF_STATUS"
t5548: add porcelain push test cases for dry-run mode
t5548: add new porcelain test cases
t5548: refactor test cases by resetting upstream
t5548: refactor to reuse setup_upstream() function
t5504: modernize test by moving heredocs into test bodies
Lazy-loading missing files in a blobless clone on demand is costly
as it tends to be one-blob-at-a-time. "git backfill" is introduced
to help bulk-download necessary files beforehand.
* ds/backfill:
backfill: assume --sparse when sparse-checkout is enabled
backfill: add --sparse option
backfill: add --min-batch-size=<n> option
backfill: basic functionality and tests
backfill: add builtin boilerplate
Unlinking a file may fail on Windows systems when the file is still held
open by another process. This is incompatible with POSIX semantics and
by extension with Git's assumed semantics when unlinking files, which
is that files can be unlinked regardless of whether they are still open
or not. To counteract this incompatibility, we have some custom error
handling in the `mingw_unlink()` wrapper that first retries the deletion
with some delay, and then asks the user whether we should continue to
retry.
While this logic might be sensible in many callsites throughout Git, it
is less when used in the reftable library. We only use unlink(3) there
to delete tables which aren't referenced anymore, and the code is very
aware of the limitations on Windows. As such, all calls to unlink(3p)
don't perform any error checking at all and are fine with the call
failing.
Instead, the library provides the `reftable_stack_clean()` function,
which Git knows to execute in git-pack-refs(1) after compacting a stack.
The effect of this function is that all stale tables will eventually get
deleted once they aren't kept open anymore.
So while we're fine with unlink(3p) failing, the Windows-emulation of
that function will still perform several sleeps and ultimately end up
asking the user:
$ git pack-refs
Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
Unlink of file 'C:/temp/jgittest/jgit/.git/reftable/0x000000000002-0x000000000004-50486d0e.ref' failed. Should I try again? (y/n) n
It even asks multiple times, which is doubly annoying and puzzling to
the user:
1. It asks when trying to delete the old file after having written the
compacted stack.
2. It asks when reloading the stack, where it will try to unlink
now-unreferenced tables.
3. It asks when calling `reftable_stack_clean()`, where it will try to
unlink now-stale tables.
Fix the issue by making it possible to disable this behaviour with a
preprocessor define. As "git-compat-util.h" is only included from
"system.h", and given that "system.h" is only ever included by headers
and code that are internal to the reftable library, we can set that
macro in this header without impacting anything else but the reftable
library.
Reported-by: Christian Reich <Zottelbart@t-online.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/reftable-sans-compat-util:
Makefile: skip reftable library for Coccinelle
reftable: decouple from Git codebase by pulling in "compat/posix.h"
git-compat-util.h: split out POSIX-emulating bits
compat/mingw: split out POSIX-related bits
reftable/basics: introduce `REFTABLE_UNUSED` annotation
reftable/basics: stop using `SWAP()` macro
reftable/stack: stop using `sleep_millisec()`
reftable/system: introduce `reftable_rand()`
reftable/reader: stop using `ARRAY_SIZE()` macro
reftable/basics: provide wrappers for big endian conversion
reftable/basics: stop using `st_mult()` in array allocators
reftable: stop using `BUG()` in trivial cases
reftable/record: don't `BUG()` in `reftable_record_cmp()`
reftable/record: stop using `BUG()` in `reftable_record_init()`
reftable/record: stop using `COPY_ARRAY()`
reftable/blocksource: stop using `xmmap()`
reftable/stack: stop using `write_in_full()`
reftable/stack: stop using `read_in_full()`
Wire up static analysis via Coccinelle via a new test target
"coccicheck". This target can be executed via `meson compile coccicheck`
and generates the semantic patch for us.
Note that we don't hardcode the list of source and header files that
shall be analyzed, and instead use git-ls-files(1) to find them for us.
This is because we also want to analyze files that may not get built on
the current platform, so finding all sources at configure time is easier
than introducing a new variable that tracks all sources, including those
which aren't being built.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a couple of credential helpers in "contrib/credential", all
of which aren't yet wired up via Meson. Do so.
Note that ideally, we'd also wire up t0303 to be executed with each of
the credential helpers to verify their functionality. Unfortunately
though, none of them pass the test suite right now, so this is left for
a future change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "osxkeychain" helper does not compile due to a warning generated by
the unused `argc` parameter. Fix the warning by checking for the minimum
number of required arguments explicitly in the least restrictive way
possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "libsecret" credential helper does not compile when developer
warnings are enabled due to three warnings:
- contrib/credential/libsecret/git-credential-libsecret.c:78:1:
missing initializer for field ‘reserved’ of ‘SecretSchema’
[-Werror=missing-field-initializers]. This issue is fixed by using
designated initializers.
- contrib/credential/libsecret/git-credential-libsecret.c:171:43:
comparison of integer expressions of different signedness: ‘int’
and ‘guint’ {aka ‘unsigned int’} [-Werror=sign-compare]. This
issue is fixed by using an unsigned variable to iterate through
the string vector.
- contrib/credential/libsecret/git-credential-libsecret.c:420:14:
unused parameter ‘argc’ [-Werror=unused-parameter]. This issue is
fixed by checking the number of arguments, but in the least
restrictive way possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-credential-wincred helper does not compile on Windows with
Microsoft Visual Studio because of our use of `__attribute__()`, which
its compiler doesn't support. While the rest of our codebase would know
to handle this because we redefine the macro in "compat/msvc.h", this
stub isn't available here because we don't include "git-compat-util.h"
in the first place.
Fix the issue by making the attribute depend on the `_MSC_VER`
preprocessor macro.
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests of the "netrc" credential helper aren't prepared to handle
out-of-tree builds:
- They expect the "test.pl" script to be located relative to the build
directory, even though it is located in the source directory.
- They expect the built "git-credential-netrc" helper to be located
relative to the "test.pl" file, evne though it is loated in the
build directory.
This works alright as long as source and build directories are the same,
but starts to break apart with Meson.
Fix these first issue by using the new "GIT_SOURCE_DIR" variable to
locate the test script itself. And fix the second issue by introducing a
new environment variable "CREDENTIAL_NETRC_PATH" that can be set for
out-of-tree builds to locate the built credential helper.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of our tests require knowledge around where to find the
project's source directory in order to locate files required for the
test itself. Until now we have been wiring these up ad-hoc via new,
specialized variables catered to the specific usecase. This is quite
awkward though, as every test that potentially needs to locate paths
relative to the source directory needs to grow another variable.
Introduce a new "GIT_SOURCE_DIR" variable into GIT-BUILD-OPTIONS to stop
this proliferation. Remove existing variables that can be derived from
it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit introduced a "promisor.acceptFromServer" configuration
variable with only "None" or "All" as valid values.
Let's introduce "KnownName" and "KnownUrl" as valid values for this
configuration option to give more choice to a client about which
promisor remotes it might accept among those that the server advertised.
In case of "KnownName", the client will accept promisor remotes which
are already configured on the client and have the same name as those
advertised by the client. This could be useful in a corporate setup
where servers and clients are trusted to not switch names and URLs, but
where some kind of control is still useful.
In case of "KnownUrl", the client will accept promisor remotes which
have both the same name and the same URL configured on the client as the
name and URL advertised by the server. This is the most secure option,
so it should be used if possible.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a server S knows that some objects from a repository are available
from a promisor remote X, S might want to suggest to a client C cloning
or fetching the repo from S that C may use X directly instead of S for
these objects.
Note that this could happen both in the case S itself doesn't have the
objects and borrows them from X, and in the case S has the objects but
knows that X is better connected to the world (e.g., it is in a
$LARGEINTERNETCOMPANY datacenter with petabit/s backbone connections)
than S. Implementation of the latter case, which would require S to
omit in its response the objects available on X, is left for future
improvement though.
Then C might or might not, want to get the objects from X. If S and C
can agree on C using X directly, S can then omit objects that can be
obtained from X when answering C's request.
To allow S and C to agree and let each other know about C using X or
not, let's introduce a new "promisor-remote" capability in the
protocol v2, as well as a few new configuration variables:
- "promisor.advertise" on the server side, and:
- "promisor.acceptFromServer" on the client side.
By default, or if "promisor.advertise" is set to 'false', a server S will
not advertise the "promisor-remote" capability.
If S doesn't advertise the "promisor-remote" capability, then a client C
replying to S shouldn't advertise the "promisor-remote" capability
either.
If "promisor.advertise" is set to 'true', S will advertise its promisor
remotes with a string like:
promisor-remote=<pr-info>[;<pr-info>]...
where each <pr-info> element contains information about a single
promisor remote in the form:
name=<pr-name>[,url=<pr-url>]
where <pr-name> is the urlencoded name of a promisor remote and
<pr-url> is the urlencoded URL of the promisor remote named <pr-name>.
For now, the URL is passed in addition to the name. In the future, it
might be possible to pass other information like a filter-spec that the
client may use when cloning from S, or a token that the client may use
when retrieving objects from X.
It is C's responsibility to arrange how it can reach X though, so pieces
of information that are usually outside Git's concern, like proxy
configuration, must not be distributed over this protocol.
It might also be possible in the future for "promisor.advertise" to have
other values. For example a value like "onlyName" could prevent S from
advertising URLs, which could help in case C should use a different URL
for X than the URL S is using. (The URL S is using might be an internal
one on the server side for example.)
By default or if "promisor.acceptFromServer" is set to "None", C will
not accept to use the promisor remotes that might have been advertised
by S. In this case, C will not advertise any "promisor-remote"
capability in its reply to S.
If "promisor.acceptFromServer" is set to "All" and S advertised some
promisor remotes, then on the contrary, C will accept to use all the
promisor remotes that S advertised and C will reply with a string like:
promisor-remote=<pr-name>[;<pr-name>]...
where the <pr-name> elements are the urlencoded names of all the
promisor remotes S advertised.
In a following commit, other values for "promisor.acceptFromServer" will
be implemented, so that C will be able to decide the promisor remotes it
accepts depending on the name and URL it received from S. So even if
that name and URL information is not used much right now, it will be
needed soon.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library does not use any of the common helpers that the Git
project has. Consequently, most of the rules that we have in Coccinelle
do not apply to the library at all and may even generate false positives
when a pattern can be converted to use a Git helper function.
Exclude reftable library sources from being checked by Coccinelle to
avoid such false positives.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library includes "git-compat-util.h" in order to get a
POSIX-like programming environment that papers over various differences
between platforms. The header also brings with it a couple of helpers
specific to the Git codebase though, and over time we have started to
use these helpers in the reftable library, as well.
This makes it very hard to use the reftable library as a standalone
library without the rest of the Git codebase, so other libraries like
e.g. libgit2 cannot easily use it. But now that we have removed all
calls to Git-specific functionality and have split out "compat/posix.h"
as a separate header we can address this.
Stop including "git-compat-util.h" and instead include "compat/posix.h"
to finalize the decoupling of the reftable library from the rest of the
Git codebase. The only bits which remain specific to Git are "system.h"
and "system.c", which projects will have to provide.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git-compat-util.h" header is a treasure trove of various bits and
pieces used throughout the project. It basically mixes two different
things into one:
- Providing a POSIX-like interface even on platforms that aren't
POSIX-compliant.
- Providing low-level functionality that is specific to Git.
This intermixing is a bit of a problem for the reftable library as we
don't want to recreate the POSIX-like interface there. But neither do we
want to pull in the Git-specific functionality, as it is otherwise quite
easy to start depending on the Git codebase again.
Split out a new header "compat/posix.h" that only contains the bits and
pieces relevant for the emulation of POSIX, which we will start using in
the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split out POSIX-related bits from "compat/mingw.h" and "compat/msvc.h".
This is in preparation for splitting up "git-compat-utils.h" into a
header that provides POSIX-compatibility and a header that provides
common wrappers used by the Git project.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the `REFTABLE_UNUSED` annotation and replace all existing
users of `UNUSED` in the reftable library to use the new macro instead.
Note that we unconditionally define `MAYBE_UNUSED` in the exact same
way, so doing so unconditionally for `REFTABLE_UNUSED` should be fine,
too.
Suggested-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `SWAP()` macro in favor of an open-coded variant of it. Note
that this also requires us to open-code the build assert that `SWAP()`
itself uses to verify that the size of both variables matches.
This is done to reduce our dependency on the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor our use of `sleep_millisec()` by open-coding it with poll(3p),
which is the current implementation of this function. Ideally, we'd use
a more direct way to sleep, but there is no equivalent to sleep(3p) that
would accept milliseconds as input.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new system-level `reftable_rand()` function that generates a
single unsigned integer for us. The implementation of this function is
to be provided by the calling codebase, which allows us to more easily
hook into pre-seeded random number generators.
Adapt the two callsites where we generated random data.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a single user of the `ARRAY_SIZE()` macro in the reftable
reader. Drop its use to reduce our dependence on the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're using a mixture of big endian conversion functions provided by
both the reftable library, but also by the Git codebase. Refactor the
code so that we exclusively use reftable-provided wrappers in order to
untangle us from the Git codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're using `st_mult()` as part of our macro helpers that allocate
arrays. This is bad due two two reasons:
- `st_mult()` causes us to die in case the multiplication overflows.
- `st_mult()` ties us to the Git codebase.
Refactor the code to instead detect overflows manually and return an
error in such cases.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `BUG()` in the remaining trivial cases that we still have in
the reftable library. Instead of aborting the program, we'll now bubble
up a `REFTABLE_API_ERROR` to indicate misuse of the calling conventions.
Note that in both `reftable_reader_{inc,dec}ref()` we simply stop
calling `BUG()` altogether. The only situation where the counter should
be zero is when the structure has already been free'd anyway, so we
would run into undefined behaviour regardless of whether we try to abort
the program or not.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library aborts with a bug in case `reftable_record_cmp()`
is invoked with two records of differing types. This would cause the
program to die without the caller being able to handle the error, which
is not something we want in the context of library code. And it ties us
to the Git codebase.
Refactor the code such that `reftable_record_cmp()` returns an error
code separate from the actual comparison result. This requires us to
also adapt some callers up the callchain in a similar fashion.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're aborting the program via `BUG()` in case `reftable_record_init()`
was invoked with an unknown record type. This is bad because we may now
die in library code, and because it makes us depend on the Git codebase.
Refactor the code such that `reftable_record_init()` can return an error
code to the caller. Adapt any callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop our use of `COPY_ARRAY()`, replacing it with an open-coded variant
thereof. This is done to reduce our dependency on the Git library.
While at it, guard the whole array copy logic so that we only copy it in
case there actually is anything to be copied. Otherwise, we may end up
trying to allocate a zero-sized array, which will return a NULL pointer
and thus cause us to return an `REFTABLE_OUT_OF_MEMORY_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use `xmmap()` to map reftables into memory. This function has two
problems:
- It causes us to die in case the mmap fails.
- It ties us to the Git codebase.
Refactor the code to use mmap(3p) instead with manual error checking.
Note that this function may not be the system-provided mmap(3p), but may
point to our `git_mmap()` wrapper that emulates the syscall on systems
that do not have mmap(3p) available.
Fix `reftable_block_source_from_file()` to properly bubble up the error
code in case the map(3p) call fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, drop our use of `write_in_full()` and
implement a new wrapper `reftable_write_full()` that handles this logic
for us. This is done to reduce our dependency on the Git library.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a single callsite of `read_in_full()` in the reftable library.
Open-code the function to reduce our dependency on the Git library.
Note that we only partially port over the logic from `read_in_full()`
and its underlying `xread()` helper. Most importantly, the latter also
knows to handle `EWOULDBLOCK` via `handle_nonblock()`. This logic is
irrelevant for us though because the reftable library never sets the
`O_NONBLOCK` option in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pickaxe options, -G and -S, need either a regex or a string to look
through the history for. An empty value isn't very useful since it
would either match everything or nothing, and what's worse, we presently
crash with a BUG like so when the user provides one:
BUG: diffcore-pickaxe.c:241: should have needle under -G or -S
Since it's not very nice of us to crash and this wouldn't do anything
useful anyway, let's simply inform the user that they must provide a
non-empty argument and exit with an error if they provide an empty one
instead.
Reported-by: Jared Van Bortel <cebtenzzre@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first line of a commit message is variously called 'title' or
'subject'.
Prefer 'title' unless discussing email.
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the html documentation the link to the "OUTPUT" section is surrounded
by square brackets. Fix this by adding explicit link text to the cross
reference.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a section for --stdin in the list of options and document that it
implies -z so readers know how to parse the output. Also correct the
merge status documentation for --stdin as if the status is less than
zero "git merge-tree" dies before printing it.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 9c93ba4d0ae (merge-recursive: honor diff.algorithm, 2024-07-13)
replaced init_merge_options() with init_basic_merge_config() for use in
plumbing commands and init_ui_merge_config() for use in porcelain
commands. As "git merge-tree" is a plumbing command it should call
init_basic_merge_config() rather than init_ui_merge_config(). The merge
ort machinery ignores "diff.algorithm" so the behavior is unchanged by
this commit but it future proofs us against any future changes to
init_ui_merge_config().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
real_merge() only ever returns "0" or "1" as it dies if the merge status
is less than zero. Therefore the check for "result < 0" is redundant and
the result variable is not needed. The return value of real_merge() is
ignored because exit status of "git merge-tree --stdin" is "0" for both
successful and conflicted merges (the status of each merge is written to
stdout). The return type of real_merge() is not changed as it is used
for the program's exit status when "--stdin" is not given.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a process tries to read the output from "git merge-tree --stdin"
before it closes merge-tree's stdin then it deadlocks. This happens
because merge-tree does not flush its output before trying to read
another line of input and means that it is not possible to cherry-pick a
sequence of commits using "git merge-tree --stdin". Fix this by calling
maybe_flush_or_die() before trying to read the next line of
input. Flushing the output after each merge does not seem to affect the
performance, any difference is lost in the noise even after increasing
the number of runs.
$ git rev-list --merges --parents -n100 origin/master |
sed 's/^[^ ]* //' >/tmp/merges
$ hyperfine -L flush 0,1 --warmup 1 --runs 30 \
'GIT_FLUSH={flush} ./git merge-tree --stdin </tmp/merges'
Benchmark 1: GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges
Time (mean ± σ): 546.6 ms ± 11.7 ms [User: 503.2 ms, System: 40.9 ms]
Range (min … max): 535.9 ms … 567.7 ms 30 runs
Benchmark 2: GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges
Time (mean ± σ): 546.9 ms ± 12.0 ms [User: 505.9 ms, System: 38.9 ms]
Range (min … max): 529.8 ms … 570.0 ms 30 runs
Summary
'GIT_FLUSH=0 ./git merge-tree --stdin </tmp/merges' ran
1.00 ± 0.03 times faster than 'GIT_FLUSH=1 ./git merge-tree --stdin </tmp/merges'
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename `match_name_with_pattern()` to `match_refname_with_pattern()` to
better reflect its purpose and improve documentation comment clarity.
The previous function name and parameter names were inconsistent, making
it harder to understand their roles in refspec matching.
- Rename parameters:
- `key` -> `pattern` (globbing pattern to match)
- `name` -> `refname` (refname to check)
- `value` -> `replacement` (replacement mapping pattern)
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the "test capability advertisement" test creates some files
with expected content which are used by other tests below it.
To remove that side-effect from this test, let's split up part of
it into a "setup"-type test which creates the files with expected content
which gets reused by multiple tests. This will be useful in a following
commit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, get_uname_info() function provides the full OS information.
In a following commit, we will need it to provide only the OS name.
Let's extend it to accept a "full" flag that makes it switch between
providing full OS information and providing only the OS name.
We may need to refactor this function in the future if an
`osVersion.format` is added.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code from "builtin/bugreport.c" uses uname(2) to get system
information.
Let's refactor this code into a new get_uname_info() function, so
that we can reuse it in a following commit.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_user_agent_sanitized() function performs some sanitizing to
avoid special characters being sent over the line and possibly messing
up with the protocol or with the parsing on the other side.
Let's extract this sanitizing into a new redact_non_printables() function,
as we will want to reuse it in a following patch.
For now the new redact_non_printables() function is still static as
it's only needed locally.
While at it, let's use strbuf_detach() to explicitly detach the string
contained by the 'buf' strbuf.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the isprint() function checks for printable characters, let's
replace the existing hardcoded ASCII checks with it. However, since
the original checks also handled spaces, we need to account for spaces
explicitly in the new check.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly set the default goal at the very top of various makefiles.
This is already present in some makefiles, but not all of them.
In particular, this corrects a regression introduced in a38edab7c8
(Makefile: generate doc versions via GIT-VERSION-GEN, 2024-12-06). That
commit added some config files as build targets for the Documentation
directory, and put the target configuration in a sensible place.
Unfortunately, that sensible place was above any other build target
definitions, meaning the default goal changed to being those
configuration files only, rather than the HTML and man page
documentation.
Signed-off-by: Adam Dinwoodie <adam@dinwoodie.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* kn/reflog-migration-fix-followup:
reftable: prevent 'update_index' changes after adding records
refs: use 'uint64_t' for 'ref_update.index'
refs: mark `ref_transaction_update_reflog()` as static
Fetching into a bare repository incorrectly assumed it always used
a mirror layout when deciding to update remote-tracking HEAD, which
has been corrected.
* bf/fetch-set-head-fix:
fetch set_head: fix non-mirror remotes in bare repositories
fetch set_head: refactor to use remote directly
Going into a secondary worktree and asking "is the main worktree
bare?" did not work correctly when per-worktree configuration
option was in use, which has been corrected.
* op/worktree-is-main-bare-fix:
worktree: detect from secondary worktree if main worktree is bare
"git clone" learned to make a shallow clone for a single commit
that is not necessarily be at the tip of any branch.
* tc/clone-single-revision:
builtin/clone: teach git-clone(1) the --revision= option
parse-options: introduce die_for_incompatible_opt2()
clone: introduce struct clone_opts in builtin/clone.c
clone: add tags refspec earlier to fetch refspec
clone: refactor wanted_peer_refs()
clone: make it possible to specify --tags
clone: cut down on global variables in clone.c
All the documentation .txt files have been renamed to .adoc to help
content aware editors.
* bc/doc-adoc-not-txt:
Remove obsolete ".txt" extensions for AsciiDoc files
doc: use .adoc extension for AsciiDoc files
gitattributes: mark AsciiDoc files as LF-only
editorconfig: add .adoc extension
doc: update gitignore for .adoc extension
When 'remote.<name>.followRemoteHEAD' was added in b7f7d16562 (fetch:
add configuration for set_head behaviour, 2024-11-29), its description
was added to remote.txt in between the two paragraphs describing
'remote.<name>.serverOption'. Reunite these two paragraphs.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid O(n^2) complexity in `process_renames()` when building a sorted
`string_list` by constructing it unsorted and sorting it afterward,
reducing the complexity to O(n log n).
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in 728b9ac0c3 (Makefile(s): avoid recipe prefix in conditional
statements, 2024-04-08), we prepared our Makefiles for a forthcoming
change in upstream Make that would ban the recipe prefix within a
conditional statement by replacing tabs (the prefix) with eight spaces.
In b9d6f64393 (compat/zlib: allow use of zlib-ng as backend,
2025-01-28), a handful of recipe prefix characters were introduced in a
conditional statement ('ifdef ZLIB_NG'), causing 'make' to fail on my
system, which uses GNU Make 4.4.90.
Remove the recipe prefix characters by replacing them with the same
script as is mentioned in 728b9ac0c3.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git -c help.autocorrect=0 psuh" shows the suggested typofix,
unlike the previous attempt in the base topic.
* da/help-autocorrect-one-fix:
help: add "show" as a valid configuration value
help: show the suggested command when help.autocorrect is false
"[help] autocorrect = 1" used to be a way to say "please wait for
0.1 second after suggesting a typofix of the command name before
running that command"; now it means "yes, if there is a plausible
typofix for the command name, please run it immediately".
* sc/help-autocorrect-one:
help: interpret boolean string values for help.autocorrect
Foreign language interface for Rust into our code base has been added.
* js/libgit-rust:
libgit: add higher-level libgit crate
libgit-sys: also export some config_set functions
libgit-sys: introduce Rust wrapper for libgit.a
common-main: split init and exit code into new files
"git repack --keep-unreachable" to send unreachable objects to the
main pack "git repack -ad" produces did not work when there is no
existing packs, which has been corrected.
* ps/repack-keep-unreachable-in-unpacked-repo:
builtin/repack: fix `--keep-unreachable` when there are no packs
"git pack-objects" and its wrapper "git repack" learned an option
to use an alternative path-hash function to improve delta-base
selection to produce a packfile with deeper history than window
size.
* ds/name-hash-tweaks:
pack-objects: prevent name hash version change
test-tool: add helper for name-hash values
p5313: add size comparison test
pack-objects: add GIT_TEST_NAME_HASH_VERSION
repack: add --name-hash-version option
pack-objects: add --name-hash-version option
pack-objects: create new name-hash function version
The comparisons all involve comparisons against unsigned values.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop iteration variable is non-negative and used in comparisons
against a size_t value. Use size_t to eliminate the mismatch.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comparisons all involve unsigned variables. Cast the comparison
to unsigned to eliminate the mismatch.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unsigned `ignored` variable causes expressions to promote to
unsigned. Use a signed value to make comparisons use the same types.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop iteration variable is non-negative and only used in comparisons
against other size_t values.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow each file to fix the warnings guarded by the macro separately by
moving the definition from the shared xinclude.h into each file that
needs it.
xmerge.c and xprepare.c do not contain any signed vs. unsigned
comparisons so the definition was not included in these files.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -X renormalize (or merge.renormalize config) option is intended to
reduce conflicts due to normalization of newer versions of history. It
does so by renormalizing files that it is about to do a three-way
content merge on. Some folks thought it would renormalize all files
throughout the tree, and the previous wording wasn't clear enough to
dispell that misconception. Update the docs to make it clear that the
merge machinery will only apply renormalization to files which need a
three-way content merge.
(Technically, the merge machinery also does renormalization on
modify/delete conflicts, in order to see if the modification was merely
a normalization; if so, it can accept the delete and not report a
conflict. But it's not clear that this piece needs to be explained to
users, and trying to distinguish it might feel like splitting hairs and
overcomplicating the explanation, so we leave it out.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not seem to centrally document exhaustively ways to spell
Boolean values.
The description in the Environment Variables of git(1) section
assumes that the reader is already familiar with how "Boolean valued
configuration variables" are specified, without referring to
anything, so there is no way for the readers to find out more.
The description of `bool` in the section on "--type
<type>" in "git config --help" might be the place to do so, but it
is not telling us all that much.
The description of Boolean valued placeholders in the pretty formats
section of "git log --help" enumerates the possible values with "etc."
implying there may be other synonyms; shrink the list of samples and
instead refer to the canonical and authoritative source of truth, which
now is git-config(1).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When rebase rewords a commit it picks the commit and then runs "git
commit --amend" to reword it. When the commit is picked the sequencer
tries to reuse existing commits by fast-forwarding if the parents are
unchanged. Rewording an empty commit that has been fast-forwarded fails
because "git commit --amend" is called without "--allow-empty". This
happens because when a commit is fast-forwarded the logic that checks
whether we should pass "--allow-empty" is skipped. Fix this by always
passing "--allow-empty" when rewording a commit. This is safe because we
are amending a commit that has already been picked so if it had become
empty when it was picked we'd have already returned an error.
As "git commit" will happily create empty merge commits without
"--allow-empty" we do not need to pass that flag when rewording merge
commits.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the_repository global variable in favor of the repository
argument that gets passed in "builtin/update-server-info.c".
When `-h` is passed to the command outside a Git repository, the
`run_builtin()` will call the `cmd_update_server_info()` function
with `repo` set to NULL and then early in the function, "parse_options()"
call will give the options help and exit, without having to consult much
of the configuration file. So it is safe to omit reading the config when
`repo` argument the caller gave us is NULL.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of "echo -e" is not portable and not specified by POSIX. dash
does not support any options except "-n", and so this script will not
work on operating systems which use that as /bin/sh.
Fortunately, the solution is easy: switch to printf(1), which is
specified by POSIX and allows the escape sequences we want to use. This
will allow the script to work with any POSIX shell.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert a handful of unit tests to work with the clar framework.
* sk/unit-tests-0130:
t/unit-tests: convert strcmp-offset test to use clar test framework
t/unit-tests: convert strbuf test to use clar test framework
t/unit-tests: adapt example decorate test to use clar test framework
t/unit-tests: convert hashmap test to use clar test framework
Further code clean-up on the use of hash functions. Now the
context object knows what hash function it is working with.
* ps/hash-cleanup:
global: adapt callers to use generic hash context helpers
hash: provide generic wrappers to update hash contexts
hash: stop typedeffing the hash context
hash: convert hashing context to a structure
Two CI tasks, whitespace check and style check, work on the
difference from the base version and the version being checked, but
the base was computed incorrectly in GitLab CI in some cases, which
has been corrected.
* jt/gitlab-ci-base-fix:
ci: fix base commit fallback for check-whitespace and check-style
"git apply" internally uses unsigned long for line numbers and uses
strtoul() to parse numbers on the hunk headers. It however forgot
to check parse errors.
* pw/apply-ulong-overflow-check:
apply: detect overflow when parsing hunk header
"git init" to reinitialize a repository that already exists cannot
change the hash function and ref backends; such a request is
silently ignored now.
* ps/setup-reinit-fixes:
setup: fix reinit of repos with incompatible GIT_DEFAULT_HASH
setup: fix reinit of repos with incompatible GIT_DEFAULT_REF_FORMAT
t0001: remove duplicate test
`test_path_is_file` provides a better output when asserting whether a
file exists. Replace the occurrences of `test -f` in t7603 with it,
facilitating the trace of possible test failures.
Signed-off-by: Lucas Oshiro <lucasseikioshiro@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_common_path()` in favor of the `repo_common_path()` family
of functions, which makes the implicit dependency on `the_repository` go
away.
Note that `git_common_path()` used to return a string allocated via
`get_pathname()`, which uses a rotating set of statically allocated
buffers. Consequently, callers didn't have to free the returned string.
The same isn't true for `repo_common_path()`, so we also have to add
logic to free the returned strings.
This refactoring also allows us to remove `repo_common_pathv()` from the
public interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_worktree_git_dir()` function returns a string constant that
does not need to be free'd by the caller. This string is computed for
three different cases:
- If we don't have a worktree we return a path into the Git directory.
The returned string is owned by `the_repository`, so there is no
need for the caller to free it.
- If we have a worktree, but no worktree ID then the caller requests
the main worktree. In this case we return a path into the common
directory, which again is owned by `the_repository` and thus does
not need to be free'd.
- In the third case, where we have an actual worktree, we compute the
path relative to "$GIT_COMMON_DIR/worktrees/". This string does not
need to be released either, even though `git_common_path()` ends up
allocating memory. But this doesn't result in a memory leak either
because we write into a buffer returned by `get_pathname()`, which
returns one out of four static buffers.
We're about to drop `git_common_path()` in favor of `repo_common_path()`,
which doesn't use the same mechanism but instead returns an allocated
string owned by the caller. While we could adapt `get_worktree_git_dir()`
to also use `get_pathname()` and print the derived common path into that
buffer, the whole schema feels a lot like premature optimization in this
context. There are some callsites where we call `get_worktree_git_dir()`
in a loop that iterates through all worktrees. But none of these loops
seem to be even remotely in the hot path, so saving a single allocation
there does not feel worth it.
Refactor the function to instead consistently return an allocated path
so that we can start using `repo_common_path()` in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_path_buf()` in favor of `repo_git_path_replace()`. The
latter does essentially the same, with the only exception that it does
not rely on `the_repository` but takes the repo as separate parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove `git_pathdup()` in favor of `repo_git_path()`. The latter does
essentially the same, with the only exception that it does not rely on
`the_repository` but takes the repo as separate parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `strbuf_git_path()` function isn't used anywhere, and neither should
it grow any callers because it depends on `the_repository`. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "submodule" family of
functions accordingly.
Note that in contrast to the other `repo_*_path()` families, we have to
pass in the repository as a non-constant pointer. This is because we end
up calling `repo_read_gitmodules()` deep down in the callstack, which
may end up modifying the repository.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `submodule_to_gitdir()` function implicitly uses `the_repository` to
resolve submodule paths. Refactor the function to instead accept a repo
as parameter to remove the dependency on global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "worktree" family of
functions accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in an earlier commit, we're refactoring path-related
functions to provide a consistent interface for computing paths into the
commondir, gitdir and worktree. Refactor the "gitdir" family of
functions accordingly.
Note that the `repo_git_pathv()` function is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions provided by the "path" subsystem to derive repository
paths for the commondir, gitdir, worktrees and submodules are quite
inconsistent. Some functions have a `strbuf_` prefix, others have
different return values, some don't provide a variant working on top of
`strbuf`s.
We're thus about to refactor all of these family of functions so that
they follow a common pattern:
- `repo_*_path()` returns an allocated string.
- `repo_*_path_append()` appends the path to the caller-provided
buffer while returning a constant pointer to the buffer. This
clarifies whether the buffer is being appended to or rewritten,
which otherwise wasn't immediately obvious.
- `repo_*_path_replace()` replaces contents of the buffer with the
computed path, again returning a pointer to the buffer contents.
The returned constant pointer isn't being used anywhere yet, but it will
be used in subsequent commits. Its intent is to allow calling patterns
like the following somewhat contrived example:
if (!stat(&st, repo_common_path_replace(repo, &buf, ...)) &&
!unlink(repo_common_path_replace(repo, &buf, ...)))
...
Refactor the commondir family of functions accordingly and adapt all
callers.
Note that `repo_common_pathv()` is converted into an internal
implementation detail. It is only used to implement `the_repository`
compatibility shims and will eventually be removed from the public
interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code paths to interact with zlib has been cleaned up in
preparation for building with zlib-ng.
* ps/zlib-ng:
ci: make "linux-musl" job use zlib-ng
ci: switch linux-musl to use Meson
compat/zlib: allow use of zlib-ng as backend
git-zlib: cast away potential constness of `next_in` pointer
compat/zlib: provide stubs for `deflateSetHeader()`
compat/zlib: provide `deflateBound()` shim centrally
git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
compat: introduce new "zlib.h" header
git-compat-util: drop `z_const` define
compat: drop `uncompress2()` compatibility shim
The code path used when "git fetch" fetches from a bundle file
closed the same file descriptor twice, which sometimes broke things
unexpectedly when the file descriptor was reused, which has been
corrected.
* js/bundle-unbundle-fd-reuse-fix:
bundle: avoid closing file descriptor twice
CI updates (containerization, dropping stale ones, etc.).
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
Remove the USE_THE_REPOSITORY_VARIABLE #define now that all
state is passed to each function from callers.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make callers pass a repository struct into each function instead
of relying on the global the_repository variable.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move difftool's global variables into a difftools_option struct
in preparation for removal of USE_THE_REPOSITORY_VARIABLE.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git v2.44.0 support for 'git archive' over HTTP protocol
was added, but it was nowhere documented how it should be
enabled in git-http-backend.
Add missing documentation.
Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-clone(1) command has the option `--branch` that allows the user
to select the branch they want HEAD to point to. In a non-bare
repository this also checks out that branch.
Option `--branch` also accepts a tag. When a tag name is provided, the
commit this tag points to is checked out and HEAD is detached. Thus
`--branch` can be used to clone a repository and check out a ref kept
under `refs/heads` or `refs/tags`. But some other refs might be in use
as well. For example Git forges might use refs like `refs/pull/<id>` and
`refs/merge-requests/<id>` to track pull/merge requests. These refs
cannot be selected upon git-clone(1).
Add option `--revision` to git-clone(1). This option accepts a fully
qualified reference, or a hexadecimal commit ID. This enables the user
to clone and check out any revision they want. `--revision` can be used
in conjunction with `--depth` to do a minimal clone that only contains
the blob and tree for a single revision. This can be useful for
automated tests running in CI systems.
Using option `--branch` and `--single-branch` together is a similar
scenario, but serves a different purpose. Using these two options, a
singlet remote tracking branch is created and the fetch refspec is set
up so git-fetch(1) will receive updates on that branch from the remote.
This allows the user work on that single branch.
Option `--revision` on contrary detaches HEAD, creates no tracking
branches, and writes no fetch refspec.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
[jc: removed unnecessary TEST_PASSES_SANITIZE_LEAK from the test]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions die_for_incompatible_opt3() and
die_for_incompatible_opt4() already exist to die whenever a user
specifies three or four options respectively that are not compatible.
Introduce die_for_incompatible_opt2() which dies when two options that
are incompatible are set.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a lot of state stored in global variables in builtin/clone.c.
In the long run we'd like to remove many of those.
Introduce `struct clone_opts` in this file. This struct will be used to
contain all details needed to perform the clone. The struct object can
be thrown around to all the functions that need these details.
The first field we're adding is `wants_head`. In some scenarios
(specifically when both `--single-branch` and `--branch` are given) we
are not interested in `HEAD` on the remote. The field `wants_head` in
`struct clone_opts` will hold this information. We could have put
`option_branch` and `option_single_branch` into that struct instead, but
in a following commit we'll be using `wants_head` as well.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Option --no-tags was added in 0dab2468ee (clone: add a --no-tags option
to clone without tags, 2017-04-26). At the time there was no need to
support --tags as well, although there was some conversation about
it[1].
To simplify the code and to prepare for future commits, invert the flag
internally. Functionally there is no change, because the flag is
default-enabled passing `--tags` has no effect, so there's no need to
add tests for this.
[1]: https://lore.kernel.org/git/CAGZ79kbHuMpiavJ90kQLEL_AR0BEyArcZoEWAjPPhOFacN16YQ@mail.gmail.com/
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In clone.c we call refspec_ref_prefixes() to copy the fetch refspecs
from the `remote->fetch` refspec into `ref_prefixes` of
`transport_ls_refs_options`. Afterwards we add the tags prefix
`refs/tags/` prefix as well. At a later point, in wanted_peer_refs() we
process refs using both `remote->fetch` and `TAG_REFSPEC`.
Simplify the code by appending `TAG_REFSPEC` to `remote->fetch` before
calling refspec_ref_prefixes().
To be able to do this, we set `option_tags` to 0 when --mirror is given.
This is because --mirror mirrors (hence the name) all the refs,
including tags and they do not need to be treated separately.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In clone.c the `struct option` which is used to parse the input options
for git-clone(1) is a global variable. Due to this, many variables that
are used to parse the value into, are also global.
Make `builtin_clone_options` a local variable in cmd_clone() and carry
along all variables that are only used in that function.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function wanted_peer_refs() is used to map the refs returned by the
server to refs we will save in our clone.
Over time this function grown to be very complex. Refactor it.
Previously, there was a separate code path for when
`option_single_branch` was set. It resulted in duplicated code and
deeper nested conditions. After this refactor the code path for when
`option_single_branch` is truthy modifies `refs` and then falls through
to the common code path. This approach relies on the `refspec` being set
correctly and thus only mapping refs that are relevant.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When extensions.worktreeConfig is true and the main worktree is
bare -- that is, its config.worktree file contains core.bare=true
-- commands run from secondary worktrees incorrectly see the main
worktree as not bare. As such, those commands incorrectly think
that the repository's default branch (typically "main" or
"master") is checked out in the bare repository even though it's
not. This makes it impossible, for instance, to checkout or delete
the default branch from a secondary worktree, among other
shortcomings.
This problem occurs because, when extensions.worktreeConfig is
true, commands run in secondary worktrees only consult
$commondir/config and $commondir/worktrees/<id>/config.worktree,
thus they never see the main worktree's core.bare=true setting in
$commondir/config.worktree.
Fix this problem by consulting the main worktree's config.worktree
file when checking whether it is bare. (This extra work is
performed only when running from a secondary worktree.)
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Olga Pilipenco <olga.pilipenco@shopify.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
0a01d41ee4 (http: add support for different sslcert and sslkey types.,
2023-03-20) added useful SSL config options, but did not document them.
Signed-off-by: Andrew Carter <andrew@emailcarter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Additional information about missing objects found in git-rev-list(1)
can be printed by specifying the `print-info` missing action for the
`--missing` option. Extend this action to also print missing object type
information inferred from its containing object. This token follows the
form `type=<type>` and specifies the expected object type of the missing
object.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Missing objects identified through git-rev-list(1) can be printed by
setting the `--missing=print` option. Additional information about the
missing object, such as its path and type, may be present in its
containing object.
Add the `print-info` missing action for the `--missing` option that,
when set, prints additional insight about the missing object inferred
from its containing object. Each line of output for a missing object is
in the form: `?<oid> [<token>=<value>]...`. The `<token>=<value>` pairs
containing additional information are separated from each other by a SP.
The value is encoded in a token specific fashion, but SP or LF contained
in value are always expected to be represented in such a way that the
resulting encoded value does not have either of these two problematic
bytes. This format is kept generic so it can be extended in the future
to support additional information.
For now, only a missing object path info is implemented. It follows the
form `path=<path>` and specifies the full path to the object from the
top-level tree. A path containing SP or special characters is enclosed
in double-quotes in the C style as needed. In a subsequent commit,
missing object type info will also be added.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--keep-unreachable" flag is supposed to append any unreachable
objects to the newly written pack. This flag is explicitly documented as
appending both packed and loose unreachable objects to the new packfile.
And while this works alright when repacking with preexisting packfiles,
it stops working when the repository does not have any packfiles at all.
The root cause are the conditions used to decide whether or not we want
to append "--pack-loose-unreachable" to git-pack-objects(1). There are
a couple of conditions here:
- `has_existing_non_kept_packs()` checks whether there are existing
packfiles. This condition makes sense to guard "--keep-pack=",
"--unpack-unreachable" and "--keep-unreachable", because all of
these flags only make sense in combination with existing packfiles.
But it does not make sense to disable `--pack-loose-unreachable`
when there aren't any preexisting packfiles, as loose objects can be
packed into the new packfile regardless of that.
- `delete_redundant` checks whether we want to delete any objects or
packs that are about to become redundant. The documentation of
`--keep-unreachable` explicitly says that `git repack -ad` needs to
be executed for the flag to have an effect.
It is not immediately obvious why such redundant objects need to be
deleted in order for "--pack-unreachable-objects" to be effective.
But as things are working as documented this is nothing we'll change
for now.
- `pack_everything & PACK_CRUFT` checks that we're not creating a
cruft pack. This condition makes sense in the context of
"--pack-loose-unreachable", as unreachable objects would end up in
the cruft pack anyway.
So while the second and third condition are sensible, it does not make
any sense to condition `--pack-loose-unreachable` on the existence of
packfiles.
Fix the bug by splitting out the "--pack-loose-unreachable" and only
making it depend on the second and third condition. Like this, loose
unreachable objects will be packed regardless of any preexisting
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the `valid_remote_name()` function from the refspec subsystem to
the remote subsystem to better align with the separation of concerns.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the functions `apply_refspecs()` and `apply_negative_refspecs()`
from `remote.c` to `refspec.c`. These functions focus on applying
refspecs, so centralizing them in `refspec.c` improves code organization
by keeping refspec-related logic in one place.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the functions `refspec_find_match()`, `refspec_find_all_matches()`
and `refspec_find_negative_match()` from `remote.c` to `refspec.c`.
These functions focus on matching refspecs, so centralizing them in
`refspec.c` improves code organization by keeping refspec-related logic
in one place.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename functions related to handling refspecs in preparation for their
move from `remote.c` to `refspec.c`. Update their names to better
reflect their intent:
- `query_refspecs()` -> `refspec_find_match()` for clarity, as it
finds a single matching refspec.
- `query_refspecs_multiple()` -> `refspec_find_all_matches()` to
better reflect that it collects all matching refspecs instead of
returning just the first match.
- `query_matches_negative_refspec()` ->
`refspec_find_negative_match()` for consistency with the
updated naming convention, even though this static function
didn't strictly require renaming.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the functions `refname_matches_negative_refspec_item()`,
`refspec_match()`, and `match_name_with_pattern()` from `remote.c` to
`refspec.c`. These functions focus on refspec matching, so placing them
in `refspec.c` aligns with the separation of concerns. Keep
refspec-related logic in `refspec.c` and remote-specific logic in
`remote.c` for better code organization.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the function `omit_name_by_refspec()` to
`refname_matches_negative_refspec_item()` to provide clearer intent.
The previous function name was vague and did not accurately describe its
purpose. By using `refname_matches_negative_refspec_item`, make the
function's purpose more intuitive, clarifying that it checks if a
reference name matches any negative refspec.
Rename function parameters for consistency with existing naming
conventions. Use `refname` instead of `name` to align with terminology
in `refs.h`.
Remove the redundant doc comment since the function name is now
self-explanatory.
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced the '--[no-]sparse' option for the 'git
backfill' command, but did not assume it as enabled by default. However,
this is likely the behavior that users will most often want to happen.
Without this default, users with a small sparse-checkout may be confused
when 'git backfill' downloads every version of every object in the full
history.
However, this is left as a separate change so this decision can be reviewed
independently of the value of the '--[no-]sparse' option.
Add a test of adding the '--sparse' option to a repo without sparse-checkout
to make it clear that supplying it without a sparse-checkout is an error.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One way to significantly reduce the cost of a Git clone and later fetches is
to use a blobless partial clone and combine that with a sparse-checkout that
reduces the paths that need to be populated in the working directory. Not
only does this reduce the cost of clones and fetches, the sparse-checkout
reduces the number of objects needed to download from a promisor remote.
However, history investigations can be expensive as computing blob diffs
will trigger promisor remote requests for one object at a time. This can be
avoided by downloading the blobs needed for the given sparse-checkout using
'git backfill' and its new '--sparse' mode, at a time that the user is
willing to pay that extra cost.
Note that this is distinctly different from the '--filter=sparse:<oid>'
option, as this assumes that the partial clone has all reachable trees and
we are using client-side logic to avoid downloading blobs outside of the
sparse-checkout cone. This avoids the server-side cost of walking trees
while also achieving a similar goal. It also downloads in batches based on
similar path names, presenting a resumable download if things are
interrupted.
This augments the path-walk API to have a possibly-NULL 'pl' member that may
point to a 'struct pattern_list'. This could be more general than the
sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently
the only consumer.
Be sure to test this in both cone mode and not cone mode. Cone mode has the
benefit that the path-walk can skip certain paths once they would expand
beyond the sparse-checkout. Non-cone mode can describe the included files
using both positive and negative patterns, which changes the possible return
values of path_matches_pattern_list(). Test both kinds of matches for
increased coverage.
To test this, we can create a blobless sparse clone, expand the
sparse-checkout slightly, and then run 'git backfill --sparse' to see
how much data is downloaded. The general steps are
1. git clone --filter=blob:none --sparse <url>
2. git sparse-checkout set <dir1> ... <dirN>
3. git backfill --sparse
For the Git repository with the 'builtin' directory in the
sparse-checkout, we get these results for various batch sizes:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 3 | 110 MB | |
| 10K | 12 | 192 MB | 17.2s |
| 15K | 9 | 192 MB | 15.5s |
| 20K | 8 | 192 MB | 15.5s |
| 25K | 7 | 192 MB | 14.7s |
This case matters less because a full clone of the Git repository from
GitHub is currently at 277 MB.
Using a copy of the Linux repository with the 'kernel/' directory in the
sparse-checkout, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|------|
| (Initial clone) | 2 | 1,876 MB | |
| 10K | 11 | 2,187 MB | 46s |
| 25K | 7 | 2,188 MB | 43s |
| 50K | 5 | 2,194 MB | 44s |
| 100K | 4 | 2,194 MB | 48s |
This case is more meaningful because a full clone of the Linux
repository is currently over 6 GB, so this is a valuable way to download
a fraction of the repository and no longer need network access for all
reachable objects within the sparse-checkout.
Choosing a batch size will depend on a lot of factors, including the
user's network speed or reliability, the repository's file structure,
and how many versions there are of the file within the sparse-checkout
scope. There will not be a one-size-fits-all solution.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users may want to specify a minimum batch size for their needs. This is only
a minimum: the path-walk API provides a list of OIDs that correspond to the
same path, and thus it is optimal to allow delta compression across those
objects in a single server request.
We could consider limiting the request to have a maximum batch size in the
future. For now, we let the path-walk API batches determine the
boundaries.
To get a feeling for the value of specifying the --min-batch-size parameter,
I tested a number of open source repositories available on GitHub. The
procedure was generally:
1. git clone --filter=blob:none <url>
2. git backfill
Checking the number of packfiles and the size of the .git/objects/pack
directory helps to identify the effects of different batch sizes.
For the Git repository, we get these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|-------|
| (Initial clone) | 2 | 119 MB | |
| 25K | 8 | 290 MB | 24s |
| 50K | 5 | 290 MB | 24s |
| 100K | 4 | 290 MB | 29s |
Other than the packfile counts decreasing as we need fewer batches, the
size and time required is not changing much for this small example.
For the nodejs/node repository, we see these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 330 MB | |
| 25K | 19 | 1,222 MB | 1m 22s |
| 50K | 11 | 1,221 MB | 1m 24s |
| 100K | 7 | 1,223 MB | 1m 40s |
| 250K | 4 | 1,224 MB | 2m 23s |
| 500K | 3 | 1,216 MB | 4m 38s |
Here, we don't have much difference in the size of the repo, though the
500K batch size results in a few MB gained. That comes at a cost of a
much longer time. This extra time is due to server-side delta
compression happening as the on-disk deltas don't appear to be reusable
all the time. But for smaller batch sizes, the server is able to find
reasonable deltas partly because we are asking for objects that appear
in the same region of the directory tree and include all versions of a
file at a specific path.
To contrast this example, I tested the microsoft/fluentui repo, which
has been known to have inefficient packing due to name hash collisions.
These results are found before GitHub had the opportunity to repack the
server with more advanced name hash versions:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|--------|
| (Initial clone) | 2 | 105 MB | |
| 5K | 53 | 348 MB | 2m 26s |
| 10K | 28 | 365 MB | 2m 22s |
| 15K | 19 | 407 MB | 2m 21s |
| 20K | 15 | 393 MB | 2m 28s |
| 25K | 13 | 417 MB | 2m 06s |
| 50K | 8 | 509 MB | 1m 34s |
| 100K | 5 | 535 MB | 1m 56s |
| 250K | 4 | 698 MB | 1m 33s |
| 500K | 3 | 696 MB | 1m 42s |
Here, a larger variety of batch sizes were chosen because of the great
variation in results. By asking the server to download small batches
corresponding to fewer paths at a time, the server is able to provide
better compression for these batches than it would for a regular clone.
A typical full clone for this repository would require 738 MB.
This example justifies the choice to batch requests by path name,
leading to improved communication with a server that is not optimally
packed.
Finally, the same experiment for the Linux repository had these results:
| Batch Size | Pack Count | Pack Size | Time |
|-----------------|------------|-----------|---------|
| (Initial clone) | 2 | 2,153 MB | |
| 25K | 63 | 6,380 MB | 14m 08s |
| 50K | 58 | 6,126 MB | 15m 11s |
| 100K | 30 | 6,135 MB | 18m 11s |
| 250K | 14 | 6,146 MB | 18m 22s |
| 500K | 8 | 6,143 MB | 33m 29s |
Even in this example, where the default name hash algorithm leads to
decent compression of the Linux kernel repository, there is value for
selecting a smaller batch size, to a limit. The 25K batch size has the
fastest time, but uses 250 MB more than the 50K batch size. The 500K
batch size took much more time due to server compression time and thus
we should avoid large batch sizes like this.
Based on these experiments, a batch size of 50,000 was chosen as the
default value.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default behavior of 'git backfill' is to fetch all missing blobs that
are reachable from HEAD. Document and test this behavior.
The implementation is a very simple use of the path-walk API, initializing
the revision walk at HEAD to start the path-walk from all commits reachable
from HEAD. Ignore the object arrays that correspond to tree entries,
assuming that they are all present already.
The path-walk API provides lists of objects in batches according to a
common path, but that list could be very small. We want to balance the
number of requests to the server with the ability to have the process
interrupted with minimal repeated work to catch up in the next run.
Based on some experiments (detailed in the next change) a minimum batch
size of 50,000 is selected for the default.
This batch size is a _minimum_. As the path-walk API emits lists of blob
IDs, they are collected into a list of objects for a request to the
server. When that list is at least the minimum batch size, then the
request is sent to the server for the new objects. However, the list of
blob IDs from the path-walk API could be much longer than the batch
size. At this moment, it is unclear if there is a benefit to split the
list when there are too many objects at the same path.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of implementing 'git backfill', populate the necessary files
with the boilerplate of a new builtin. Mark the builtin as experimental at
this time, allowing breaking changes in the near future, if necessary.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* master: (446 commits)
The seventh batch
The sixth batch
The fifth batch
The fourth batch
refs/reftable: fix uninitialized memory access of `max_index`
remote: announce removal of "branches/" and "remotes/"
The third batch
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
trace2: prevent segfault on config collection with valueless true
refs: fix creation of reflog entries for symrefs
ci: wire up Visual Studio build with Meson
ci: raise error when Meson generates warnings
meson: fix compilation with Visual Studio
...
Patrick reported an issue that the exit code of git-receive-pack(1) is
ignored during atomic push with "--porcelain" flag, and added new test
cases in t5543.
This issue originated from commit 7dcbeaa0df (send-pack: fix
inconsistent porcelain output, 2020-04-17). At that time, I chose to
ignore the exit code of "finish_connect()" without investigating the
root cause of the abnormal termination of git-receive-pack. That was an
incorrect solution.
The root cause is that an atomic push operation terminates early without
sending a flush packet to git-receive-pack. As a result,
git-receive-pack continues waiting for commands without exiting. By
sending a flush packet at the appropriate location in "send_pack()", we
ensure that the git-receive-pack process closes properly, avoiding an
erroneous exit code for git-push. At the same time, revert the changes
to the "transport.c" file made in commit 7dcbeaa0df.
Reported-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new test cases in t5543 to avoid ignoring the exit code of
git-receive-pack(1) during atomic push with "--porcelain" flag.
We'd typically notice this case because the refs would have their error
message set. But there is an edge case when pushing refs succeeds, but
git-receive-pack(1) exits with a non-zero exit code at a later point in
time due to another error. An atomic git-push(1) would ignore that error
code, and consequently it would return successfully and not print any
error message at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "push_refs" function in the transport_vtable is the handler for
git-push operation. All the "push_refs" functions for different
transports (protocols) should have the same behavior, but the behavior
of "git_transport_push()" function for builtin_smart_vtable in
"transport.c" (which calls "send_pack()" in "send-pack.c") differs from
the handler of the HTTP protocol.
The "push_refs()" function for the HTTP protocol which calls the
"push_refs_with_push()" function in "transport-helper.c" will return 0
even when a bad REF_STATUS (such as REF_STATUS_REJECT_NONFASTFORWARD)
was found. But "send_pack()" for Git smart protocol will return -1 for
a bad REF_STATUS.
We cannot ignore bad REF_STATUS directly in the "send_pack()" function,
because the function is also used in "builtin/send-pack.c". So we add a
new non-zero error code "SEND_PACK_ERROR_REF_STATUS" for "send_pack()".
Ignore the specific error code in the "git_transport_push()" function to
have the same behavior as "push_refs()" for HTTP protocol. Note that
even though we ignore the error here, we'll ultimately still end up
detecting that a subset of refs was not pushed in `transport_push()`
because we eventually call `push_had_errors()` on the remote refs.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add two more test cases exercising git-push(1) with `--procelain`, one
exercising a non-atomic and one exercising an atomic push.
Based-on-patch-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the test cases with the following changes:
- Calling setup_upstream() to reset upstream after running each test
case.
- Change the initial branch tips of the workspace to reduce the branch
setup operations in the workspace.
- Reduced the two steps of setting up and cleaning up the pre-receive
hook by moving the operations into the corresponding test case,
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the function setup_upstream_and_workbench(), extracting
create_upstream_template() and setup_upstream() from it. The former is
used to create the upstream repository template, while the latter is
used to rebuild the upstream repository and will be reused in subsequent
commits.
To ensure that setup_upstream() works properly in both local and HTTP
protocols, the HTTP settings have been moved to the setup_upstream() and
setup_upstream_and_workbench() functions.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several heredocs in t5504 located outside of any particular test
bodies. Move these into the test bodies to match our modern coding
style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some test in t6423 supress Git's exit code, which can cause test
failures go unnoticed. Specifically using git <subcommand> |
<other-command> masks potential failures of the Git command.
This commit ensures that Git's exit status is correctly propogated by:
- Avoiding pipes that suppress exit codes.
Signed-off-by: Ayush Chandekar <ayu.chandekar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a literal value for showing the suggested autocorrection
for consistency with the rest of the help.autocorrect options.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the handling of false boolean values for help.autocorrect
consistent with the handling of value 0 by showing the suggested
commands but not running them.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"test -f" does not provide a nice error message when we hit test
failures, so use test_path_is_file instead.
Signed-off-by: ambar chakravartty <amch9605@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More build fixes and enhancements on meson based build procedure.
* ps/build-meson-fixes:
ci: wire up Visual Studio build with Meson
ci: raise error when Meson generates warnings
meson: fix compilation with Visual Studio
meson: make the CSPRNG backend configurable
meson: wire up fuzzers
meson: wire up generation of distribution archive
meson: wire up development environments
meson: fix dependencies for generated headers
meson: populate project version via GIT-VERSION-GEN
GIT-VERSION-GEN: allow running without input and output files
GIT-VERSION-GEN: simplify computing the dirty marker
Following the procedure we established to introduce breaking
changes for Git 3.0, allow an early opt-in for removing support of
$GIT_DIR/branches/ and $GIT_DIR/remotes/ directories to configure
remotes.
* ps/3.0-remote-deprecation:
remote: announce removal of "branches/" and "remotes/"
builtin/pack-redundant: remove subcommand with breaking changes
ci: repurpose "linux-gcc" job for deprecations
ci: merge linux-gcc-default into linux-gcc
Makefile: wire up build option for deprecated features
Code clean-up for code paths around combined diff.
* jk/combine-diff-cleanup:
tree-diff: make list tail-passing more explicit
tree-diff: simplify emit_path() list management
tree-diff: use the name "tail" to refer to list tail
tree-diff: drop list-tail argument to diff_tree_paths()
combine-diff: drop public declaration of combine_diff_path_size()
tree-diff: inline path_appendnew()
tree-diff: pass whole path string to path_appendnew()
tree-diff: drop path_appendnew() alloc optimization
run_diff_files(): de-mystify the size of combine_diff_path struct
diff: add a comment about combine_diff_path.parent.path
combine-diff: use pointer for parent paths
tree-diff: clear parent array in path_appendnew()
combine-diff: add combine_diff_path_new()
run_diff_files(): delay allocation of combine_diff_path
The API around choosing to use unsafe variant of SHA-1
implementation has been updated in an attempt to make it harder to
abuse.
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
The main GitHub Actions workflow switched away from the "$distro"
variable in b133d3071a (github: simplify computation of the job's
distro, 2025-01-10). Since the Coverity job also depends on our
ci/install-dependencies.sh script, it needs to likewise set CI_JOB_IMAGE
to find the correct dependencies (without this patch, we don't install
curl and the build fails).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/ci-misc-updates:
ci: remove stale code for Azure Pipelines
ci: use latest Ubuntu release
ci: stop special-casing for Ubuntu 16.04
gitlab-ci: add linux32 job testing against i386
gitlab-ci: remove the "linux-old" job
github: simplify computation of the job's distro
github: convert all Linux jobs to be containerized
github: adapt containerized jobs to be rootless
t7422: fix flaky test caused by buffered stdout
t0060: fix EBUSY in MinGW when setting up runtime prefix
Adapt strcmp-offset test script to clar framework by using clar
assertions where necessary. Introduce `test_strcmp_offset__empty()` to
verify `check_strcmp_offset()` behavior when both input strings are
empty. This ensures the function correctly handles edge cases and
returns expected values.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt strbuf test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce `test_example_decorate__initialize()` to explicitly set up
object IDs and retrieve corresponding objects before tests run. This
ensures a consistent and predictable test state without relying on data
from previous tests.
Add `test_example_decorate__cleanup()` to clear decorations after each
test, preventing interference between tests and ensuring each runs in
isolation.
Adapt example decorate test script to clar framework by using clar
assertions where necessary. Previously, tests relied on data written by
earlier tests, leading to unintended dependencies between them. This
explicitly initializes the necessary state within
`test_example_decorate__readd`, ensuring it does not depend on prior
test executions.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapts hashmap test script to clar framework by using clar assertions
where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The check-whitespace and check-style CI scripts require a base commit.
In GitLab CI, the base commit can be provided by several different
predefined CI variables depending on the type of pipeline being
performed.
In 30c4f7e350 (check-whitespace: detect if no base_commit is provided,
2024-07-23), the GitLab check-whitespace CI job was modified to support
CI_MERGE_REQUEST_DIFF_BASE_SHA as a fallback base commit if
CI_MERGE_REQUEST_TARGET_BRANCH_SHA was not provided. The same fallback
strategy was also implemented for the GitLab check-style CI job in
bce7e52d4e (ci: run style check on GitHub and GitLab, 2024-07-23).
The base commit fallback is implemented using shell parameter expansion
where, if the first variable is unset, the second variable is used as
fallback. In GitLab CI, these variables can be set but null. This has
the unintended effect of selecting an empty first variable which results
in CI jobs providing an invalid base commit and failing.
Fix the issue by defaulting to the fallback variable if the first is
unset or null.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt callers to use generic hash context helpers instead of using the
hash algorithm to update them. This makes the callsites easier to reason
about and removes the possibility that the wrong hash algorithm is used
to update the hash context's state. And as a nice side effect this also
gets rid of a bunch of users of `the_hash_algo`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hash context is supposed to be updated via the `git_hash_algo`
structure, which contains a list of function pointers to update, clone
or finalize a hashing context. This requires the callers to track which
algorithm was used to initialize the context and continue to use the
exact same algorithm. If they fail to do that correctly, it can happen
that we start to access context state of one hash algorithm with
functions of a different hash algorithm. The result would typically be a
segfault, as could be seen e.g. in the patches part of 98422943f0 (Merge
branch 'ps/weak-sha1-for-tail-sum-fix', 2025-01-01).
The situation was significantly improved starting with 04292c3796
(hash.h: drop unsafe_ function variants, 2025-01-23) and its parent
commits. These refactorings ensure that it is not possible to mix up
safe and unsafe variants of the same hash algorithm anymore. But in
theory, it is still possible to mix up different hash algorithms with
each other, even though this is a lot less likely to happen.
But still, we can do better: instead of asking the caller to remember
the hash algorithm used to initialize a context, we can instead make the
context itself remember which algorithm it has been initialized with. If
we do so, callers can use a set of generic helpers to update the context
and don't need to be aware of the hash algorithm at all anymore.
Adapt the context initialization functions to store the hash algorithm
in the hashing context and introduce these generic helpers. Callers will
be adapted in the subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally avoid using `typedef` in the Git codebase. One exception
though is the `git_hash_ctx`, likely because it used to be a union
rather than a struct until the preceding commit refactored it. But now
that it is a normal `struct` there isn't really a need for a typedef
anymore.
Drop the typedef and adapt all callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git_hash_context` is a union containing the different hash-specific
states for SHA1, its unsafe variant as well as SHA256. We know that only
one of these states will ever be in use at the same time because hash
contexts cannot be used for multiple different hashes at the same point
in time.
We're about to extend the structure though to keep track of the hash
algorithm used to initialize the context, which is impossible to do
while the context is a union. Refactor it to instead be a structure that
contains the union of context states.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tb/unsafe-hash-cleanup:
hash.h: drop unsafe_ function variants
csum-file: introduce hashfile_checkpoint_init()
t/helper/test-hash.c: use unsafe_hash_algo()
csum-file.c: use unsafe_hash_algo()
hash.h: introduce `unsafe_hash_algo()`
csum-file.c: extract algop from hashfile_checksum_valid()
csum-file: store the hash algorithm as a struct field
t/helper/test-tool: implement sha1-unsafe helper
Doc and short-help text for "show-index" has been clarified to
stress that the command reads its data from the standard input.
* jc/show-index-h-update:
show-index: the short help should say the command reads from its input
* ps/build-meson-fixes:
ci: wire up Visual Studio build with Meson
ci: raise error when Meson generates warnings
meson: fix compilation with Visual Studio
meson: make the CSPRNG backend configurable
meson: wire up fuzzers
meson: wire up generation of distribution archive
meson: wire up development environments
meson: fix dependencies for generated headers
meson: populate project version via GIT-VERSION-GEN
GIT-VERSION-GEN: allow running without input and output files
GIT-VERSION-GEN: simplify computing the dirty marker
The exact same issue as described in the preceding commit also exists
for GIT_DEFAULT_HASH. Thus, reinitializing a repository that e.g. uses
SHA1 with `GIT_DEFAULT_HASH=sha256 git init` will cause the object
format of that repository to change to SHA256. This is of course bogus
as any existing objects and refs will not be converted, thus causing
repository corruption:
$ git init repo
Initialized empty Git repository in /tmp/repo/.git/
$ cd repo/
$ git commit --allow-empty -m message
[main (root-commit) 35a7344] message
$ GIT_DEFAULT_HASH=sha256 git init
Reinitialized existing Git repository in /tmp/repo/.git/
$ git show
fatal: your current branch appears to be broken
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with an object hash.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT_DEFAULT_REF_FORMAT environment variable can be set to influence
the default ref format that new repostiories shall be initialized with.
While this is the expected behaviour when creating a new repository, it
is not when reinitializing a repository: we should retain the ref format
currently used by it in that case.
This doesn't work correctly right now:
$ git init --ref-format=files repo
Initialized empty Git repository in /tmp/repo/.git/
$ GIT_DEFAULT_REF_FORMAT=reftable git init repo
fatal: could not open '/tmp/repo/.git/refs/heads' for writing: Is a directory
Instead of retaining the current ref format, the reinitialization tries
to reinitialize the repository with the different format. This action
fails when git-init(1) tries to write the ".git/refs/heads" stub, which
in the context of the reftable backend is always written as a file so
that we can detect clients which inadvertently try to access the repo
with the wrong ref format. Seems like the protection mechanism works for
this case, as well.
Fix the issue by ignoring the environment variable in case the repo has
already been initialized with a ref storage format.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test in question is an exact copy of the testcase preceding it.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" uses strtoul() to parse the numbers in the hunk header but
silently ignores overflows. As LONG_MAX is a legitimate return value for
strtoul() we need to set errno to zero before the call to strtoul() and
check that it is still zero afterwards. The error message we display is
not particularly helpful as it does not say what was wrong. However, it
seems pretty unlikely that users are going to trigger this error in
practice and we can always improve it later if needed.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't free the result of `remote_default_branch()`, leading to a
memory leak. This leak is exposed by t9211, but only when run with Meson
with the `-Db_sanitize=leak` option:
Direct leak of 5 byte(s) in 1 object(s) allocated from:
#0 0x5555555cfb93 in malloc (scalar+0x7bb93)
#1 0x5555556b05c2 in do_xmalloc ../wrapper.c:55:8
#2 0x5555556b06c4 in do_xmallocz ../wrapper.c:89:8
#3 0x5555556b0656 in xmallocz ../wrapper.c:97:9
#4 0x5555556b0728 in xmemdupz ../wrapper.c:113:16
#5 0x5555556b07a7 in xstrndup ../wrapper.c:119:9
#6 0x5555555d3a4b in remote_default_branch ../scalar.c:338:14
#7 0x5555555d20e6 in cmd_clone ../scalar.c:493:28
#8 0x5555555d196b in cmd_main ../scalar.c:992:14
#9 0x5555557c4059 in main ../common-main.c:64:11
#10 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#11 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#12 0x555555592054 in _start (scalar+0x3e054)
DEDUP_TOKEN: __interceptor_malloc--do_xmalloc--do_xmallocz--xmallocz--xmemdupz--xstrndup--remote_default_branch--cmd_clone--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start
SUMMARY: LeakSanitizer: 5 byte(s) leaked in 1 allocation(s).
As the `branch` variable may contain a string constant obtained from
parsing command line arguments we cannot free the leaking variable
directly. Instead, introduce a new `branch_to_free` variable that only
ever gets assigned the allocated string and free that one to plug the
leak.
It is unclear why the leak isn't flagged when running the test via our
Makefile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to create a Unix socket in a path that exceeds the maximum
socket name length we try to first change the directory into the parent
folder before creating the socket to reduce the length of the name. When
this fails we error out of `unix_sockaddr_init()` with an error code,
which indicates to the caller that the context has not been initialized.
Consequently, they don't release that context.
This leads to a memory leak: when we have already populated the context
with the original directory that we need to chdir(3p) back into, but
then the chdir(3p) into the socket's parent directory fails, then we
won't release the original directory's path. The leak is exposed by
t0301, but only when running tests in a directory hierarchy whose path
is long enough to make the socket name length exceed the maximum socket
name length:
Direct leak of 129 byte(s) in 1 object(s) allocated from:
#0 0x5555555e85c6 in realloc.part.0 lsan_interceptors.cpp.o
#1 0x55555590e3d6 in xrealloc ../wrapper.c:140:8
#2 0x5555558c8fc6 in strbuf_grow ../strbuf.c:114:2
#3 0x5555558cacab in strbuf_getcwd ../strbuf.c:605:3
#4 0x555555923ff6 in unix_sockaddr_init ../unix-socket.c:65:7
#5 0x555555923e42 in unix_stream_connect ../unix-socket.c:84:6
#6 0x55555562a984 in send_request ../builtin/credential-cache.c:46:11
#7 0x55555562a89e in do_cache ../builtin/credential-cache.c:108:6
#8 0x55555562a655 in cmd_credential_cache ../builtin/credential-cache.c:178:3
#9 0x555555700547 in run_builtin ../git.c:480:11
#10 0x5555556ff0e0 in handle_builtin ../git.c:740:9
#11 0x5555556ffee8 in run_argv ../git.c:807:4
#12 0x5555556fee6b in cmd_main ../git.c:947:19
#13 0x55555593f689 in main ../common-main.c:64:11
#14 0x7ffff7a2a1fb in __libc_start_call_main (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#15 0x7ffff7a2a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/h7zcxabfxa7v5xdna45y2hplj31ncf8a-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: 0a855678aa0cb573cecbb2bcc73ab8239ec472d0)
#16 0x5555555ad1d4 in _start (git+0x591d4)
DEDUP_TOKEN: ___interceptor_realloc.part.0--xrealloc--strbuf_grow--strbuf_getcwd--unix_sockaddr_init--unix_stream_connect--send_request--do_cache--cmd_credential_cache--run_builtin--handle_builtin--run_argv--cmd_main--main--__libc_start_call_main--__libc_start_main@GLIBC_2.2.5--_start
SUMMARY: LeakSanitizer: 129 byte(s) leaked in 1 allocation(s).
Fix this leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The C functions exported by libgit-sys do not provide an idiomatic Rust
interface. To make it easier to use these functions via Rust, add a
higher-level "libgit" crate, that wraps the lower-level configset API
with an interface that is more Rust-y.
This combination of $X and $X-sys crates is a common pattern for FFI in
Rust, as documented in "The Cargo Book" [1].
[1] https://doc.rust-lang.org/cargo/reference/build-scripts.html#-sys-packages
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for implementing a higher-level Rust API for accessing
Git configs, export some of the upstream configset API via libgitpub and
libgit-sys. Since this will be exercised as part of the higher-level API
in the next commit, no tests have been added for libgit-sys.
While we're at it, add git_configset_alloc() and git_configset_free()
functions in libgitpub so that callers can manage config_set structs on
the heap. This also allows non-C external consumers to treat config_sets
as opaque structs.
Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git refs migrate" command did not migrate the reflog for
refs/stash, which is the contents of the stashes, which has been
corrected.
* ps/reflog-migration-with-logall-fix:
refs: fix migration of reflogs respecting "core.logAllRefUpdates"
The trace2 code was not prepared to show a configuration variable
that is set to true using the valueless true syntax, which has been
corrected.
* am/trace2-with-valueless-true:
trace2: prevent segfault on config collection with valueless true
reflog entries for symbolic ref updates were broken, which has been
corrected.
* kn/reflog-symref-fix:
refs: fix creation of reflog entries for symrefs
"git branch --sort=..." and "git for-each-ref --format=... --sort=..."
did not work as expected with some atoms, which has been corrected.
* rs/ref-fitler-used-atoms-value-fix:
ref-filter: remove ref_format_clear()
ref-filter: move is-base tip to used_atom
ref-filter: move ahead-behind bases into used_atom
Doc updates.
* ja/doc-commit-markup-updates:
doc: migrate git-commit manpage secondary files to new format
doc: convert git commit config to new format
doc: make more direct explanations in git commit options
doc: the mode param of -u of git commit is optional
doc: apply new documentation guidelines to git commit
Introduce a new API to visit objects in batches based on a common
path, or by type.
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
Introduce libgit-sys, a Rust wrapper crate that allows Rust code to call
functions in libgit.a. This initial patch defines build rules and an
interface that exposes user agent string getter functions as a proof of
concept. This library can be tested with `cargo test`. In later commits,
a higher-level library containing a more Rust-friendly interface will be
added at `contrib/libgit-rs`.
Symbols in libgit can collide with symbols from other libraries such as
libgit2. We avoid this by first exposing library symbols in
public_symbol_export.[ch]. These symbols are prepended with "libgit_" to
avoid collisions and set to visible using a visibility pragma. In
build.rs, Rust builds contrib/libgit-rs/libgit-sys/libgitpub.a, which also
contains libgit.a and other dependent libraries, with
-fvisibility=hidden to hide all symbols within those libraries that
haven't been exposed with a visibility pragma.
Co-authored-by: Kyle Lippincott <spectral@google.com>
Co-authored-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, object files in libgit.a reference common_exit(), which is
contained in common-main.o. However, common-main.o also includes main(),
which references cmd_main() in git.o, which in turn depends on all the
builtin/*.o objects.
We would like to allow external users to link libgit.a without needing
to include so many extra objects. Enable this by splitting common_exit()
and check_bug_if_BUG() into a new file common-exit.c, and add
common-exit.o to LIB_OBJS so that these are included in libgit.a.
This split has previously been proposed ([1], [2]) to support fuzz tests
and unit tests by avoiding conflicting definitions for main(). However,
both of those issues were resolved by other methods of avoiding symbol
conflicts. Now we are trying to make libgit.a more self-contained, so
hopefully we can revisit this approach.
Additionally, move the initialization code out of main() into a new
init_git() function in its own file. Include this in libgit.a as well,
so that external users can share our setup code without calling our
main().
[1] https://lore.kernel.org/git/Yp+wjCPhqieTku3X@google.com/
[2] https://lore.kernel.org/git/20230517-unit-tests-v2-v2-1-21b5b60f4b32@google.com/
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't yet have any test coverage for the new zlib-ng backend as part
of our CI. Add it by installing zlib-ng in Alpine Linux, which causes
Meson to pick it up automatically.
Note that we are somewhat limited with regards to where we run that job:
Debian-based distributions don't have zlib-ng in their repositories,
Fedora has it but doesn't run tests, and Alma Linux doesn't have the
package either. Alpine Linux does have it available and is running our
test suite, which is why it was picked.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch over the "linux-musl" job to use Meson instead of Makefiles. This
is done due to multiple reasons:
- It simplifies our CI infrastructure a bit as we don't have to
manually specify a couple of build options anymore.
- It verifies that Meson detects and sets those build options
automatically.
- It makes it easier for us to wire up a new CI job using zlib-ng as
backend.
One platform compatibility that Meson cannot easily detect automatically
is the `GIT_TEST_UTF8_LOCALE` variable used in tests. Wire up a build
option for it, which we set via a new "MESONFLAGS" environment variable.
Note that we also drop the CC variable, which is set to "gcc". We
already default to GCC when CC is unset in "ci/lib.sh", so this is not
needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The zlib-ng library is a hard fork of the old and venerable zlib
library. It describes itself as zlib replacement with optimizations for
"next generation" systems. As such, it contains several implementations
of central algorithms using for example SSE2, AVX2 and other vectorized
CPU intrinsics that supposedly speed up in- and deflating data.
And indeed, compiling Git against zlib-ng leads to a significant speedup
when reading objects. The following benchmark uses git-cat-file(1) with
`--batch --batch-all-objects` in the Git repository:
Benchmark 1: zlib
Time (mean ± σ): 52.085 s ± 0.141 s [User: 51.500 s, System: 0.456 s]
Range (min … max): 52.004 s … 52.335 s 5 runs
Benchmark 2: zlib-ng
Time (mean ± σ): 40.324 s ± 0.134 s [User: 39.731 s, System: 0.490 s]
Range (min … max): 40.135 s … 40.484 s 5 runs
Summary
zlib-ng ran
1.29 ± 0.01 times faster than zlib
So we're looking at a ~25% speedup compared to zlib. This is of course
an extreme example, as it makes us read through all objects in the
repository. But regardless, it should be possible to see some sort of
speedup in most commands that end up accessing the object database.
The zlib-ng library provides a compatibility layer that makes it a
proper drop-in replacement for zlib: nothing needs to change in the
build system to support it. Unfortunately though, this mode isn't easy
to use on most systems because distributions do not allow you to install
zlib-ng in that way, as that would mean that the zlib library would be
globally replaced. Instead, many distributions provide a package that
installs zlib-ng without the compatibility layer. This version does
provide effectively the same APIs like zlib does, but all of the symbols
are prefixed with `zng_` to avoid symbol collisions.
Implement a new build option that allows us to link against zlib-ng
directly. If set, we redefine zlib symbols so that we use the `zng_`
prefixed versions thereof provided by that library. Like this, it
becomes possible to install both zlib and zlib-ng (without the compat
layer) and then pick whichever library one wants to link against for
Git.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct git_zstream::next_in` variable points to the input data and
is used in combination with `struct z_stream::next_in`. While that
latter field is not marked as a constant in zlib, it is marked as such
in zlib-ng. This causes a couple of compiler errors when we try to
assign these fields to one another due to mismatching constness.
Fix the issue by casting away the potential constness of `next_in`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `deflateSetHeader()` has been introduced with zlib v1.2.2.1,
so we don't use it when linking against an older version of it. Refactor
the code to instead provide a central stub via "compat/zlib.h" so that
we can adapt it based on whether or not we use zlib-ng in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `deflateBound()` function has only been introduced with zlib 1.2.0.
When linking against a zlib version older than that we thus provide our
own compatibility shim. Move this shim into "compat/zlib.h" so that we
can adapt it based on whether or not we use zlib-ng in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include "compat/zlib.h" in "git-compat-util.h", which is
unnecessarily broad given that we only have a small handful of files
that use the zlib library. Move the header into "git-zlib.h" instead and
adapt users of zlib to include that header.
One exception is the reftable library, as we don't want to use the
Git-specific wrapper of zlib there, so we include "compat/zlib.h"
instead. Furthermore, we move the include into "reftable/system.h" so
that users of the library other than Git can wire up zlib themselves.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new "compat/zlib-compat.h" header that we include instead of
including <zlib.h> directly. This will allow us to wire up zlib-ng as an
alternative backend for zlib compression in a subsequent commit.
Note that we cannot just call the file "compat/zlib.h", as that may
otherwise cause us to include that file instead of <zlib.h>.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before including <zlib.h> we explicitly define `z_const` to an empty
value. This has the effect that the `z_const` macro in "zconf.h" itself
will remain empty instead of being defined as `const`, which effectively
adapts a couple of APIs so that their parameters are not marked as being
constants.
It is dubious though whether this is something we actually want: not
marking a parameter as a constant doesn't make it any less constant than
it was. The define was added via 07564773c2 (compat: auto-detect if zlib
has uncompress2(), 2022-01-24), where it was seemingly carried over from
our internal compatibility shim for `uncompress2()` that was removed in
the preceding commit. The commit message doesn't mention why we carry
over the define and make it public, either, and I cannot think of any
reason for why we would want to have it.
Drop the define.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our compat library has an implementation of zlib's `uncompress2()`
function that gets used when linking against an old version of zlib
that doesn't yet have it. The last user of `uncompress2()` got removed
in 15a60b747e (reftable/block: open-code call to `uncompress2()`,
2024-04-08), so the compatibility code is not required anymore. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable/ library code has been made -Wsign-compare clean.
* ps/reftable-sign-compare:
reftable: address trivial -Wsign-compare warnings
reftable/blocksource: adjust `read_block()` to return `ssize_t`
reftable/blocksource: adjust type of the block length
reftable/block: adjust type of the restart length
reftable/block: adapt header and footer size to return a `size_t`
reftable/basics: adjust `hash_size()` to return `uint32_t`
reftable/basics: adjust `common_prefix_size()` to return `size_t`
reftable/record: handle overflows when decoding varints
reftable/record: drop unused `print` function pointer
meson: stop disabling -Wsign-compare
The "cache" credential back-end did not handle authtype correctly,
which has been corrected.
* mh/credential-cache-authtype-request-fix:
credential-cache: respect authtype capability
It was possible for "git unpack-objects" and "git index-pack" to
make an unaligned access, which has been corrected.
* jk/pack-header-parse-alignment-fix:
index-pack, unpack-objects: use skip_prefix to avoid magic number
index-pack, unpack-objects: use get_be32() for reading pack header
parse_pack_header_option(): avoid unaligned memory writes
packfile: factor out --pack_header argument parsing
bswap.h: squelch potential sparse -Wcast-truncate warnings
The meson-driven build is now aware of "git-subtree" housed in
contrib/subtree hierarchy.
* ps/build-meson-subtree:
meson: wire up the git-subtree(1) command
meson: introduce build option for contrib
contrib/subtree: fix building docs
The code in connect.c has been updated to work around complaints
from -Wsign-compare.
* mh/connect-sign-compare:
connect: address -Wsign-compare warnings
Move a few more unit tests to the clar test framework.
* sk/unit-tests:
t/unit-tests: convert reftable tree test to use clar test framework
t/unit-tests: adapt priority queue test to use clar test framework
t/unit-tests: convert mem-pool test to use clar test framework
t/unit-tests: handle dashes in test suite filenames
The help text from "git $cmd -h" appear on the standard output for
some $cmd and the standard error for others. The built-in commands
have been fixed to show them on the standard output consistently.
* jc/show-usage-help:
builtin: send usage() help text to standard output
oddballs: send usage() help text to standard output
builtins: send usage_with_options() help text to standard output
usage: add show_usage_if_asked()
parse-options: add show_usage_with_options_if_asked()
t0012: optionally check that "-h" output goes to stdout
When the --name-hash-version option is used in 'git pack-objects', it
can change from the initial assignment to when it is used based on
interactions with other arguments. Specifically, when writing or reading
bitmaps, we must force version 1 for now. This could change in the
future when the bitmap format can store a name hash version value,
indicating which was used during the writing of the packfile.
Protect the 'git pack-objects' process from getting confused by failing
with a BUG() statement if the value of the name hash version changes
between calls to pack_name_hash_fn().
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new test-tool helper, name-hash, to output the value of the
name-hash algorithms for the input list of strings, one per line.
Since the name-hash values can be stored in the .bitmap files, it is
important that these hash functions do not change across Git versions.
Add a simple test to t5310-pack-bitmaps.sh to provide some testing of
the current values. Due to how these functions are implemented, it would
be difficult to change them without disturbing these values. The paths
used for this test are carefully selected to demonstrate some of the
behavior differences of the two current name hash versions, including
which conditions will cause them to collide.
Create a performance test that uses test_size to demonstrate how
collisions occur for these hash algorithms. This test helps inform
someone as to the behavior of the name-hash algorithms for their repo
based on the paths at HEAD.
My copy of the Git repository shows modest statistics around the
collisions of the default name-hash algorithm:
Test this tree
--------------------------------------------------
5314.1: paths at head 4.5K
5314.2: distinct hash value: v1 4.1K
5314.3: maximum multiplicity: v1 13
5314.4: distinct hash value: v2 4.2K
5314.5: maximum multiplicity: v2 9
Here, the maximum collision multiplicity is 13, but around 10% of paths
have a collision with another path.
In a more interesting example, the microsoft/fluentui [1] repo had these
statistics at time of committing:
Test this tree
--------------------------------------------------
5314.1: paths at head 19.5K
5314.2: distinct hash value: v1 8.2K
5314.3: maximum multiplicity: v1 279
5314.4: distinct hash value: v2 17.8K
5314.5: maximum multiplicity: v2 44
[1] https://github.com/microsoft/fluentui
That demonstrates that of the nearly twenty thousand path names, they
are assigned around eight thousand distinct values. 279 paths are
assigned to a single value, leading the packing algorithm to sort
objects from those paths together, by size.
With the v2 name hash function, the maximum multiplicity lowers to 44,
leaving some room for further improvement.
In a more extreme example, an internal monorepo had a much worse
collision rate:
Test this tree
--------------------------------------------------
5314.1: paths at head 227.3K
5314.2: distinct hash value: v1 72.3K
5314.3: maximum multiplicity: v1 14.4K
5314.4: distinct hash value: v2 166.5K
5314.5: maximum multiplicity: v2 138
Here, we can see that the v2 name hash function provides somem
improvements, but there are still a number of collisions that could lead
to repacking problems at this scale.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As custom options are added to 'git pack-objects' and 'git repack' to
adjust how compression is done, use this new performance test script to
demonstrate their effectiveness in performance and size.
The recently-added --name-hash-version option allows for testing
different name hash functions. Version 2 intends to preserve some of the
locality of version 1 while more often breaking collisions due to long
filenames.
Distinguishing objects by more of the path is critical when there are
many name hash collisions and several versions of the same path in the
full history, giving a significant boost to the full repack case. The
locality of the hash function is critical to compressing something like
a shallow clone or a thin pack representing a push of a single commit.
This can be seen by running pt5313 on the open source fluentui
repository [1]. Most commits will have this kind of output for the thin
and big pack cases, though certain commits (such as [2]) will have
problematic thin pack size for other reasons.
[1] https://github.com/microsoft/fluentui
[2] a637a06df05360ce5ff21420803f64608226a875
Checked out at the parent of [2], I see the following statistics:
Test HEAD
---------------------------------------------------------------
5313.2: thin pack with version 1 0.37(0.44+0.02)
5313.3: thin pack size with version 1 1.2M
5313.4: big pack with version 1 2.04(7.77+0.23)
5313.5: big pack size with version 1 20.4M
5313.6: shallow fetch pack with version 1 1.41(2.94+0.11)
5313.7: shallow pack size with version 1 34.4M
5313.8: repack with version 1 95.70(676.41+2.87)
5313.9: repack size with version 1 439.3M
5313.10: thin pack with version 2 0.12(0.12+0.06)
5313.11: thin pack size with version 2 22.0K
5313.12: big pack with version 2 2.80(5.43+0.34)
5313.13: big pack size with version 2 25.9M
5313.14: shallow fetch pack with version 2 1.77(2.80+0.19)
5313.15: shallow pack size with version 2 33.7M
5313.16: repack with version 2 33.68(139.52+2.58)
5313.17: repack size with version 2 160.5M
To make comparisons easier, I will reformat this output into a different
table style:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.37 s | 0.12 s | 1.2 M | 22.0 K |
| Big Pack | 2.04 s | 2.80 s | 20.4 M | 25.9 M |
| Shallow Pack | 1.41 s | 1.77 s | 34.4 M | 33.7 M |
| Repack | 95.70 s | 33.68 s | 439.3 M | 160.5 M |
The v2 hash function successfully differentiates the CHANGELOG.md files
from each other, which leads to significant improvements in the thin
pack (simulating a push of this commit) and the full repack. There is
some bloat in the "big pack" scenario and essentially the same results
for the shallow pack.
In the case of the Git repository, these numbers show some of the issues
with this approach:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.1 K | 1.1 K |
| Big Pack | 1.69 s | 1.95 s | 13.5 M | 14.5 M |
| Shallow Pack | 1.26 s | 1.29 s | 12.0 M | 12.2 M |
| Repack | 29.51 s | 29.01 s | 237.7 M | 238.2 M |
Here, the attempts to remove conflicts in the v2 function seem to cause
slight bloat to these sizes. This shows that the Git repository benefits
a lot from cross-path delta pairs.
The results are similar with the nodejs/node repo:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|---------|---------|---------|---------|
| Thin Pack | 0.02 s | 0.02 s | 1.6 K | 1.6 K |
| Big Pack | 4.61 s | 3.26 s | 56.0 M | 52.8 M |
| Shallow Pack | 7.82 s | 7.51 s | 104.6 M | 107.0 M |
| Repack | 88.90 s | 73.75 s | 740.1 M | 764.5 M |
Here, the v2 name-hash causes some size bloat more often than it reduces
the size, but it also universally improves performance time, which is an
interesting reversal. This must mean that it is helping to short-circuit
some delta computations even if it is not finding the most efficient
ones. The performance improvement cannot be explained only due to the
I/O cost of writing the resulting packfile.
The Linux kernel repository was the initial target of the default name
hash value, and its naming conventions are practically build to take the
most advantage of the default name hash values:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|--------------|----------|----------|---------|---------|
| Thin Pack | 0.17 s | 0.07 s | 4.6 K | 4.6 K |
| Big Pack | 17.88 s | 12.35 s | 201.1 M | 159.1 M |
| Shallow Pack | 11.05 s | 22.94 s | 269.2 M | 273.8 M |
| Repack | 727.39 s | 566.95 s | 2.5 G | 2.5 G |
Here, the thin and big packs gain some performance boosts in time, with
a modest gain in the size of the big pack. The shallow pack, however, is
more expensive to compute, likely because similarly-named files across
different directories are farther apart in the name hash ordering in v2.
The repack also gains benefits in computation time but no meaningful
change to the full size.
Finally, an internal Javascript repo of moderate size shows significant
gains when repacking with --name-hash-version=2 due to it having many name
hash collisions. However, it's worth noting that only the full repack
case has significant differences from the v1 name hash:
| Test | V1 Time | V2 Time | V1 Size | V2 Size |
|-----------|-----------|----------|---------|---------|
| Thin Pack | 8.28 s | 7.28 s | 16.8 K | 16.8 K |
| Big Pack | 12.81 s | 11.66 s | 29.1 M | 29.1 M |
| Shallow | 4.86 s | 4.06 s | 42.5 M | 44.1 M |
| Repack | 3126.50 s | 496.33 s | 6.2 G | 855.6 M |
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new environment variable to opt-in to different values of the
--name-hash-version=<n> option in 'git pack-objects'. This allows for
extra testing of the feature without repeating all of the test
scenarios. Unlike many GIT_TEST_* variables, we are choosing to not add
this to the linux-TEST-vars CI build as that test run is already
overloaded. The behavior exposed by this test variable is of low risk
and should be sufficient to allow manual testing when an issue arises.
But this option isn't free. There are a few tests that change behavior
with the variable enabled.
First, there are a few tests that are very sensitive to certain delta
bases being picked. These are both involving the generation of thin
bundles and then counting their objects via 'git index-pack --fix-thin'
which pulls the delta base into the new packfile. For these tests,
disable the option as a decent long-term option.
Second, there are some tests that compare the exact output of a 'git
pack-objects' process when using bitmaps. The warning that ignores the
--name-hash-version=2 and forces version 1 causes these tests to fail.
Disable the environment variable to get around this issue.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.
This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.
For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 260 MB | N/A |
| --name-hash-version=1 | 127 MB | 129s |
| --name-hash-version=2 | 127 MB | 112s |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.
Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences In the parent directory.
* Localization files frequently have common filenames but
differentiates via parent directories. In C#, the name
"/strings.resx.lcl" is used for these localization files and they
will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
One open-source example is the fluentui [2] repo, which uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 694 MB | N/A |
| --name-hash-version=1 | 438 MB | 728s |
| --name-hash-version=2 | 168 MB | 142s |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui | 440 MB | 161 MB |
| Repo B | 6,248 MB | 856 MB |
| Repo C | 37,278 MB | 6,755 MB |
| Repo D | 131,204 MB | 7,463 MB |
Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.
It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we will explore in later changes, the default name-hash function used
in 'git pack-objects' has a tendency to cause collisions and cause poor
delta selection. This change creates an alternative that avoids some
collisions while preserving some amount of hash locality.
The pack_name_hash() method has not been materially changed since it was
introduced in ce0bd64 (pack-objects: improve path grouping
heuristics., 2006-06-05). The intention here is to group objects by path
name, but also attempt to group similar file types together by making
the most-significant digits of the hash be focused on the final
characters.
Here's the crux of the implementation:
/*
* This effectively just creates a sortable number from the
* last sixteen non-whitespace characters. Last characters
* count "most", so things that end in ".c" sort together.
*/
while ((c = *name++) != 0) {
if (isspace(c))
continue;
hash = (hash >> 2) + (c << 24);
}
As the comment mentions, this only cares about the last sixteen
non-whitespace characters. This cause some filenames to collide more than
others. This collision is somewhat by design in order to promote hash
locality for files that have similar types (.c, .h, .json) or could be the
same file across a directory rename (a/foo.txt to b/foo.txt). This leads to
decent cross-path deltas in cases like shallow clones or packing a
repository with very few historical versions of files that share common data
with other similarly-named files.
However, when the name-hash instead leads to a large number of name-hash
collisions for otherwise unrelated files, this can lead to confusing the
delta calculation to prefer cross-path deltas over previous versions of the
same file.
The new pack_name_hash_v2() function attempts to fix this issue by
taking more of the directory path into account through its hash
function. Its naming implies that we will later wire up details for
choosing a name-hash function by version.
The first change is to be more careful about paths using non-ASCII
characters. With these characters in mind, reverse the bits in the byte
as the least-significant bits have the highest entropy and we want to
maximize their influence. This is done with some bit manipulation that
swaps the two halves, then the quarters within those halves, and then
the bits within those quarters.
The second change is to perform hash composition operations at every
level of the path. This is done by storing a 'base' hash value that
contains the hash of the parent directory. When reaching a directory
boundary, we XOR the current level's name-hash value with a downshift of
the previous level's hash. This perturbation intends to create low-bit
distinctions for paths with the same final 16 bytes but distinct parent
directory structures.
The collision rate and effectiveness of this hash function will be
explored in later changes as the function is integrated with 'git
pack-objects' and 'git repack'.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When migrating reflogs between reference backends, maintaining the
original order of the reflog entries is crucial. To achieve this, an
`index` field is stored within the `ref_update` struct that encodes the
relative order of reflog entries. This field is used by the reftable
backend as update index for the respective reflog entries to maintain
that ordering.
These update indices must be respected when writing table headers, which
encode the minimum and maximum update index of contained records in the
header and footer. This logic was added in commit bc67b4ab5f (reftable:
write correct max_update_index to header, 2025-01-15), which started to
use `reftable_writer_set_limits()` to propagate the mininum and maximum
update index of all records contained in a ref transaction.
However, we only set the maximum update index for the first transaction
argument, even though there can be multiple such arguments. This is the
case when we write to multiple stacks in a single transaction, e.g. when
updating references in two different worktrees at once. Consequently,
the update index for all but the first argument remain uninitialized,
which may cause undefined behaviour.
Fix this by moving the assignment of the maximum update index in
`reftable_be_transaction_finish()` inside the loop, which ensures that
all elements of the array are correctly initialized.
Furthermore, initialize the `max_index` field to 0 when queueing a new
transaction argument. This is not strictly necessary, as all elements of
`write_transaction_table_arg.max_index` are now assigned correctly.
However, this initialization is added for consistency and to safeguard
against potential future changes that might inadvertently introduce
uninitialized memory access.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In b1b713f722 (fetch set_head: handle mirrored bare repositories,
2024-11-22) it was implicitly assumed that all remotes will be mirrors
in a bare repository, thus fetching a non-mirrored remote could lead to
HEAD pointing to a non-existent reference. Make sure we only overwrite
HEAD if we are in a bare repository and fetching from a mirror.
Otherwise, proceed as normally, and create
refs/remotes/<nonmirrorremote>/HEAD instead.
Reported-by: Christian Hesse <list@eworm.de>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a preparatory step to use even more properties from the remote
struct, refactor set_head to take the entire struct as a parameter,
instead of the necessary bits. This also allows consolidating the use of
gtransport->remote in set_head, making the access of the remote's
properties consistent in the function.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Already when introduced in c7a8a16239 (Add bundle transport,
2007-09-10), the `bundle` transport had a bug where it would open a file
descriptor to the bundle file and then close it _twice_: First, the file
descriptor (`data->fd`) is passed to `unbundle()`, which would use it as
the `stdin` of the `index-pack` process, which as a consequence would
close it via `start_command()`. However, `data->fd` would still hold the
numerical value of the file descriptor, and `close_bundle()` would see
that and happily close it again.
This seems not to have caused too many problems in almost two decades,
but I encountered a situation today where it _does_ cause problems: In
i686 variants of Git for Windows, it seems that file descriptors are
reused quickly after they have been closed.
In the particular scenario I faced, `git fetch <bundle> <ref>` gets the
same file descriptor value when opening the bundle file and importing
its embedded packfile (which implicitly closes the file descriptor) and
then when opening a pack file in `fetch_and_consume_refs()` while
looking up an object's header.
Later on, after the bundle has been imported (and the `close_bundle()`
function erroneously closes the file descriptor that has _already_ been
closed when using it as `stdin` for `git index-pack`), the same file
descriptor value has now been reused via `use_pack()`. Now, when either
the recursive fetch (which defaults to "on", unfortunately) or a
commit-graph update needs to `mmap()` the packfile, it fails due to a
now-invalid file descriptor that _should_ point to the pack file but
doesn't anymore.
To fix that, let's invalidate `data->fd` after calling `unbundle()`.
That way, `close_bundle()` does not close a file descriptor that may
have been reused for something different. While at it, document that
`unbundle()` closes the file descriptor, and ensure that it also does
that when failing to verify the bundle.
Luckily, this bug does not affect the bundle URI feature, it only
affects the `git fetch <bundle>` code path.
Note that this patch does not _completely_ clarifies who is responsible
to close that file descriptor, as `run_command()` may fail _without_
closing `cmd->in`. Addressing this issue thoroughly, however, would
require a rather thorough re-design of the `start_command()` and
`finish_command()` functionality to make it a lot less murky who is
responsible for what file descriptors.
At least this here patch is relatively easy to reason about, and
addresses a hard failure (`fatal: mmap: could not determine filesize`)
at the expense of leaking a file descriptor under very rare
circumstances in which `git fetch` would error out anyway.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit extends the functionality of `git gc`
by adding a new option, `--expire-to=<dir>`. Previously,
this feature was implemented in 91badeba32 (builtin/repack.c:
implement `--expire-to` for storing pruned objects, 2022-10-24),
which allowing users to specify a directory where unreachable
and expired cruft packs are stored during garbage collection.
However, users had to run `git repack --cruft --expire-to=<dir>`
followed by `git prune` to achieve similar results within `git gc`.
By introducing `--expire-to=<dir>` directly into `git gc`,
we simplify the process for users who wish to manage their
repository's cleanup more efficiently. This change involves
passing the `--expire-to=<dir>` parameter through to `git repack`,
making it easier for users to set up a backup location for cruft
packs that will be pruned.
Due to the original `git gc --prune=now` deleting all unreachable
objects by passing the `-a` parameter to git repack. With the
addition of the `--cruft` and `--expire-to` options, it is necessary
to modify this default behavior: instead of deleting these
unreachable objects, they should be merged into a cruft pack and
collected in a specified directory. Therefore, we do not pass `-a`
to the repack command but instead pass `--cruft`, `--expire-to`,
and `--cruft-expiration=now` to repack.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The trailer.* configuration variables are currently only described in
git-interpret-trailers(1) but affect git-commit and git-tag as well.
Move that section into its own config/trailer.txt file and also include
it in git-config(1).
Signed-off-by: Julian Prein <julian@druckdev.xyz>
Acked-by: Eric Sesterhenn <eric.sesterhenn@x41-dsec.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back when Git was in its infancy, remotes were configured via separate
files in "branches/" (back in 2005). This mechanism was replaced later
that year with the "remotes/" directory. Both mechanisms have eventually
been replaced by config-based remotes, and it is very unlikely that
anybody still uses these directories to configure their remotes.
Both of these directories have been marked as deprecated, one in 2005
and the other one in 2011. Follow through with the deprecation and
finally announce the removal of these features in Git 3.0.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: with a small tweak to the help message]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document that it is insecure to use Personal Access Tokens, which
some hosting providers take as username/password, embedded in URLs.
* mh/doc-credential-helpers-with-pat:
docs: discuss caching personal access tokens
docs: list popular credential helpers
The "instaweb" bound only to local IP address without "--local" and
to all addresses with "--local", which was the other way around, when
using Python's http.server class, which has been corrected.
* ak/instaweb-python-port-binding-fix:
instaweb: fix ip binding for the python http.server
The meson build procedure for Documentation/technical/ hierarchy was
missing necessary dependencies, which has been corrected.
* sj/meson-doc-technical-dependency-fix:
meson: fix missing deps for technical articles
The meson build procedure looked for the 'version-def.h' file in a
wrong directory, which has been corrected.
* tc/meson-use-our-version-def-h:
meson: ensure correct version-def.h is used
Extended SHA-1 expression parser did not work well when a branch
with an unusual name (e.g. "foo{bar") is involved.
* en/object-name-with-funny-refname-fix:
object-name: be more strict in parsing describe-like output
object-name: fix resolution of object names containing curly braces
* ds/path-walk-1:
path-walk: drop redundant parse_tree() call
path-walk: reorder object visits
path-walk: mark trees and blobs as UNINTERESTING
path-walk: visit tags and cached objects
path-walk: allow consumer to specify object types
t6601: add helper for testing path-walk API
test-lib-functions: add test_cmp_sorted
path-walk: introduce an object walk by path
Now that all callers have been converted from:
the_hash_algo->unsafe_init_fn();
to
unsafe_hash_algo(the_hash_algo)->init_fn();
and similar, we can remove the scaffolding for the unsafe_ function
variants and force callers to use the new unsafe_hash_algo() mechanic
instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 106140a99f (builtin/fast-import: fix segfault with unsafe SHA1
backend, 2024-12-30) and 9218c0bfe1 (bulk-checkin: fix segfault with
unsafe SHA1 backend, 2024-12-30), we observed the effects of failing to
initialize a hashfile_checkpoint with the same hash function
implementation as is used by the hashfile it is used to checkpoint.
While both 106140a99f and 9218c0bfe1 work around the immediate crash,
changing the hash function implementation within the hashfile API to,
for example, the non-unsafe variant would re-introduce the crash. This
is a result of the tight coupling between initializing hashfiles and
hashfile_checkpoints.
Introduce and use a new function which ensures that both parts of a
hashfile and hashfile_checkpoint pair use the same hash function
implementation to avoid such crashes.
A few things worth noting:
- In the change to builtin/fast-import.c::stream_blob(), we can see
that by removing the explicit reference to
'the_hash_algo->unsafe_init_fn()', we are hardened against the
hashfile API changing away from the_hash_algo (or its unsafe
variant) in the future.
- The bulk-checkin code no longer needs to explicitly zero-initialize
the hashfile_checkpoint, since it is now done as a result of calling
'hashfile_checkpoint_init()'.
- Also in the bulk-checkin code, we add an additional call to
prepare_to_stream() outside of the main loop in order to initialize
'state->f' so we know which hash function implementation to use when
calling 'hashfile_checkpoint_init()'.
This is OK, since subsequent 'prepare_to_stream()' calls are noops.
However, we only need to call 'prepare_to_stream()' when we have the
HASH_WRITE_OBJECT bit set in our flags. Without that bit, calling
'prepare_to_stream()' does not assign 'state->f', so we have nothing
to initialize.
- Other uses of the 'checkpoint' in 'deflate_blob_to_pack()' are
appropriately guarded.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a series of conditionals within the shared cmd_hash_impl() helper
that powers the 'sha1' and 'sha1-unsafe' helpers.
Instead, replace them with a single conditional that transforms the
specified hash algorithm into its unsafe variant. Then all subsequent
calls can directly use whatever function it wants to call without having
to decide between the safe and unsafe variants.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of calling the unsafe_ hash function variants directly, make use
of the shared 'algop' pointer by initializing it to:
f->algop = unsafe_hash_algo(the_hash_algo);
, thus making all calls use the unsafe variants directly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants,
2024-09-26), we introduced "unsafe" variants of the SHA-1 hashing
functions by introducing new functions like "unsafe_init_fn()" and so
on.
This approach has a major shortcoming that callers must remember to
consistently use one variant or the other. Failing to consistently use
(or not use) the unsafe variants can lead to crashes at best, or subtle
memory corruption issues at worst.
In the hashfile API, this isn't difficult to achieve, but verifying that
all callers consistently use the unsafe variants is somewhat of a chore
given how spread out all of the callers are. In the sha1 and sha1-unsafe
test helpers, all of the calls to various hash functions are guarded by
an "if (unsafe)" conditional, which is repetitive and cumbersome.
Address these issues by introducing a new pattern whereby one
'git_hash_algo' can return a pointer to another 'git_hash_algo' that
represents the unsafe version of itself. So instead of having something
like:
if (unsafe)
the_hash_algo->init_fn(...);
the_hash_algo->update_fn(...);
the_hash_algo->final_fn(...);
else
the_hash_algo->unsafe_init_fn(...);
the_hash_algo->unsafe_update_fn(...);
the_hash_algo->unsafe_final_fn(...);
we can instead write:
struct git_hash_algo *algop = the_hash_algo;
if (unsafe)
algop = unsafe_hash_algo(algop);
algop->init_fn(...);
algop->update_fn(...);
algop->final_fn(...);
This removes the existing shortcoming by no longer forcing the caller to
"remember" which variant of the hash functions it wants to call, only to
hold onto a 'struct git_hash_algo' pointer that is initialized once.
Similarly, while there currently is still a way to "mix" safe and unsafe
functions, this too will go away after subsequent commits remove all
direct calls to the unsafe_ variants.
Note that hash_algo_by_ptr() needs an adjustment to allow passing in the
unsafe variant of a hash function. All other query functions on the
hash_algos array will continue to return the safe variants of any
function.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perform a similar transformation as in the previous commit, but focused
instead on hashfile_checksum_valid(). This function does not work with a
hashfile structure itself, and instead validates the raw contents of a
file written using the hashfile API.
We'll want to be prepared for a similar change to this function in the
future, so prepare ourselves for that by extracting 'the_hash_algo' into
its own field for use within this function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Throughout the hashfile API, we rely on a reference to 'the_hash_algo',
and call its _unsafe function variants directly.
Prepare for a future change where we may use a different 'git_hash_algo'
pointer (instead of just relying on 'the_hash_algo' throughout) by
making the 'git_hash_algo' pointer a member of the 'hashfile' structure
itself.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new "unsafe" SHA-1 build knob, it is convenient to have a
test-tool that can exercise Git's unsafe SHA-1 wrappers for testing,
similar to 't/helper/test-tool sha1'.
Implement that helper by altering the implementation of that test-tool
(in cmd_hash_impl(), which is generic and parameterized over different
hash functions) to conditionally run the unsafe variants of the chosen
hash function, and expose the new behavior via a new 'sha1-unsafe' test
helper.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When TRACE2 analytics is enabled, a configuration variable set to
"valueless true" causes a segfault.
Steps to Reproduce
GIT_TRACE2=true GIT_TRACE2_CONFIG_PARAMS=status.*
git -c status.relativePaths version
Expected Result
git version 2.46.0
Actual Result
zsh: segmentation fault GIT_TRACE2=true
Add checks to prevent the segfault and instead show that the
variable without value.
Signed-off-by: Adam Murray <ad@canva.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 297c09eabb (refs: allow multiple reflog entries for the
same refname, 2024-12-16) added logic to exit early in
`lock_ref_for_update()` after obtaining the required lock. This was
added as a performance optimization on a false assumption that no
further processing was required for reflog-only updates.
However the assumption was wrong. For a symref's reflog entry, the
update needs to be populated with the old_oid value, but the early
exit skipped this necessary step.
This caused a bug in Git 2.48 in the files backend where target
references of symrefs being updated would create a corrupted reflog
entry for the symref since the old_oid is not populated.
Everything the early exit skipped in the code path is necessary for
both regular and symbolic ref, so eliminate the mistaken
optimization, and also add a test to ensure that such an issue
doesn't arise in the future.
Reported-by: Nika Layzell <nika@thelayzells.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This call to parse_tree() was flagged by Coverity for ignoring the
return value. But if we look a little further up the function, we can
see that there is already a call to parse_tree_gently(), and we'll
return early if that fails. So by this point the tree will always be
parsed, and the call is redundant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/build-meson-fixes:
ci: wire up Visual Studio build with Meson
ci: raise error when Meson generates warnings
meson: fix compilation with Visual Studio
meson: make the CSPRNG backend configurable
meson: wire up fuzzers
meson: wire up generation of distribution archive
meson: wire up development environments
meson: fix dependencies for generated headers
meson: populate project version via GIT-VERSION-GEN
GIT-VERSION-GEN: allow running without input and output files
GIT-VERSION-GEN: simplify computing the dirty marker
Add a new job to GitHub Actions and GitLab CI that builds and tests
Meson-based builds with Visual Studio.
A couple notes:
- While the build job is mandatory, the test job is marked as "manual"
on GitLab so that it doesn't run by default. We already have a bunch
of Windows-based jobs, and the computational overhead that these
cause is simply out of proportion to run the test suite twice.
The same isn't true for GitHub as I could not find a way to make a
subset of jobs manually triggered.
- We disable Perl. This is because we pick up Perl from Git for
Windows, which outputs different paths ("/c/" instead of "C:\") than
what we expect in our tests.
- We don't use the Git for Windows SDK. Instead, the build only
depends on Visual Studio, Meson and Git for Windows. All the other
dependencies like curl, pcre2 and zlib get pulled in and compiled
automatically by Meson and thus do not have to be provided by the
system.
- We open-code "ci/run-test-slice.sh". This is because we only have
direct access to PowerShell, so we manually implement the logic.
There is an upstream pull request for the Meson build system [1] to
implement test slicing in Meson directly.
- We don't process test artifacts for failed CI jobs. This is done to
keep down prerequisites to a minimum.
All tests are passing.
[1]: https://github.com/mesonbuild/meson/pull/14092
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson prints warnings in several cases, like for example when using a
feature supported by the current version of Meson, but not yet supported
by the minimum required version as declared by the project. These
warnings will not cause the setup to fail by default, which makes it
quite easy to miss them.
Improve this by passing `--fatal-meson-warnings` to `meson setup` so
that our CI jobs will fail on warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Visual Studio compiler defaults to C89 unless explicitly asked to
use a different version of the C standard. We don't specify any C
standard at all though in our Meson build, and consequently compiling
Git fails:
...\git\git-compat-util.h(14): fatal error C1189: #error: "Required C99 support is in a test phase. Please see git-compat-util.h for more details."
Fix the issue by specifying the project's C standard. Funny enough,
specifying C99 does not work because apparently, `__STDC_VERSION__` is
not getting defined in that version at all. Instead, we have to specify
C11 as the project's C standard, which is also done in our CMake build
instructions.
We don't want to generally enforce C11 though, as our requiremets only
state that a C99 compiler is required. In fact, we don't even require
plain C99, but rather the GNU variant thereof.
Meson allows us to handle this case rather easily by specifying
"gnu99,c11", which will cause it to fall back to C11 in case GNU C99 is
unsupported. This feature has only been introduced with Meson 1.3.0
though, and we support 0.61.0 and newer. In case we use such an oldish
version though we fall back to requiring GNU99 unconditionally. This
means that Windows essentially requires Meson 1.3.0 and newer when using
Visual Studio, but I doubt that this is ever going to be a real problem.
Tested-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The CSPRNG backend is not configurable in Meson and isn't quite
discoverable, either. Make it configurable and add the actual backend
used to the summary.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson does not yet know to build our fuzzers. Introduce a new build
option "fuzzers" and wire up the fuzzers in case it is enabled. Adapt
our CI jobs so that they build the fuzzers by default.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Meson knows to generate distribution archives via `meson dist`. In
addition to generating the archive itself, this target also knows to
compile and execute tests from that archive, which helps to ensure that
the result is an adequate drop-in replacement for the versioned project.
While this already works as-is, one omission is that we don't propagate
the commit that this is built from into the resulting archive. This can
be fixed though by adding a distribution script that propagates the
version into the "version" file, which GIT-VERSION-GEN knows to read if
present.
Use GIT-VERSION-GEN to populate that file. As the script is executed in
the build directory, not in the directory where we generate the archive,
we have to use a shell to resolve the "MESON_DIST_ROOT" environment
variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Meson build system is able to wire up development environments. The
intent is to make build artifacts of the project available. This is
typically used to export e.g. paths to linkable libraries, which isn't
all that interesting in our context given that we don't have an official
library interface.
But what we can use this mechanism for is to expose the built Git
executables as well as the build directory. This allows users to play
around with the built Git version in the devenv, and allows them to
execute our test scripts directly with the built distribution.
Wire up this feature, which can then be used via `meson devenv` in the
build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generate a couple of headers from our documentation. These headers
are added to the libgit sources, but two of them aren't used by the
library, but instead by our builtins. This can cause parallel builds to
fail because the builtin object may be compiled before the header was
generated.
Fix the issue by adding both "config-list.h" and "hook-list.h" to the
list of builtin sources. While "command-list.h" is generated similarly,
it is used by "help.c" and thus part of the libgit sources indeed.
Reported-by: Evan Martin <evan.martin@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Git version for Meson is currently wired up manually. It can thus
grow (and already has grown) stale quite easily, as having multiple
sources of truth is never a good idea. This issue is mostly of cosmetic
nature as we don't use the project version anywhere, and instead use the
GIT-VERSION-GEN script to propagate the correct version into our build.
But it is somewhat puzzling when `meson setup` announces to build an old
Git release.
There are a couple of alternatives for how to solve this:
- We can keep the version undefined, but this makes Meson output
"undefined" for the version, as well.
- We can use GIT-VERSION-GEN to generate the version for us. At the
point of configuring the project we haven't yet figured out host
details though, and thus we didn't yet set up the shell environment.
While not an issue for Unix-based systems, this would be an issue in
Windows, where the shell typically gets provided via Git for Windows
and thus requires some special setup.
- We can pull the default version out of GIT-VERSION-GEN and move it
into its own file. This likely requires some adjustments for scripts
that bump the version, but allows Meson to read the version from
that file trivially.
Pick the second option and use GIT-VERSION-GEN as it gives us the most
accurate version. In order to fix the bootstrapping issue on Windows
systems we simply set the version to 'unknown' in case no shell was
found. As the version is only of cosmetic value this isn't really much
of an issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT-VERSION-GEN script requires an input file containing formatting
directives to be replaced as well as an output file that will get
overwritten in case the file contents have changed. When computing the
project version for Meson we don't want to have either though:
- We only want to compute the version without anything else, but don't
have an input file that would match that exact format. While we
could of course introduce a new file just for that usecase, it feels
suboptimal to add another file every time we want to have a slightly
different format for versioned data.
- The computed version needs to be read from stdout so that Meson can
wire it up for the project.
Extend the script to handle both usecases by recognizing `--format=` as
alternative to providing an input path and by writing to stdout in case
no output file was given.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT-VERSION-GEN script computes the version that Git is being built
from. When building from a commit with an unclean worktree it knows to
append "-dirty" to that version to indicate that there were custom
changes applied and that it isn't the exact same as that commit.
The dirtiness check is done manually via git-diff-index(1), which is
somewhat puzzling though: we already use git-describe(1) to compute the
version, which also knows to compute dirtiness via the "--dirty" flag.
But digging back in history explains why: the "-dirty" suffix was added
in 31e0b2ca81 (GIT 1.5.4.3, 2008-02-23), and git-describe(1) didn't yet
have support for "--dirty" back then.
Refactor the script to use git-describe(1). Despite being simpler, it
also results in a small speedup:
Benchmark 1: git describe --dirty --match "v[0-9]*"
Time (mean ± σ): 12.5 ms ± 0.3 ms [User: 6.3 ms, System: 8.8 ms]
Range (min … max): 12.0 ms … 13.5 ms 200 runs
Benchmark 2: git describe --match "v[0-9]*" HEAD && git update-index -q --refresh && git diff-index --name-only HEAD --
Time (mean ± σ): 17.9 ms ± 1.1 ms [User: 8.8 ms, System: 14.4 ms]
Range (min … max): 17.0 ms … 30.6 ms 148 runs
Summary
git describe --dirty --match "v[0-9]*" ran
1.43 ± 0.09 times faster than git describe --match "v[0-9]*" && git update-index -q --refresh && git diff-index --name-only HEAD --
While the speedup doesn't really matter on Unix-based systems, where
filesystem operations are typically fast, they do matter on Windows
where the commands take a couple hundred milliseconds. A quick and dirty
check on that system shows a speedup from ~800ms to ~400ms.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-pack-redundant(1) subcommand has been castrated to require
the "--i-still-use-this" option to do anything since 4406522b
(pack-redundant: escalate deprecation warning to an error,
2023-03-23), which appeared in Git 2.41 and was announced for
removal with 53a92c9552 (Documentation/BreakingChanges: announce
removal of git-pack-redundant(1), 2024-09-02). Stop compiling the
subcommand in case the `WITH_BREAKING_CHANGES` build flag is set.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "linux-gcc" job isn't all that interesting by itself and can be
considered more or less the "standard" job: it is running with a
reasonably up-to-date image and uses GCC as a compiler, both of which we
already cover in other jobs.
There is one exception though: we change the default branch to be "main"
instead of "master", so it is forging ahead a bit into the future to
make sure that this change does not cause havoc. So let's expand on this
a bit and also add the new "WITH_BREAKING_CHANGES" flag to the mix.
Rename the job to "linux-breaking-changes" accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "linux-gcc-default" job is mostly doing the same as the "linux-gcc"
job, except for a couple of minor differences:
- We use an explicit GCC version instead of the default version
provided by the distribution. We have other jobs that test with
"gcc-8", making this distinction pointless.
- We don't set up the Python version explicitly, and instead use the
default Python version. Python 2 has been end-of-life for quite a
while now though, making this distinction less interesting.
- We set up the default branch name to be "main" in "linux-gcc". We
have other testcases that don't and also some that explicitly use
"master".
- We use "ubuntu:20.04" in one job and "ubuntu:latest" in another. We
already have a couple other jobs testing these respectively.
So overall, the job does not add much to our test coverage.
Drop the "linux-gcc-default" job and adapt "linux-gcc" to start using
the default GCC compiler, effectively merging those two jobs into one.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With 57ec9254eb (docs: introduce document to announce breaking changes,
2024-06-14), we have introduced a new document that tracks upcoming
breaking changes in the Git project. In 2454970930 (BreakingChanges:
early adopter option, 2024-10-11) we have amended the document a bit to
mention that any introduced breaking changes must be accompanied by
logic that allows us to enable the breaking change at compile-time.
While we already have two breaking changes lined up, neither of them has
such a switch because they predate those instructions.
Introduce the proposed `WITH_BREAKING_CHANGES` preprocessor macro and
wire it up with both our Makefiles and Meson. This does not yet wire up
the build flag for existing deprecations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 246cebe320 (refs: add support for migrating reflogs, 2024-12-16) we
have added support to git-refs(1) to migrate reflogs between reference
backends. It was reported [1] though that not we don't migrate reflogs
for a subset of references, most importantly "refs/stash".
This issue is caused by us still honoring "core.logAllRefUpdates" when
trying to migrate reflogs: we do queue the updates, but depending on the
value of that config we may decide to just skip writing the reflog entry
altogether. And given that:
- The default for "core.logAllRefUpdates" is to only create reflogs
for branches, remotes, note refs and "HEAD"
- "refs/stash" is neither of these ref types.
We end up skipping the reflog creation for that particular reference.
Fix the bug by setting `REF_FORCE_CREATE_REFLOG`, which instructs the
ref backends to create the reflog entry regardless of the config or any
preexisting state.
[1]: <Z5BTQRlsOj1sygun@tapette.crustytoothpaste.net>
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.
Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.
To protect against this bug, prevent callers from updating these values
after any record is written. To do this, modify the function to return
an error whenever the limits are modified after any record adds. Check
for record adds within `reftable_writer_set_limits()` by checking the
`last_key` and `next` variable. The former is updated after each record
added, but is reset at certain points. The latter is set after writing
the first block.
Modify all callers of the function to anticipate a return type and
handle it accordingly. Add a unit test to also ensure the function
returns the error as expected.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'ref_update.index' variable is used to store an index for a given
reference update. This index is used to order the updates in a
predetermined order, while the default ordering is alphabetical as per
the refname.
For large repositories with millions of references, it should be safer
to use 'uint64_t'. Let's do that. This also is applied for all other
code sections where we store 'index' and pass it around.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `ref_transaction_update_reflog()` function is only used within
'refs.c', so mark it as static.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Address the last couple of trivial -Wsign-compare warnings in the
reftable library and remove the DISABLE_SIGN_COMPARE_WARNINGS macro that
we have in "reftable/system.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `block_source_read_block()` function and its implementations return
an integer as a result that reflects either the number of bytes read, or
an error. As such its return type, a signed integer, isn't wrong, but it
doesn't give the reader a good hint what it actually returns.
Refactor the function to return an `ssize_t` instead, which is typical
for functions similar to read(3p) and should thus give readers a better
signal what they can expect as a result.
Adjust callers to better handle the returned value to avoid warnings
with -Wsign-compare. One of these callers is `reader_get_block()`, whose
return value is only ever used by its callers to figure out whether or
not the read was successful. So instead of bubbling up the `ssize_t`
there, too, we adapt it to only indicate success or errors.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block length is used to track the number of bytes available in a
specific block. As such, it is never set to a negative value, but is
still represented by a signed integer.
Adjust the type of the variable to be `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The restart length is tracked as a positive integer even though it
cannot ever be negative. Furthermore, it is effectively capped via the
MAX_RESTARTS variable.
Adjust the type of the variable to be `uint32_t`. While this type is
excessive given that MAX_RESTARTS fits into an `uint16_t`, other places
already use 32 bit integers for restarts, so this type is being more
consistent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `header_size()` and `footer_size()` return a positive
integer representing the size of the header and footer, respectively,
dependent on the version of the reftable format. Similar to the
preceding commit, these functions return a signed integer though, which
is nonsensical given that there is no way for these functions to return
negative.
Adapt the functions to return a `size_t` instead to fix a couple of sign
comparison warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `hash_size()` function returns the number of bytes used by the hash
function. Weirdly enough though, it returns a signed integer for its
size even though the size obviously cannot ever be negative. The only
case where it could be negative is if the function returned an error
when asked for an unknown hash, but we assert(3p) instead.
Adjust the type of `hash_size()` to be `uint32_t` and adapt all places
that use signed integers for the hash size to follow suit. This also
allows us to get rid of a couple asserts that we had which verified that
the size was indeed positive, which further stresses the point that this
refactoring makes sense.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `common_prefix_size()` function computes the length of the common
prefix between two buffers. As such its return value will always be an
unsigned integer, as the length cannot be negative. Regardless of that,
the function returns a signed integer, which is nonsensical and causes a
couple of -Wsign-compare warnings all over the place.
Adjust the function to return a `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to decode varints isn't able to detect integer overflows: as
long as the buffer still has more data available, and as long as the
current byte has its 0x80 bit set, we'll continue to add up these values
to the result. This will eventually cause the `uint64_t` to overflow, at
which point we'll return an invalid result.
Refactor the function so that it is able to detect such overflows. The
implementation is basically copied from Git's own `decode_varint()`,
which already knows to handle overflows. The only adjustment is that we
also take into account the string view's length in order to not overrun
it. The reftable documentation explicitly notes that those two encoding
schemas are supposed to be the same:
Varint encoding
^^^^^^^^^^^^^^^
Varint encoding is identical to the ofs-delta encoding method used
within pack files.
Decoder works as follows:
....
val = buf[ptr] & 0x7f
while (buf[ptr] & 0x80) {
ptr++
val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
}
....
While at it, refactor `put_var_int()` in the same way by copying over
the implementation of `encode_varint()`. While `put_var_int()` doesn't
have an issue with overflows, it generates warnings with -Wsign-compare.
The implementation of `encode_varint()` doesn't, is battle-tested and at
the same time way simpler than what we currently have.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 42c424d69d (t/helper: inline printing of reftable records,
2024-08-22) we stopped using the `print` function of the reftable record
vtable and instead moved its implementation into the single user of it.
We didn't remove the function itself from the vtable though. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 4f9264b0cd (config.mak.dev: drop `-Wno-sign-compare`, 2024-12-06) we
have started an effort to make our codebase compile with -Wsign-compare.
But while we removed the -Wno-sign-compare flag from "config.mak.dev",
we didn't adjust the Meson build instructions in the same way.
Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In e7fb2ca945 (builtin/blame: fix out-of-bounds write with blank
boundary commits, 2025-01-10), we have introduced two new tests that
expect a certain amount of padding. This padding is generated via
printf using the "%0.s" conversion specification. That directive is
ambiguous because it might be interpreted as field width (most shells)
or 0-padding flag for numeric fields (coreutils).
Fix this issue by using "%${N}s" instead, which is already being
used in other tests (i.e. t5300, t0450) and is unambiguous.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jan Palus <jpalus@fastmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we no longer have any AsciiDoc files that end in ".txt", don't
modify them with .gitattributes or ignore them with .gitignore.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We presently use the ".txt" extension for our AsciiDoc files. While not
wrong, most editors do not associate this extension with AsciiDoc,
meaning that contributors don't get automatic editor functionality that
could be useful, such as syntax highlighting and prose linting.
It is much more common to use the ".adoc" extension for AsciiDoc files,
since this helps editors automatically detect files and also allows
various forges to provide rich (HTML-like) rendering. Let's do that
here, renaming all of the files and updating the includes where
relevant. Adjust the various build scripts and makefiles to use the new
extension as well.
Note that this should not result in any user-visible changes to the
documentation.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we'll move the AsciiDoc documentation files to the
".adoc" extension rather than the extension ".txt". We need these files
to use only LF because they are read by generate-cmdlist.sh using the
read builtin.
If we allow CRLF here, the CR at the end of the line is treated as part
of the synopsis, since a POSIX shell doesn't consider it special like
LF. In that case, we generate synopsis strings in C that contain a CR,
which the compiler does not like because it believes that the double
quote string terminator is missing, and as a consequence, compilation
fails.
Because we rely on LF-only endings here to compile successfully and we
want Git to continue to be able to compile on Windows, mark these files
as LF-only in the .gitattributes file.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .adoc extension is commonly used for AsciiDoc files. In a future
commit, we'll update some files to switch from the .txt extension to the
.adoc extension, so update the EditorConfig file to use the same
configuration for both extensions, since we want the files to be
formatted completely identically whether they're using the older or
newer extension.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We presently use the ".txt" extension for our AsciiDoc files. While not
wrong, most editors do not associate this extension with AsciiDoc,
meaning that contributors don't get automatic editor functionality that
could be useful, such as syntax highlighting and prose linting.
Instead, in a future commit, we're going to move to using the more
common ".adoc" extension for these files, which many editors
intrinsically recognize as an AsciiDoc file. To avoid contributors
accidentally checking in generated files, ignore the new extension for
generated files in the documentation .gitignore files.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internal functions `write_rev_trailer()`, `write_rev_trailer()`,
`write_mtimes_header()` and write_mtimes_trailer()` use the global
`the_hash_algo` variable to access the repository's hash function. Pass
the hash_algo down from callers, all of which already have access to the
variable.
This removes all global variables from the 'pack-write.c' file, so
remove the 'USE_THE_REPOSITORY_VARIABLE' macro.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_rev_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above. Also modify children functions
`write_rev_file_order()` and `write_rev_header()` to accept
'the_hash_algo'.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
However, in `midx-write.c`, since all usage of global variables is
removed, don't reintroduce them and instead use the `repo` available in
the context.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `write_idx_file()` function uses the global `the_hash_algo` variable
to access the repository's hash_algo. To avoid global variable usage,
pass a hash_algo from the layers above.
Since `stage_tmp_packfiles()` also resides in 'pack-write.c' and calls
`write_idx_file()`, update it to accept a `struct git_hash_algo` as a
parameter and pass it through to the callee.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `index_pack_lockfile()` function uses the global `the_repository`
variable to access the repository. To avoid global variable usage, pass
the repository from the layers above.
Altough the layers above could have access to the repository internally,
simply pass in `the_repository`. This avoids any compatibility issues
and bubbles up global variable usage to upper layers which can be
eventually resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `fixup_pack_header_footer()` function uses the global
`the_hash_algo` variable to access the repository's hash function. To
avoid global variable usage, pass a hash_algo from the layers above.
Altough the layers above could have access to the hash_algo internally,
simply pass in `the_hash_algo`. This avoids any compatibility issues and
bubbles up global variable usage to upper layers which can be eventually
resolved.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that ref_format_clear() no longer releases any memory we don't need
it anymore. Remove it and its counterpart, ref_format_init().
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The string_list "is_base_tips" in struct ref_format stores the
committish part of "is-base:<committish>". It has the same problems
that its sibling string_list "bases" had. Fix them the same way as the
previous commit did for the latter, by replacing the string_list with
fields in "used_atom".
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
verify_ref_format() parses a ref-filter format string and stores
recognized items in the static array "used_atom". For
"ahead-behind:<committish>" it stores the committish part in a
string_list member "bases" of struct ref_format.
ref_sorting_options() also parses bare ref-filter format items and
stores stores recognized ones in "used_atom" as well. The committish
parts go to a dummy struct ref_format in parse_sorting_atom(), though,
and are leaked and forgotten.
If verify_ref_format() is called before ref_sorting_options(), like in
git for-each-ref, then all works well if the sort key is included in the
format string. If it isn't then sorting cannot work as the committishes
are missing.
If ref_sorting_options() is called first, like in git branch, then we
have the additional issue that if the sort key is included in the format
string then filter_ahead_behind() can't see its committish, will not
generate any results for it and thus it will be expanded to an empty
string.
Fix those issues by replacing the string_list with a field in used_atom
for storing the committish. This way it can be shared for handling both
ref-filter format strings and sorting options in the same command.
Reported-by: Ross Goldberg <ross.goldberg@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More code paths have a repository passed through the callchain,
instead of assuming the primary the_repository object.
* ps/the-repository:
match-trees: stop using `the_repository`
graph: stop using `the_repository`
add-interactive: stop using `the_repository`
tmp-objdir: stop using `the_repository`
resolve-undo: stop using `the_repository`
credential: stop using `the_repository`
mailinfo: stop using `the_repository`
diagnose: stop using `the_repository`
server-info: stop using `the_repository`
send-pack: stop using `the_repository`
serve: stop using `the_repository`
trace: stop using `the_repository`
pager: stop using `the_repository`
progress: stop using `the_repository`
A misconfigured "fsck.skiplist" configuration variable was not
diagnosed as an error, which has been corrected.
* jt/fsck-skiplist-parse-fix:
fsck: reject misconfigured fsck.skipList
The code to compute "unique" name used git_rand() which can fail or
get stuck; the callsite does not require cryptographic security.
Introduce the "insecure" mode and use it appropriately.
* ps/reftable-get-random-fix:
reftable/stack: accept insecure random bytes
wrapper: allow generating insecure random bytes
The code to check LSan results has been simplified and made more
robust.
* jk/lsan-race-ignore-false-positive:
test-lib: add a few comments to LSan log checking
test-lib: simplify lsan results check
test-lib: invert return value of check_test_results_san_file_empty
When parsing --pack_header=, we manually skip 14 bytes to the data.
Let's use skip_prefix() to do this automatically.
Note that we overwrite our pointer to the front of the string, so we
have to add more context to the error message. We could avoid this by
declaring an extra pointer to hold the value, but I think the modified
message is actually preferable; it should give translators a bit more
context.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both of these commands read the incoming pack into a static unsigned
char buffer in BSS, and then parse it by casting the start of the buffer
to a struct pack_header. This can result in SIGBUS on some platforms if
the compiler doesn't place the buffer in a position that is properly
aligned for 4-byte integers.
This reportedly happens with unpack-objects (but not index-pack) on
sparc64 when compiled with clang (but not gcc). But we are definitely in
the wrong in both spots; since the buffer's type is unsigned char, we
can't depend on larger alignment. When it works it is only because we
are lucky.
We'll fix this by switching to get_be32() to read the headers (just like
the last few commits similarly switched us to put_be32() for writing
into the same buffer).
It would be nice to factor this out into a common helper function, but
the interface ends up quite awkward. Either the caller needs to hardcode
how many bytes we'll need, or it needs to pass us its fill()/use()
functions as pointers. So I've just fixed both spots in the same way;
this is not code that is likely to be repeated a third time (most of the
pack reading code uses an mmap'd buffer, which should be properly
aligned).
I did make one tweak to the shared code: our pack_version_ok() macro
expects us to pass the big-endian value we'd get by casting. We can
introduce a "native" variant which uses the host integer ordering.
Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to recreate a pack header in our in-memory buffer, we cast the
buffer to a "struct pack_header" and assign the individual fields. This
is reported to cause SIGBUS on sparc64 due to alignment issues.
We can work around this by using put_be32() which will write individual
bytes into the buffer.
Reported-by: Koakuma <koachan@protonmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both index-pack and unpack-objects accept a --pack_header argument. This
is an undocumented internal argument used by receive-pack and fetch to
pass along information about the header of the pack, which they've
already read from the incoming stream.
In preparation for a bugfix, let's factor the duplicated code into a
common helper.
The callers are still responsible for identifying the option. While this
could likewise be factored out, it is more flexible this way (e.g., if
they ever started using parse-options and wanted to handle both the
stuck and unstuck forms).
Likewise, the callers are responsible for reporting errors, though they
both just call die(). I've tweaked unpack-objects to match index-pack in
marking the error for translation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In put_be32(), we right-shift a uint32_t value various amounts and then
assign the low 8-bits to individual "unsigned char" bytes, throwing away
the high bits. For shifts smaller than 24 bits, those thrown away bits
will be arbitrary bits from the original uint32_t.
This works exactly as we want, but if you feed a constant, then sparse
complains. For example if we write this (which we plan to do in a future
patch):
put_be32(hdr, PACK_SIGNATURE);
then "make sparse" produces:
compat/bswap.h:175:22: error: cast truncates bits from constant value (5041 becomes 41)
compat/bswap.h:176:22: error: cast truncates bits from constant value (504143 becomes 43)
compat/bswap.h:177:22: error: cast truncates bits from constant value (5041434b becomes 4b)
And the same issue exists in the other put_be*() functions, when used
with a constant.
We can silence this warning by explicitly masking off the truncated
bits. The compiler is smart enough to know the result is the same, and
the asm generated by gcc (with both -O0 and -O2) is identical.
Curiously this line already exists:
put_be32(&hdr_version, INDEX_EXTENSION_VERSION2);
in the fsmonitor.c file, but it does not get flagged because the CPP
macro expands to a small integer (2).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapts reftable tree test script to clar framework by using clar
assertions where necessary.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the prio-queue test script to clar framework by using clar
assertions where necessary. Test functions are created as a standalone
to test different cases.
update the type of the variable `j` from int to `size_t`, this ensures
compatibility with the type used for result_size, which is also size_t,
preventing a potential warning or error caused by comparisons between
signed and unsigned integers.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt the mem-pool test script to use clar framework by using clar
assertions where necessary.Test functions are created as a standalone to
test different test cases.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"generate-clar-decls.sh" script is designed to extract function
signatures that match a specific pattern derived from the unit test
file's name. The script does not know to massage file names with dashes,
which will make it search for functions that look like, for example,
`test_mem-pool_*`. Having dashes in function names is not allowed
though, so these patterns won't ever match a legal function name.
Adapt script to translate dashes (`-`) in test suite filenames to
underscores (`_`) to correctly extract the function signatures and run
the corresponding tests. This will be used by subsequent commits which
follows the same construct.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the show_usage_and_exit_if_asked() helper we introduced
earlier, fix callers of usage() that want to show the help text when
explicitly asked by the end-user. The help text now goes to the
standard output stream for them.
These are the bog standard "if we got only '-h', then that is a
request for help" callers. Their
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(message);
are simply replaced with
show_usage_and_exit_if_asked(argc, argv, message);
With this, the built-ins tested by t0012 all send their help text to
their standard output stream, so the check in t0012 that was half
tightened earlier is now fully tightened to insist on standard error
stream being empty.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the show_usage_if_asked() helper we introduced earlier, fix
callers of usage() that want to show the help text when explicitly
asked by the end-user. The help text now goes to the standard
output stream for them.
The callers in this step are oddballs in that their invocations of
usage() are *not* guarded by
if (argc == 2 && !strcmp(argv[1], "-h")
usage(...);
There are (unnecessarily) being clever ones that do things like
if (argc != 2 || !strcmp(argv[1], "-h")
usage(...);
to say "I know I take only one argument, so argc != 2 is always an
error regardless of what is in argv[]. Ah, by the way, even if argc
is 2, "-h" is a request for usage text, so we do the same".
Some like "git var -h" just do not treat "-h" any specially, and let
it take the same error code paths as a parameter error.
Now we cannot do the same, so these callers are rewrittin to do the
show_usage_and_exit_if_asked() first and then handle the usage error
the way they used to.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the show_usage_with_options_if_asked() helper we introduced
earlier, fix callers of usage_with_options() that want to show the
help text when explicitly asked by the end-user. The help text now
goes to the standard output stream for them.
The test in t7600 for "git merge -h" may want to be retired, as the
same is covered by t0012 already, but it is specifically testing that
the "-h" option gets a response even with a corrupt index file, so
for now let's leave it there.
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands call usage() when they are asked to give the help
message with "git cmd -h", but this has the same problem as we
fixed with callers of usage_with_options() for the same purpose.
Introduce a helper function that captures the common pattern
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(usage);
and replaces it with
show_usage_if_asked(argc, argv, usage);
to help correct these code paths.
Note that this helper function still exits with status 129, and
t0012 insists on it. After converting all the mistaken callers of
usage_with_options() to call this new helper, we may want to address
it---the end user is asking us to give the help text, and we are
doing exactly as asked, so there is no reason to exit with non-zero
status.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many commands call usage_with_options() when they are asked to give
the help message, but it sends the help text to the standard error
stream. When the user asked for it with "git cmd -h", the help
message is the primary output from the command, hence we should send
it to the standard output stream, instead.
Introduce a helper function that captures the common pattern
if (argc == 2 && !strcmp(argv[1], "-h"))
usage_with_options(usage, options);
and replaces it with
show_usage_with_options_if_asked(argc, argv, usage, options);
to help correct code paths.
Note that this helper function still exits with status 129, and
t0012 insists on it. After converting all the mistaken callers of
usage_with_options() to call this new helper, we may want to address
it---the end user is asking us to give the help text, and we are
doing exactly as asked, so there is no reason to exit with non-zero
status.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For most commands, "git foo -h" will send the help output to stdout, as
this is what parse-options.c does. But some commands send it to stderr
instead. This is usually because they call usage_with_options(), and
should be switched to show_usage_help_and_exit_if_asked().
Currently t0012 is permissive and allows either behavior. We'd like it
to eventually enforce that help goes to stdout, and teaching it to do so
identifies the commands that need to be changed. But during the
transition period, we don't want to enforce that for most test runs.
So let's introduce a flag that will let most test runs use the
permissive behavior, and people interested in converting commands can
run:
GIT_TEST_HELP_MUST_BE_STDOUT=1 ./t0012-help.sh
to see the failures. Eventually (when all builtins have been converted)
we'll remove this flag entirely and always check the strict behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We centrally explain that "--no-whatever" is the way to countermand
the "--whatever" option. Explain that a configured default and the
value specified by an environment variable can be overridden by the
corresponding command line option, too.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up the git-subtree(1) command, which is part of "contrib/". Note
that we have to move around the exact location where we include the
"contrib/" subdirectory so that it comes after building the docs so that
we have access to some of the common functionality.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We unconditionally wire up building command completion present in the
"contrib/" directory. This may or may not be what users want, and we
don't provide a way to disable it.
Introduce a new "contrib" build option. This option is introduced as an
array so that users can manually pick which exact features they want to
include from the "contrib" directory. By default, we build and install
shell completions, which is a commonly used feature and also the current
default.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06), we have refactored how we build our documentation by
injecting the Git version into the Asciidoc and AsciiDoctor config
files instead of doing so via arguments. As such, the original config
files were removed, where the expectation is that they get generated via
`GIT-VERSION-GEN` now.
Whie the git-subtree(1) command part of "contrib/" also builds docs
using these same config files, its Makefile wasn't adjusted accordingly
and thus building the docs is broken.
Fix this by using `GIT-VERSION-GEN` to generate those files.
Reported-by: Renato Botelho <garga@FreeBSD.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the warnings were about loop variables being declared as ints
with a condition using a size_t, whereby switching the variable to
size_t fixes the warning.
One other case was comparing the result of strlen to an int passed
as an argument, which turns out could just as well be passed as a
size_t, albeit trickling to other functions.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
meson-based build now supports the unsafe-sha1 build knob.
* ps/meson-weak-sha1-build:
meson: provide a summary of configured backends
meson: wire up unsafe SHA1 backend
meson: add missing dots for build options
meson: simplify conditions for HTTPS and SHA1 dependencies
meson: require SecurityFramework when it's used as SHA1 backend
meson: deduplicate access to SHA1/SHA256 backend options
meson: consistenlty spell 'CommonCrypto'
More -Wsign-compare fixes.
* ps/more-sign-compare:
sign-compare: avoid comparing ptrdiff with an int/unsigned
commit-reach: use `size_t` to track indices when computing merge bases
shallow: fix -Wsign-compare warnings
builtin/log: fix remaining -Wsign-compare warnings
builtin/log: use `size_t` to track indices
commit-reach: use `size_t` to track indices in `get_reachable_subset()`
commit-reach: use `size_t` to track indices in `remove_redundant()`
commit-reach: fix type of `min_commit_date`
commit-reach: fix index used to loop through unsigned integer
prio-queue: fix type of `insertion_ctr`
CI jobs gave sporadic failures, which turns out that that the
object finalization code was giving an error when it did not have
to.
* ps/object-collision-check:
object-file: retry linking file into place when occluding file vanishes
object-file: don't special-case missing source file in collision check
object-file: rename variables in `check_collision()`
object-file: fix race in object collision check
Tweak the help text used for the option value placeholders by
parse-options API so that translations can customize the "<>"
placeholder signal (e.g. "--option=<value>").
* as/long-option-help-i18n:
parse-options: localize mark-up of placeholder text in the short help
"git submodule" learned various ways to spell the same option,
e.g. "--branch=B" can be spelled "--branch B" or "-bB".
* re/submodule-parse-opt:
git-submodule.sh: rename some variables
git-submodule.sh: improve variables readability
git-submodule.sh: add some comments
git-submodule.sh: get rid of unused variable
git-submodule.sh: get rid of isnumber
git-submodule.sh: improve parsing of short options
git-submodule.sh: improve parsing of some long options
Also prevent git-commit manpage to refer to itself in the config
description by using a variable.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Use imperative mood
- make use of the placeholder format to simplify style
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the synopsis to reflect the option description.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- use _<placeholder>_ instead of <placeholder> in the description
- use `backticks for keywords and more complex option
descriptions`. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.
However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.
To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.
Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need an explicit `depends: documentation_deps` so that all of our
Documentation targets know they require asciidoc.conf. This shows up
as parallel build failures with it not yet being available.
Other targets look OK already.
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To build the libgit-version library, Meson first generates
`version-def.h` in the build directory. Then it compiles `version.c`
into a library. During compilation, Meson tells to include both the
build directory and the project root directory.
However, when the user previously has compiled Git using Make, they will
have a `version-def.h` file in project root directory as well. Because
`version-def.h` is included in `version.c` using the #include directive
with double quotes, some preprocessors will look for the header file in
the same directory as the source file. This will cause compilation of
`version.c` ran by Meson to include `version-def.h` previously made by
Make, which might be out of date.
To explicitly tell the preprocessor which `version-def.h` to use, pass
the absolute path of this file as macro GIT_VERSION_H to the
preprocessor using option `-D` and have `version.c` `#include
GIT_VERSION_H`. To remain working with other build systems than Meson,
include "version-def.h" if that macro is not defined.
Co-authored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
From Documentation/revisions.txt:
'<describeOutput>', e.g. 'v1.7.4.2-679-g3bee7fb'::
Output from `git describe`; i.e. a closest tag, optionally
followed by a dash and a number of commits, followed by a dash, a
'g', and an abbreviated object name.
which means that output of the format
${REFNAME}-${INTEGER}-g${HASH}
should parse to fully expanded ${HASH}. This is fine. However, we
currently don't validate any of ${REFNAME}-${INTEGER}, we only parse
-g${HASH} and assume the rest is valid. That is problematic, since it
breaks things like
git cat-file -p branchname:path/to/file/named/i-gaffed
which, when commit (or tree or blob) affed exists, will not return us
information about the file we are looking for but will instead
erroneously tell us about object affed.
A few additional notes:
- This is a slight backward incompatibility break, because we used
to allow ${GARBAGE}-g${HASH} as a way to spell ${HASH}. However,
a backward incompatible break is necessary, because there is no
other way for someone to be more specific and disambiguate that they
want the blob master:path/to/who-gabbed instead of the object abbed.
- There is a possibility that check_refname_format() rules change in
the future. However, we can only realistically loosen the rules
for what that function accepts rather than tighten. If we were to
tighten the rules, some real world repositories may already have
refnames that suddenly become unacceptable and we break those
repositories. As such, any describe-like syntax of the form
${VALID_FOR_A_REFNAME}-${INTEGER}-g${HASH} that is valid with the
changes in this commit will remain valid in the future.
- The fact that check_refname_format() rules could loosen in the
future is probably also an important reason to make this change. If
the rules loosen, there might be additional cases within
${GARBAGE}-g${HASH} that become ambiguous in the future. While
abbreviated hashes can be disambiguated by abbreviating less, it may
well be that these alternative object names have no way of being
disambiguated (much like pathnames cannot be). Accepting all random
${GARBAGE} thus makes it difficult for us to allow future
extensions to object naming.
So, tighten up the parsing to make sure ${REFNAME} and ${INTEGER} are
present in the string, and would be considered a valid ref and
non-negative integer.
Also, add a few tests for git describe using object names of the form
${REVISION_NAME}${MODIFIERS}
since an early version of this patch failed on constructs like
git describe v2.48.0-rc2-161-g6c2274cdbc^0
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given a branch name of 'foo{bar', commands like
git cat-file -p foo{bar:README.md
should succeed (assuming that branch had a README.md file, of course).
However, the change in cce91a2caef9 (Change 'master@noon' syntax to
'master@{noon}'., 2006-05-19) presumed that curly braces would always
come after an '@' or '^' and be paired, causing e.g. 'foo{bar:README.md'
to entirely miss the ':' and assume there's no object being referenced.
In short, git would report:
fatal: Not a valid object name foo{bar:README.md
Change the parsing to only make the assumption of paired curly braces
immediately after either a '@' or '^' character appears.
Add tests for this, as well as for a few other test cases that initial
versions of this patch broke:
* 'foo@@{...}'
* 'foo^{/${SEARCH_TEXT_WITH_COLON}}:${PATH}'
Note that we'd prefer not duplicating the special logic for "@^" characters
here, because if get_oid_basic() or interpret_nth_prior_checkout() or
get_oid_basic() or similar gain extra methods of using curly braces,
then the logic in get_oid_with_context_1() would need to be updated as
well. But it's not clear how to refactor all of these to have a simple
common callpoint with the specialized logic.
Reported-by: Gabriel Amaral <gabriel-amaral@github.com>
Helped-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/meson-weak-sha1-build:
meson: provide a summary of configured backends
meson: wire up unsafe SHA1 backend
meson: add missing dots for build options
meson: simplify conditions for HTTPS and SHA1 dependencies
meson: require SecurityFramework when it's used as SHA1 backend
meson: deduplicate access to SHA1/SHA256 backend options
meson: consistenlty spell 'CommonCrypto'
A help.autocorrect value of 1 is currently interpreted as "wait 1
decisecond", which can be confusing to users who believe they are setting a
boolean value to turn the autocorrect feature on.
Interpret the value of help.autocorrect as either one of the accepted list
of special values ("never", "immediate", ...), a boolean or an integer. If
the value is 1, it is no longer interpreted as a decisecond value of 0.1s
but as a true boolean, the equivalent of "immediate". If the value is 2 or
more, continue treating it as a decisecond wait time.
False boolean string values ("off", "false", "no") are now equivalent to
"never", meaning that guessed values are still shown but nothing is
executed. True boolean string values are interpreted as "immediate".
Signed-off-by: Scott Chacon <schacon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using remotes (with git-flow especially), the remote reference names
are almost always wordwrapped in the "list references" window because it's
somewhat narrow by default. It's possible to resize it with a mouse,
but it's annoying to have to do this every time, especially on Windows 10,
where the window border seems to be only one (1) pixel wide, thus making
the grabbing of the window border tricky.
Signed-off-by: James J. Raden <james.raden@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Tcl/Tk 8.6 introduced new events for the cursor left/right keys and
apparently changed the behavior of the previous event.
Let's work around that by using the new events when we are running with
Tcl/Tk 8.6 or later.
This fixes https://github.com/git-for-windows/git/issues/495
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git for Windows now ships with the new Git icon from git-scm.com. Use that
icon file if it exists instead of the old procedurally drawn one.
This patch was sent upstream but so far no decision on its inclusion was
made, so commit it to our fork.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Assumes file names in git tree objects are UTF-8 encoded.
On most unix systems, the system encoding (and thus the TCL system
encoding) will be UTF-8, so file names will be displayed correctly.
On Windows, it is impossible to set the system encoding to UTF-8.
Changing the TCL system encoding (via 'encoding system ...', e.g. in the
startup code) is explicitly discouraged by the TCL docs.
Change gitk functions dealing with file names to always convert
from and to UTF-8.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Just like CVE-2022-41953 for Git GUI, there exists a vulnerability of
`gitk` where it looks for `taskkill.exe` in the current directory before
searching `PATH`.
Note that the many `exec git` calls are unaffected, due to an obscure
quirk in Tcl's `exec` function. Typically, `git.exe` lives next to
`wish.exe` (i.e. the program that is run to execute `gitk` or Git GUI)
in Git for Windows, and that is the saving grace for `git.exe because
`exec` searches the directory where `wish.exe` lives even before the
current directory, according to
https://www.tcl-lang.org/man/tcl/TclCmd/exec.htm#M24:
If a directory name was not specified as part of the application
name, the following directories are automatically searched in
order when attempting to locate the application:
The directory from which the Tcl executable was loaded.
The current directory.
The Windows 32-bit system directory.
The Windows home directory.
The directories listed in the path.
The same is not true, however, for `taskkill.exe`: it lives in the
Windows system directory (never mind the 32-bit, Tcl's documentation is
outdated on that point, it really means `C:\Windows\system32`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`git instaweb -d python` should bind the server to 0.0.0.0, while
`git instaweb -d python -l` should bind the server to 127.0.0.1.
The code had them backwards by mistake since 2eb14bb2d4
(git-instaweb: add Python builtin http.server support, 2019-01-28).
Signed-off-by: Alecs King <alecsk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a 'synopsis' block which will
automatically format placeholders in italics and keywords in
monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use backticks for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
While at it, also convert an option description to imperative mood.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
- Switch the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- Use _<placeholder>_ instead of <placeholder> in the description
- Use `backticks` for keywords and more complex option
descriptions. The new rendering engine will apply synopsis rules to
these spans.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/meson-weak-sha1-build:
meson: provide a summary of configured backends
meson: wire up unsafe SHA1 backend
meson: add missing dots for build options
meson: simplify conditions for HTTPS and SHA1 dependencies
meson: require SecurityFramework when it's used as SHA1 backend
meson: deduplicate access to SHA1/SHA256 backend options
meson: consistenlty spell 'CommonCrypto'
Describe problems storing personal access tokens in git-credential-cache
and suggest alternatives.
Research suggests that many users are confused about this:
> the point of passwords is that (ideally) you memorise them [so]
> they're never stored anywhere in plain text. Yet GitHub's personal
> access token system seems to basically force you to store the token in
> plain text?
https://stackoverflow.com/questions/46645843/where-to-store-my-git-personal-access-token#comment89963004_46645843
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-credential-store saves credentials unencrypted on disk. It is the
least secure choice of credential helper. Nevertheless, it appears
several times more popular than any other credential helper [1].
Inform users about more secure alternatives.
[1] https://stackoverflow.com/questions/35942754/how-can-i-save-username-and-password-in-git
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Last-minute fix for a regression in "git blame --abbrev=<length>"
when insane <length> is specified; we used to correctly cap it to
the hash output length but broke it during the cycle.
* ps/build-sign-compare:
builtin/blame: fix out-of-bounds write with blank boundary commits
builtin/blame: fix out-of-bounds read with excessive `--abbrev`
Support for Azure Pipelines has been retired in 6081d3898f (ci: retire
the Azure Pipelines definition, 2020-04-11) in favor of GitHub Actions.
Our CI library still has some infrastructure left for Azure though that
is now unused. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both GitHub Actions and GitLab CI use the "ubuntu:latest" tag as the
default image for most jobs. This tag is somewhat misleading though, as
it does not refer to the latest release of Ubuntu, but to the latest LTS
release thereof. But as we already have a couple of jobs exercising the
oldest LTS release of Ubuntu that Git still supports, it would make more
sense to test the oldest and youngest versions of Ubuntu.
Adapt these jobs to instead use the "ubuntu:rolling" tag, which refers
to the actual latest release, which currently is Ubuntu 24.10.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With c85bcb5de1 (gitlab-ci: switch from Ubuntu 16.04 to 20.04,
2024-10-31) we have adapted the last CI job to stop using Ubuntu 16.04
in favor of Ubuntu 20.04. Remove the special-casing we still have in our
CI scripts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add another job to GitLab CI that tests against the i386 architecture.
This job is equivalent to the same job in GitHub Workflows.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "linux-old" job was historically testing against the oldest
supported LTS release of Ubuntu. But with c85bcb5de1 (gitlab-ci: switch
from Ubuntu 16.04 to 20.04, 2024-10-31) it has been converted to test
against Ubuntu 20.04, which already gets exercised in a couple of other
CI jobs. It's thus not adding any significant test coverage.
Drop the job.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We explicitly list the distro of Linux-based jobs, but it is equivalent
to the name of the image in almost all cases, except that colons are
replaced with dashes. Drop the redundant information and massage it in
our CI scripts, which is equivalent to how we do it in GitLab CI.
There are a couple of exceptions:
- The "linux32" job, whose distro name is different than the image
name. This is handled by adapting all sites to use the new name.
- The "alpine" and "fedora" jobs, neither of which specify a tag for
their image. This is handled by adding the "latest" tag.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have split the CI jobs in GitHub Workflows into two categories:
- Those running on a machine pool directly.
- Those running in a container on the machine pool.
The latter is more flexible because it allows us to freely pick whatever
container image we want to use for a specific job, while the former only
allows us to pick from a handful of different distros. The containerized
jobs do not have any significant downsides to the best of my knowledge:
- They aren't significantly slower to start up. A quick comparison by
Peff shows that the difference is mostly lost in the noise:
job | old | new
--------------------|------|------
linux-TEST-vars 11m30s 10m54s
linux-asan-ubsan 30m26s 31m14s
linux-gcc 9m47s 10m6s
linux-gcc-default 9m47s 9m41s
linux-leaks 25m50s 25m21s
linux-meson 10m36s 10m41s
linux-reftable 10m25s 10m23s
linux-reftable-leaks 27m18s 27m28s
linux-sha256 9m54s 10m31s
Some jobs are a bit faster, some are a bit slower, but there does
not seem to be any significant change.
- Containerized jobs run as root, which keeps a couple of tests from
running. This has been addressed in the preceding commit though,
where we now use setpriv(1) to run tests as a separate user.
- GitHub injects a Node binary into containerized jobs, which is
dynamically linked. This has led to some issues in the past [1], but
only for our 32 bit jobs. The issues have since been resolved.
Overall there seem to be no downsides, but the upside is that we have
more control over the exact image that these jobs use. Convert the Linux
jobs accordingly.
[1]: https://lore.kernel.org/git/20240912094841.GD589828@coredump.intra.peff.net/
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The containerized jobs in GitHub Actions run as root, giving them
special permissions to for example delete files even when the user
shouldn't be able to due to file permissions. This limitation keeps us
from using containerized jobs for most of our Ubuntu-based jobs as it
causes a number of tests to fail.
Adapt the jobs to create a separate user that executes the test suite.
This follows similar infrastructure that we already have in GitLab CI.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One test in t7422 asserts that `git submodule status --recursive`
properly handles SIGPIPE. This test is flaky though and may sometimes
not see a SIGPIPE at all:
expecting success of 7422.18 'git submodule status --recursive propagates SIGPIPE':
{ git submodule status --recursive 2>err; echo $?>status; } |
grep -q X/S &&
test_must_be_empty err &&
test_match_signal 13 "$(cat status)"
++ git submodule status --recursive
++ grep -q X/S
++ echo 0
++ test_must_be_empty err
++ test 1 -ne 1
++ test_path_is_file err
++ test 1 -ne 1
++ test -f err
++ test -s err
+++ cat status
++ test_match_signal 13 0
++ test 0 = 141
++ test 0 = 269
++ return 1
error: last command exited with $?=1
not ok 18 - git submodule status --recursive propagates SIGPIPE
The issue is caused by a race between git-submodule(1) and grep(1):
1. git-submodule(1) (or its child process) writes the first X/S line
we're trying to match.
2. grep(1) matches the line.
3a. grep(1) exits, closing the pipe.
3b. git-submodule(1) (or its child process) writes the rest of its
lines.
Steps 3a and 3b happen at the same time without any guarantees. If 3a
happens first, we get SIGPIPE. Otherwise, we don't and the test fails.
Fix the issue by generating a couple thousand nested submodules and
matching on the first nested submodule. This ensures that the recursive
git-submodule(1) process completely fills its stdout buffer, which makes
subsequent writes block until the downstream consumer of the pipe either
reads more or closes it.
To verify that this works as expected one can apply the following patch
to the preimage of this commit, which used to reliably trigger the race:
diff --git a/t/t7422-submodule-output.sh b/t/t7422-submodule-output.sh
index 3c5177cc30..df6001f8a0 100755
--- a/t/t7422-submodule-output.sh
+++ b/t/t7422-submodule-output.sh
@@ -202,7 +202,7 @@ test_expect_success !MINGW 'git submodule status --recursive propagates SIGPIPE'
cd repo &&
GIT_ALLOW_PROTOCOL=file git submodule add "$(pwd)"/../submodule &&
{ git submodule status --recursive 2>err; echo $?>status; } |
- grep -q recursive-submodule-path-1 &&
+ { sleep 1 && grep -q recursive-submodule-path-1 && sleep 1; } &&
test_must_be_empty err &&
test_match_signal 13 "$(cat status)"
)
With the pipe-stuffing workaround the test runs successfully.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two of our tests in t0060 verify that the runtime prefix functionality
works as expected by creating a separate directory hierarchy, copying
the Git executable in there and then creating scripts relative to that
executable.
These tests fail quite regularly in GitLab CI with the following error:
expecting success of 0060.218 '%(prefix)/ works':
mkdir -p pretend/bin &&
cp "$GIT_EXEC_PATH"/git$X pretend/bin/ &&
git config yes.path "%(prefix)/yes" &&
GIT_EXEC_PATH= ./pretend/bin/git config --path yes.path >actual &&
echo "$(pwd)/pretend/yes" >expect &&
test_cmp expect actual
++ mkdir -p pretend/bin
++ cp /c/GitLab-Runner/builds/gitlab-org/git/git.exe pretend/bin/
cp: cannot create regular file 'pretend/bin/git.exe': Device or resource busy
error: last command exited with $?=1
not ok 218 - %(prefix)/ works
Seemingly, the "git.exe" binary we are trying to overwrite is still
being held open. It is somewhat puzzling why exactly that is: while the
preceding test _does_ write to and execute the same path, it should have
exited and shouldn't keep any backgrounded processes around. So it must
be held open by something else, either in MinGW or in Windows itself.
While the root cause is puzzling, the workaround is trivial enough:
instead of writing the file twice we simply pull the common setup into a
separate test case so that we won't observe EBUSY in the first place.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Why would one want to run it in parallel?" I hear you ask. I am glad
you are curious, because a curious story is what it is, indeed.
The `GIT-VERSION-GEN` script is quite a pillar of Git's source code,
with most lines being unchanged for the past 15 years. Until the v2.48.0
release candidate cycle.
Its original purpose was to generate the version string and store it in
the `GIT-VERSION-FILE`.
This paradigm changed quite dramatically when support for building with
Meson was introduced. Most crucially, a38edab7c88b (Makefile: generate
doc versions via GIT-VERSION-GEN, 2024-12-06) changed the way the
documentation is built by using the `GIT-VERSION-GEN` file to write out
the `asciidocor-extensions.rb` and `asciidoc.conf` files with now
hard-coded version strings.
Crucially, the Makefile rule to generate those files needs to be run in
every build because `GIT_VERSION` could have been specified in the
`make` command-line, which would require these files to be modified.
This introduced a surprising race condition!
And this is how that race surfaces: When calling `make -j2 html man`
from the top-level directory (a variant of which is invoked in Git for
Windows' release process), two sub-processes are spawned, a `make -C
Documentation html` one and a `make -C Documentation man` one. Both run
the rule to (re-)generate `asciidoctor-extensions.rb` or
`asciidoc.conf`, invoking `GIT-VERSION-GEN` to do so. That script first
generates a temporary file (appending the `+` character to the
filename), then looks whether it contains something different than the
already existing file (if it exists, that is), and either replaces it if
needed, or removes the temporary file. If one of the two parallel
invocations removes that temporary file before the other can compare it,
or even worse: if one tries to replace the target file just after the
other _started_ writing the temporary file (but did not finish writing
it yet), that race condition now causes bad builds.
This may sound highly theoretical, but due to the design of Git's build
process, Git for Windows is forced to use a (slow) POSIX emulation layer
to run that script and in the blink of an eye it becomes very much not
theoretical at all. See Exhibit A: These GitHub workflow runs failed
because one of the two competing `make` processes tried to remove the
temporary file when the other process had already done so:
https://github.com/git-for-windows/git-sdk-32/actions/runs/12663456654https://github.com/git-for-windows/git-sdk-32/actions/runs/12683174970https://github.com/git-for-windows/git-sdk-64/actions/runs/12649348496
While it is undesirable to run this script over and over again,
certainly when this involves above-mentioned slow POSIX emulation layer,
the stage of the release cycle in which we are presently finding
ourselves does not lend itself to a re-design where this script could be
run once, and once only, but instead dictates that a quick and reliable
work-around be implemented that prevents the race condition without
changing the overall architecture of the build process.
This patch does that: By using a filename suffix for the temporary file
which is based on the currently-executing script's process ID, We
guarantee that the two competing invocations cannot overwrite or remove
each others' temporary files.
The filename suffix still ends in `+` to ensure that the temporary
artifacts are matched by the `*+` pattern in `.gitignore` that was added
in f9bbaa384ef (Add intermediate build products to .gitignore,
2009-11-08).
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing the `-b` flag to git-blame(1), then any blamed boundary
commits which were marked as uninteresting will not get their actual
commit ID printed, but will instead be replaced by a couple of spaces.
The flag can lead to an out-of-bounds write as though when combined with
`--abbrev=` when the abbreviation length is longer than `GIT_MAX_HEXSZ`
as we simply use memset(3p) on that array with the user-provided length
directly. The result is most likely that we segfault.
An obvious fix would be to cull `length` to `GIT_MAX_HEXSZ` many bytes.
But when the underlying object ID is SHA1, and if the abbreviated length
exceeds the SHA1 length, it would cause us to print more bytes than
desired, and the result would be misaligned.
Instead, fix the bug by computing the length via strlen(3p). This makes
us write as many bytes as the formatted object ID requires and thus
effectively limits the length of what we may end up printing to the
length of its hash. If `--abbrev=` asks us to abbreviate to something
shorter than the full length of the underlying hash function it would be
handled by the call to printf(3p) correctly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 6411a0a896 (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.
It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, whereas printf(3p) does.
Fix the bug by reverting back to printf(3p) and culling the provided
length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
`int`.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, credential-cache populated authtype regardless whether
"get" request had authtype capability. As documented in
git-credential.txt, authtype "should not be sent unless the appropriate
capability ... is provided".
Add test. Without this change, the test failed because "credential fill"
printed an incomplete credential with only protocol and host attributes
(the unexpected authtype attribute was discarded by credential.c).
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ll_diff_tree_paths() function and its helpers all take a pointer to
a list tail, possibly add to it, and then return the new tail. This
works but has two downsides:
- The top-level caller (diff_tree_paths() in this case) has to make a
fake combine_diff_path struct to act as the list head. This is
especially weird here, as it's a flexible-sized struct which will
have an empty FLEX_ARRAY field. That used to be a portability
problem, though these days it is legal because our FLEX_ARRAY macro
over-allocates if necessary. It's still kind of ugly, though.
- Besides the name "tail", it's not immediately obvious that the entry
we pass around will not be examined by each function. Using a
pointer-to-pointer or similar makes it more obvious we only care
about the pointer itself, not its contents.
We can solve both by passing around a pointer to the tail instead. That
gets rid of the return value entirely, though note that because of the
recursion we actually need a three-star pointer for this to work.
The result is fairly readable, as we only need to dereference the tail
in one spot. If we wanted to make it simpler we could wrap the tail in a
struct, which we pass around.
Another option is to convert combine_diff to use our generic list_head
API. I tried that and found the result became much harder to read
overall. It means that _all_ code that looks at combine_diff_path
structs needs to be modified, since the "next" pointer is now inside a
list_head which has to be dereferenced with list_entry(). And we lose
some type safety, since we're just passing around a list_head struct
everywhere, and everybody who looks at it has to specify the type to
list_entry themselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In emit_path() we may append a new combine_diff_path entry to our list,
decide that we don't want it (because opt->pathchange() told us so) and
then roll it back.
Between the addition and the rollback, it doesn't matter if it's in the
list or not (no functions can even tell, since it's a singly-linked list
and we pass around just the tail entry).
So it's much simpler to just wait until opt->pathchange() tells us
whether to keep it, and either attach it (or free it) then. We do still
have to allocate it up front since it's that struct itself which is
passed to the pathchange callback.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ll_diff_tree_paths() function and its helpers all append to a
running list by taking in a pointer to the old tail and returning the
new tail. But they just call this argument "p", which is not very
descriptive.
It gets particularly confusing in emit_path(), where we actually add to
the list, because "p" does double-duty: it is the tail of the list, but
it is also the entry which we add. Except that in some cases we _don't_
add a new entry (or we might even add it and roll it back) if the path
isn't interesting. At first glance, this makes it look like a bug that
we pass "p" on to ll_diff_tree_paths() to recurse; sometimes it is
getting the new entry we made and sometimes not!
But it's not a bug, because ll_diff_tree_paths() does not care about the
entry itself at all. It is only using its "next" pointer as the tail of
the list.
Let's swap out "p" for "tail" to make this obvious. And then in
emit_path() we'll continue to use "p" for our newly allocated entry.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internals of the path diffing code, including ll_diff_tree_paths(),
all take an extra combine_diff_path parameter which they use as the tail
of a list of results, appending any new entries to it.
The public-facing diff_tree_paths() takes the same argument, but it just
makes the callers more awkward. They always start with a clean list, and
have to set up a fake head struct to pass in.
Let's keep the public API clean by always returning a new list. That
keeps the fake struct as an implementation detail of tree-diff.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want callers to use combine_diff_path_new() to allocate structs,
rather than using combine_diff_path_size() and xmalloc(). That gives us
more consistency over the initialization of the fields.
Now that the final external user of combine_diff_path_size() is gone, we
can stop declaring it publicly. And since our constructor is the only
caller, we can just inline it there.
Breaking the size computation into two parts also lets us reuse the
intermediate multiplication result of the parent length, since we need
to know it to perform our memset(). The result is a little easier to
read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our path_appendnew() has been simplified to the point that it is mostly
just implementing combine_diff_path_new(), plus setting the "next"
pointer. Since there's only one caller, let's replace it completely with
a call to that helper function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diffing trees, we'll have a strbuf "base" containing the
slash-separted names of our parent trees, and a "path" string
representing an entry name from the current tree. We pass these
separately to path_appendnew(), which combines them to form a single
path string in the combine_diff_path struct.
Instead, let's append the path string to our base strbuf ourselves, pass
in the result, and then roll it back with strbuf_setlen(). This lets us
simplify path_appendnew() a bit, enabling further refactoring.
And while it might seem like this causes extra wasted allocations, it
does not in practice. We reuse the same strbuf for each tree entry, so
we only have to allocate it to match the largest name. Plus, in a
recursive diff we'll end up doing this same operation to extend the base
for the next level of recursion. So we're really just incurring a small
memcpy().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're diffing trees, we create a list of combine_diff_path structs
that represent changed paths. We allocate each struct and add it to the
list with path_appendnew(), which we then feed to opt->pathchange().
That function tells us whether the path is of interest or not; if not,
then we can throw away the struct we allocated.
So there's an optimization to avoid extra allocations: instead of
throwing away the new entry, we try to reuse it. If it was large enough
to store the next path we care about, we can do so. And if not, we fall
back to freeing and re-allocating a new struct.
This comes from 72441af7c4 (tree-diff: rework diff_tree() to generate
diffs for multiparent cases as well, 2014-04-07), where the goal was to
have even the 2-parent diff code use the combine-diff infrastructure,
but without taking a performance hit.
The implementation causes some complexities in the interface (as we
store the allocation length inside the "next" pointer), and prevents us
from using the regular combine_diff_path_new() constructor. The
complexity is mostly contained inside two functions, but it's worth
re-evaluating how much it's helping.
That commit claims it helps ~1% on generating two-parent diffs in
linux.git. Here are the timings I get on the same command today ("old"
is the current tip of master, and "new" has this patch applied):
Benchmark 1: ./git.old log --raw --no-abbrev --no-renames v3.10..v3.11
Time (mean ± σ): 532.9 ms ± 5.8 ms [User: 472.7 ms, System: 59.6 ms]
Range (min … max): 525.9 ms … 543.3 ms 10 runs
Benchmark 2: ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11
Time (mean ± σ): 538.3 ms ± 5.7 ms [User: 478.0 ms, System: 59.7 ms]
Range (min … max): 528.5 ms … 545.3 ms 10 runs
Summary
./git.old log --raw --no-abbrev --no-renames v3.10..v3.11 ran
1.01 ± 0.02 times faster than ./git.new log --raw --no-abbrev --no-renames v3.10..v3.11
So we do end up on average 1% faster, but with 2% of noise. I tried to
focus more on diff performance by running the commit traversal
separately, like:
git rev-list v3.10..v3.11 >in
and then timing just the diffs:
Benchmark 1: ./git.old diff-tree --stdin -r <in
Time (mean ± σ): 415.7 ms ± 5.8 ms [User: 357.7 ms, System: 58.0 ms]
Range (min … max): 410.9 ms … 430.3 ms 10 runs
Benchmark 2: ./git.new diff-tree --stdin -r <in
Time (mean ± σ): 418.5 ms ± 2.1 ms [User: 361.7 ms, System: 56.6 ms]
Range (min … max): 414.9 ms … 421.3 ms 10 runs
Summary
./git.old diff-tree --stdin -r <in ran
1.01 ± 0.02 times faster than ./git.new diff-tree --stdin -r <in
That gets roughly the same result.
Adding in "-c" to do multi-parent diffs doesn't change much:
Benchmark 1: ./git.old diff-tree --stdin -r -c <in
Time (mean ± σ): 525.3 ms ± 6.6 ms [User: 470.0 ms, System: 55.1 ms]
Range (min … max): 508.4 ms … 531.0 ms 10 runs
Benchmark 2: ./git.new diff-tree --stdin -r -c <in
Time (mean ± σ): 532.3 ms ± 6.2 ms [User: 469.0 ms, System: 63.1 ms]
Range (min … max): 520.3 ms … 539.4 ms 10 runs
Summary
./git.old diff-tree --stdin -r -c <in ran
1.01 ± 0.02 times faster than ./git.new diff-tree --stdin -r -c <in
And of course if you add in a lot more work by doing actual
content-level diffs, any difference is lost entirely (here the newer
version is actually faster, but that's really just noise):
Benchmark 1: ./git.old diff-tree --stdin -r --cc <in
Time (mean ± σ): 11.571 s ± 0.064 s [User: 11.287 s, System: 0.283 s]
Range (min … max): 11.497 s … 11.615 s 3 runs
Benchmark 2: ./git.new diff-tree --stdin -r --cc <in
Time (mean ± σ): 11.466 s ± 0.109 s [User: 11.108 s, System: 0.357 s]
Range (min … max): 11.346 s … 11.560 s 3 runs
Summary
./git.new diff-tree --stdin -r --cc <in ran
1.01 ± 0.01 times faster than ./git.old diff-tree --stdin -r --cc <in
So my conclusion is that it probably does help a little, but it's mostly
lost in the noise. I could see an argument for keeping it, as the
complexity is hidden away in functions that do not often need to be
touched. But it does make them more confusing than necessary (despite
some detailed explanations from the author of that commit; it just took
me a while to wrap my head around what was going on) and prevents
further refactoring of the combine_diff_path struct. So let's drop it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate a combine_diff_path struct with space for 5 parents. Why 5?
The history is not particularly enlightening. The allocation comes from
b4b1550315 (Don't instantiate structures with FAMs., 2006-06-18), which
just switched to xmalloc from a stack struct with 5 elements. That
struct changed to 5 from 4 in 2454c962fb (combine-diff: show mode
changes as well., 2006-02-06), when we also moved from storing raw sha1
bytes to the combine_diff_parent struct. But no explanation is given.
That 4 comes from the earliest code in ea726d02e9 (diff-files: -c and
--cc options., 2006-01-28).
One might guess it is for the 4 stages we can store in the index. But
this code path only ever diffs the current state against stages 2 and 3.
So we only need two slots.
And it's easy to see this is still the case. We fill the parent slots by
subtracting 2 from the ce_stage() values, ignoring values below 2. And
since ce_stage() is only 2 bits, there are 4 values, and thus we need 2
slots.
Let's use the correct value (saving a tiny bit of memory) and add a
comment explaining what's going on (saving a tiny bit of programmer
brain power).
Arguably we could use:
1 + (STAGEMASK >> STAGESHIFT) - 2
which lets the compiler enforce that we will not go out-of-bounds if we
see an unexpected value from ce_stage(). But that is more confusing to
explain, and the constant "2" is baked into other parts of the function.
It is a fundamental constant, not something where somebody might bump a
macro and forget to update this code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only fill in the per-parent "path" field when it differs from what's
in combine_diff_path.path (and even then only when the option is
appropriate). Let's document that.
Suggested-by: Wink Saville <wink@saville.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d76ce4f734 (log,diff-tree: add --combined-all-paths option,
2019-02-07) added a "path" field to each combine_diff_parent struct.
It's defined as a strbuf, but this is overkill. We never manipulate the
buffer beyond inserting a single string into it.
And in fact there's a small bug: we zero the parent structs, including
the path strbufs. For the 0th parent, we strbuf_init() the strbuf before
adding to it. But for subsequent parents, we never do the init. This is
technically violating the strbuf API, though the code there is resilient
enough to handle this zero'd state.
This patch switches us to just store an allocated string pointer.
Zeroing it is enough to properly initialize it there (modulo the usual
assumption we make that a NULL pointer is all-zeroes).
And as a bonus, we can just check for a non-NULL value to see if it is
present, rather than repeating the combined_all_paths logic at each
site.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the other functions which allocate a combine_diff_path struct
zero out the parent array, but this code path does not. There's no bug,
since our caller will fill in most of the fields. But leaving the unused
fields (like combine_diff_parent.path) uninitialized makes working with
the struct more error-prone than it needs to be.
Let's just zero the parent field to be consistent with the
combine_diff_path_new() allocator.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combine_diff_path struct has variable size, since it embeds both the
memory allocation for the path field as well as a variable-sized parent
array. This makes allocating one a bit tricky.
We have a helper to compute the required size, but it's up to individual
sites to actually initialize all of the fields. Let's provide a
constructor function to make that a little nicer. Besides being shorter,
it also hides away tricky bits like the computation of the "path"
pointer (which is right after the "parent" flex array).
As a bonus, using the same constructor everywhere means that we'll
consistently initialize all parts of the struct. A few code paths left
the parent array unitialized. This didn't cause any bugs, but we'll be
able to simplify some code in the next few patches knowing that the
parent fields have all been zero'd.
This also gets rid of some questionable uses of "int" to store buffer
lengths. Though we do use them to allocate, I don't think there are any
integer overflow vulnerabilities here (the allocation helper promotes
them to size_t and checks arithmetic for overflow, and the actual memcpy
of the bytes is done using the possibly-truncated "int" value).
Sadly we can't use the FLEX_* macros to simplify the allocation here,
because there are two variable-sized parts to the struct (and those
macros only handle one).
Nor can we get stop publicly declaring combine_diff_path_size(). This
patch does not touch the code in path_appendnew() at all, which is not
ready to be moved to our new constructor for a few reasons:
- path_appendnew() has a memory-reuse optimization where it tries to
reuse combine_diff_path structs rather than freeing and
reallocating.
- path_appendnew() does not create the struct from a single path
string, but rather allocates and copies into the buffer from
multiple sources.
These can be addressed by some refactoring, but let's leave it as-is for
now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While looping over the index entries, when we see a higher level stage
the first thing we do is allocate a combine_diff_path struct for it. But
this can leak; if check_removed() returns an error, we'll continue to
the next iteration of the loop without cleaning up.
We can fix this by just delaying the allocation by a few lines.
I don't think this leak is triggered in the test suite, but it's pretty
easy to see by inspection. My ulterior motive here is that the delayed
allocation means we have all of the data needed to initialize "dpath" at
the time of malloc, making it easier to factor out a constructor
function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2a9dfdf260 (difftool docs: de-duplicate configuration sections, 2022-09-07)
moved the difftool documentation, but missed moving this "include" line that
includes the generated list of diff tools, as referenced in the moved text.
Restore the correct position of the included list.
Signed-off-by: Adam Johnson <me@adamj.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt the hash test functions to clar framework by using clar
assertions where necessary. Following the consensus to convert
the unit-tests scripts found in the t/unit-tests folder to clar driven by
Patrick Steinhardt. Test functions are structured as a standalone to
test individual hash string and literal case.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure in "meson" for the "perl/" hierarchy lacked
necessary dependencies, which has been corrected.
* sj/meson-perl-build-fix:
meson: fix perl dependencies
As indicated by the `#undef malloc` line in `reftable/basics.h`, it is
quite common to use allocators other than the default one by defining
`malloc` constants and friends.
This pattern is used e.g. in Git for Windows, which uses the powerful
and performant `mimalloc` allocator.
Furthermore, in `reftable/basics.c` this `#undef malloc` is
_specifically_ disabled by virtue of defining the
`REFTABLE_ALLOW_BANNED_ALLOCATORS` constant before including
`reftable/basic.h`, to ensure that such a custom allocator is also used
in the reftable code.
However, in 8db127d43f5b (reftable: avoid leaks on realloc error,
2024-12-28) and in 2cca185e8517 (reftable: fix allocation count on
realloc error, 2024-12-28), `reftable_set_alloc()` function calls were
introduced that pass `malloc`, `realloc` and `free` function pointers as
parameters _after_ `reftable/basics.h` ensured that they were no longer
`#define`d. This would override the custom allocator and re-set it to
the default allocator provided by, say, libc or MSVCRT.
This causes problems because those calls happen after the initial
allocator has already been used to initialize an array, which is
subsequently resized using the overridden default `realloc()` allocator.
You cannot mix and match allocators like that, which leads to a
`STATUS_HEAP_CORRUPTION` (C0000374) on Windows, and when running this
unit test through shell and/or `prove` (which only support 7-bit status
codes), it surfaces as exit code 127.
It is actually unnecessary to use those function pointers to
`malloc`/`realloc`/`free`, though: The `reftable` code goes out of its
way to fall back to the initial allocator when passing `NULL` parameters
instead. So let's do that instead of causing heap corruptions.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`generate_perl_command` needs `depends: [git_version_file]` and the uses
in top-level meson.build were fine, but the ones in perl/ weren't, causing
parallel build failures in some cases as GIT-BUILD-OPTIONS wasn't yet
available.
Signed-off-by: Sam James <sam@gentoo.org>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the `CodingGuidlines`, since these placeholders are literal
they should be typeset verbatim, so fix some that aren't.
Signed-off-by: Matthew Hughes <matthewhughes934@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct verb tense, add missing words, avoid double blank lines,
and rephrase things that don’t read well to me like “Turn this linkage
to relative paths”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git, fsck operations can ignore known broken objects via the
`fsck.skipList` configuration. This option expects a path to a file with
the list of object names. When the configuration is specified without a
path, an error message is printed, but the command continues as if the
configuration was not set. Configuring `fsck.skipList` without a value
is a misconfiguration so config parsing should be more strict and reject
it.
Update `git_fsck_config()` to no longer ignore misconfiguration of
`fsck.skipList`. The same behavior is also present for
`fetch.fsck.skipList` and `receive.fsck.skipList` so the configuration
parsers for these are updated to ensure the related operations remain
consistent.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library uses randomness in two call paths:
- When reading a stack in case some of the referenced tables
disappears. The randomness is used to delay the next read by a
couple of milliseconds.
- When writing a new table, where the randomness gets appended to the
table name (e.g. "0x000000000001-0x000000000002-0b1d8ddf.ref").
In neither of these cases do we need strong randomness.
Unfortunately though, we have observed test failures caused by the
former case. In t0610 we have a test that spawns a 100 processes at
once, all of which try to write a new table to the stack. And given that
all of the processes will require randomness, it can happen that these
processes make the entropy pool run dry, which will then cause us to
die:
+ test_seq 100
+ printf %s commit\trefs/heads/branch-%s\n
68d032e9edd3481ac96382786ececc37ec28709e 1
+ printf %s commit\trefs/heads/branch-%s\n
68d032e9edd3481ac96382786ececc37ec28709e 2
...
+ git update-ref refs/heads/branch-98 HEAD
+ git update-ref refs/heads/branch-97 HEAD
+ git update-ref refs/heads/branch-99 HEAD
+ git update-ref refs/heads/branch-100 HEAD
fatal: unable to get random bytes
fatal: unable to get random bytes
fatal: unable to get random bytes
fatal: unable to get random bytes
fatal: unable to get random bytes
fatal: unable to get random bytes
fatal: unable to get random bytes
The report was for NonStop, which uses OpenSSL as the backend for
randomness. In the preceding commit we have adapted that backend to also
return randomness in case the entropy pool is empty and the caller
passes the `CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `csprng_bytes()` function generates randomness and writes it into a
caller-provided buffer. It abstracts over a couple of implementations,
where the exact one that is used depends on the platform.
These implementations have different guarantees: while some guarantee to
never fail (arc4random(3)), others may fail. There are two significant
failures to distinguish from one another:
- Systemic failure, where e.g. opening "/dev/urandom" fails or when
OpenSSL doesn't have a provider configured.
- Entropy failure, where the entropy pool is exhausted, and thus the
function cannot guarantee strong cryptographic randomness.
While we cannot do anything about the former, the latter failure can be
acceptable in some situations where we don't care whether or not the
randomness can be predicted.
Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt
into weak cryptographic randomness. The exact behaviour of the flag
depends on the underlying implementation:
- `arc4random_buf()` never returns an error, so it doesn't change.
- `getrandom()` pulls from "/dev/urandom" by default, which never
blocks on modern systems even when the entropy pool is empty.
- `getentropy()` seems to block when there is not enough randomness
available, and there is no way of changing that behaviour.
- `GtlGenRandom()` doesn't mention anything about its specific
failure mode.
- The fallback reads from "/dev/urandom", which also returns bytes in
case the entropy pool is drained in modern Linux systems.
That only leaves OpenSSL with `RAND_bytes()`, which returns an error in
case the returned data wouldn't be cryptographically safe. This function
is replaced with a call to `RAND_pseudo_bytes()`, which can indicate
whether or not the returned data is cryptographically secure via its
return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag
is set, then we ignore the insecurity and return the data regardless.
It is somewhat questionable whether we really need the flag in the first
place, or whether we wouldn't just ignore the potentially-insecure data.
But the risk of doing that is that we might have or grow callsites that
aren't aware of the potential insecureness of the data in places where
it really matters. So using a flag to opt-in to that behaviour feels
like the more secure choice.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few grep calls here that can benefit from test_grep, which
produces more user-friendly output when it fails.
One of these calls also passes "-sq", which is curious. The "-q" option
suppresses the matched output. But test output is either already
redirected to /dev/null in non-verbose mode, and in verbose mode it's
better to see the output. The "-s" option suppresses errors opening
files, but we are just grepping in the "expected" file we just
generated, so it should not be needed. Neither of these was really
hurting anything, but they are not a style we'd like to see emulated. So
get rid of them.
(It is also curious to grep in the expected file in the first place, but
that is because we are auto-generating the expectation from a Git
command. So this is double-checking it did what we wanted).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b119a687d4 (test-lib: ignore leaks in the sanitizer's thread
code, 2025-01-01) added code to suppress a false positive in the leak
checker. But if you're just reading the code, the obscure grep call is a
bit of a head-scratcher. Let's add a brief comment explaining what's
going on (and anybody digging further can find this commit or that one
for all the details).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to know if there are any leaks logged by LSan in the results
directory, so we run "find" on the containing directory and pipe it to
xargs. We can accomplish the same thing by just globbing in the shell
and passing the result to grep, which has a few advantages:
- it's one fewer process to run
- we can glob on the TEST_RESULTS_SAN_FILE pattern, which is what we
checked at the beginning of the function, and is the same glob used
to show the logs in check_test_results_san_file_
- this correctly handles the case where TEST_OUTPUT_DIRECTORY has a
space in it. For example doing:
mkdir "/tmp/foo bar"
TEST_OUTPUT_DIRECTORY="/tmp/foo bar" make SANITIZE=leak test
would yield a lot of:
grep: /tmp/foo: No such file or directory
grep: bar/test-results/t0006-date.leak/trace.test-tool.582311: No such file or directory
when there are leaks. We could do the same thing with "xargs
--null", but that isn't portable.
We are now subject to command-line length limits, but that is also true
of the globbing cat used to show the logs themselves. This hasn't been a
problem in practice.
We do need to use "grep -s" for the case that the glob does not expand
(i.e., there are not any log files at all). This option is in POSIX, and
has been used in t7407 for several years without anybody complaining.
This also also naturally handles the case where the surrounding
directory has already been removed (in which case there are likewise no
files!), dropping the need to comment about it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function to check whether LSan logged any leaks. It returns
success for no leaks, and non-zero otherwise. This is the simplest thing
for its callers, who want to say "if no leaks then return early". But
because it's implemented as a shell pipeline, you end up with the
awkward:
! find ... |
xargs grep leaks |
grep -v false-positives
where the "!" is actually negating the final grep. Switch the return
value (and name) to return success when there are leaks. This should
make the code a little easier to read, and the negation in the callers
still reads pretty naturally.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1e0ee4087e (completion: add and use
__git_compute_first_level_config_vars_for_section, 2024-02-10) uses an
indirect variable syntax that is only valid for Bash, but the Zsh
completion code relies on the Bash completion code to function. Zsh
supports a different indirect variable expansion using ${(P)var}, but in
`emulate ksh` mode does not support Bash's ${!var}.
This manifests as completing strange config options like
"__git_first_level_config_vars_for_section_remote" as a choice for the
command line
git config set remote.
Using Zsh's C-x ? _complete_debug widget with the cursor at the end of
that command line captures a trace, in which we see (some details
elided):
+__git_complete_config_variable_name:7> __git_compute_first_level_config_vars_for_section remote
+__git_compute_first_level_config_vars_for_section:7> local section=remote
+__git_compute_first_level_config_vars_for_section:7> __git_compute_config_vars
+__git_compute_config_vars:7> test -n $'add.ignoreErrors\nadvice.addEmbeddedRepo\nadvice.addEmptyPathspec\nadvice.addIgnoredFile[…]'
+__git_compute_first_level_config_vars_for_section:7> local this_section=__git_first_level_config_vars_for_section_remote
+__git_compute_first_level_config_vars_for_section:7> test -n __git_first_level_config_vars_for_section_remote
+__git_complete_config_variable_name:7> local this_section=__git_first_level_config_vars_for_section_remote
+__git_complete_config_variable_name:7> __gitcomp_nl_append __git_first_level_config_vars_for_section_remote remote. '' ' '
+__gitcomp_nl_append:7> __gitcomp_nl __git_first_level_config_vars_for_section_remote remote. '' ' '
+__gitcomp_nl:7> emulate -L zsh
+__gitcomp_nl:7> compset -P '*[=:]'
+__gitcomp_nl:7> compadd -Q -S ' ' -p remote. -- __git_first_level_config_vars_for_section_remote
We perform the test for __git_compute_config_vars correctly, but the
${!this_section} references are not expanded as expected.
Instead, portably expand indirect references through the new
__git_indirect. Contrary to some versions you might find online [1],
this version avoids echo non-portabilities [2] [3] and correctly quotes
the indirect expansion after eval (so that the result is not split or
globbed before being handed to printf).
[1]: https://unix.stackexchange.com/a/41409/301073
[2]: https://askubuntu.com/questions/715765/mysterious-behavior-of-echo-command#comment1056038_715769
[3]: https://mywiki.wooledge.org/CatEchoLs
The following demo program demonstrates how this works:
b=1
indirect() {
eval printf '%s' "\"\$$1\""
}
f() {
# Comment this out to see that it works for globals, too. Or, use
# a value with spaces like '2 3 4' to see how it handles those.
local b=2
local a=b
test -n "$(indirect $a)" && echo nice
}
f
When placed in a file "demo", then both
bash -x demo
and
zsh -xc 'emulate ksh -c ". ./demo"' |& tail
provide traces showing that "$(indirect $a)" produces 2 (or 1, with the
global, or "2 3 4" as a single string, etc.).
Signed-off-by: D. Ben Knoble <ben.knoble+github@gmail.com>
Acked-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prior to 0ad3d65652 (object-file: fix race in object collision check,
2024-12-30), callers could expect that a successful return from
`finalize_object_file()` means that either the file was moved into
place, or the identical bytes were already present. If neither of those
happens, we'd return an error.
Since that commit, if the destination file disappears between our
link(3p) call and the collision check, we'd return success without
actually checking the contents, and without retrying the link. This
solves the common case that the files were indeed the same, but it means
that we may corrupt the repository if they weren't (this implies a hash
collision, but the whole point of this function is protecting against
hash collisions).
We can't be pessimistic and assume they're different; that hurts the
common case that the mentioned commit was trying to fix. But after
seeing that the destination file went away, we can retry linking again.
Adapt the code to do so when we see that the destination file has racily
vanished. This should generally succeed as we have just observed that
the destination file does not exist anymore, except in the very unlikely
event that it gets recreated by another concurrent process again.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 0ad3d65652 (object-file: fix race in object collision check,
2024-12-30) we have started to ignore ENOENT when opening either the
source or destination file of the collision check. This was done to
handle races more gracefully in case either of the potentially-colliding
disappears.
The fix is overly broad though: while the destination file may indeed
vanish racily, this shouldn't ever happen for the source file, which is
a temporary object file (either loose or in packfile format) that we
have just created. So if any concurrent process would have removed that
temporary file it would indicate an actual issue.
Stop treating ENOENT specially for the source file so that we always
bubble up this error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename variables used in `check_collision()` to clearly identify which
file is the source and which is the destination. This will make the next
step easier to reason about when we start to treat those files different
from one another.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
9e2b7005be (fetch set_head: add warn-if-not-$branch option, 2024-12-05)
tried to expand the advice message for set_head with the new option, but
unfortunately did not manage to add the right incantation. Fix the
advice message with the correct usage of warn-if-not-$branch.
Reported-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`test -f` and `! test -f` do not provide clear error messages when they fail.
To enhance debuggability, use `test_path_is_file` and `test_path_is_missing`,
which instead provide more informative error messages.
Note that `! test -f` checks if a path is not a file, while
`test_path_is_missing` verifies that a path does not exist. In this specific
case the tests are meant to check the absence of the path, making
`test_path_is_missing` a valid replacement.
Signed-off-by: Matteo Bagnolini <matteobagnolini2003@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1bc1e94091 (doc: option value may be separate for valid reasons,
2024-11-25) added a paragraph discussing tilde-expansion of, e.g.,
~/directory/file.
The tilde character has a special meaning to asciidoc tools. In this
particular case, AsciiDoc matches up the two tildes in "e.g.
~/directory/file or ~u/d/f" and sets the text between them using
subscript. In the manpage, where subscripting is not possible, this
renders as "e.g. /directory/file oru/d/f".
These paths are literal values, which our coding guidelines want typeset
as verbatim using backticks. Do that. One effect of this is indeed that
the asciidoc tools stop interpreting tilde and other special characters.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two-line heading added in 8525e92886 (Document HOME environment
variable, 2024-12-09) uses too many tilde characters, so the heading
isn't detected as such. Both AsciiDoc and Asciidoctor end up
misrendering this in different ways.
Use the correct number of tilde characters to fix this.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure based on meson learned to generate HTML
documention pages.
* ps/build-meson-html:
Documentation: wire up sanity checks for Meson
t/Makefile: make "check-meson" work with Dash
meson: install static files for HTML documentation
meson: generate articles
Documentation: refactor "howto-index.sh" for out-of-tree builds
Documentation: refactor "api-index.sh" for out-of-tree builds
meson: generate user manual
Documentation: inline user-manual.conf
meson: generate HTML pages for all man page categories
meson: fix generation of merge tools
meson: properly wire up dependencies for our docs
meson: wire up support for AsciiDoctor
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.
This is an alternative to the jk/lsan-race-with-barrier topic with
much smaller change to the production code.
* jk/lsan-race-ignore-false-positive:
test-lib: ignore leaks in the sanitizer's thread code
test-lib: check leak logs for presence of DEDUP_TOKEN
test-lib: simplify leak-log checking
test-lib: rely on logs to detect leaks
Revert barrier-based LSan threading race workaround
Our CI jobs sometimes see false positive leaks like this:
=================================================================
==3904583==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
#1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
#2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
#3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
#4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
#5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
#6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
#7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
This is not a leak in our code, but appears to be a race between one
thread calling exit() while another one is in LSan's stack setup code.
You can reproduce it easily by running t0003 or t5309 with --stress
(these trigger it because of the threading in git-grep and index-pack
respectively).
This may be a bug in LSan, but regardless of whether it is eventually
fixed, it is useful to work around it so that we stop seeing these false
positives.
We can recognize it by the mention of the sanitizer functions in the
DEDUP_TOKEN line. With this patch, the scripts mentioned above should
run with --stress indefinitely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check the leak logs, our original strategy was to check for any
non-empty log file produced by LSan. We later amended that to ignore
noisy lines in 370ef7e40d (test-lib: ignore uninteresting LSan output,
2023-08-28).
This makes it hard to ignore noise which is more than a single line;
we'd have to actually parse the file to determine the meaning of each
line.
But there's an easy line-oriented solution. Because we always pass the
dedup_token_length option, the output will contain a DEDUP_TOKEN line
for each leak that has been found. So if we invert our strategy to stop
ignoring useless lines and only look for useful ones, we can just count
the number of DEDUP_TOKEN lines. If it's non-zero, then we found at
least one leak (it would even give us a count of unique leaks, but we
really only care if it is non-zero).
This should yield the same outcome, but will help us build more false
positive detection on top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function to count the number of leaks found (actually, it is
the number of processes which produced a log file). Once upon a time we
cared about seeing if this number increased between runs. But we
simplified that away in 95c679ad86 (test-lib: stop showing old leak
logs, 2024-09-24), and now we only care if it returns any results or
not.
In preparation for refactoring it further, let's drop the counting
function entirely, and roll it into the "is it empty" check. The outcome
should be the same, but we'll be free to return a boolean "did we find
anything" without worrying about somebody adding a new call to the
counting function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we run with sanitizers, we set abort_on_error=1 so that the tests
themselves can detect problems directly (when the buggy program exits
with SIGABRT). This has one blind spot, though: we don't always check
the exit codes for all programs (e.g., helpers like upload-pack invoked
behind the scenes).
For ASan and UBSan this is mostly fine; they exit as soon as they see an
error, so the unexpected abort of the program causes the test to fail
anyway.
But for LSan, the program runs to completion, since we can only check
for leaks at the end. And in that case we could miss leak reports. And
thus we started checking LSan logs in faececa53f (test-lib: have the
"check" mode for SANITIZE=leak consider leak logs, 2022-07-28).
Originally the logs were optional, but logs are generated (and checked)
always as of 8c1d6691bc (test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by
default, 2024-07-11). And we even check them for each test snippet, as
of cf1464331b (test-lib: check for leak logs after every test,
2024-09-24).
So now aborting on error is superfluous for LSan! We can get everything
we need by checking the logs. And checking the logs is actually
preferable, since it gives us more control over silencing false
positives (something we do not yet do, but will soon).
So let's tell LSan to just exit normally, even if it finds leaks. We can
do so with exitcode=0, which also suppresses the abort_on_error flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The extra "barrier" approach was too much code whose sole purpose
was to work around a race that is not even ours (i.e. in LSan's
teardown code).
In preparation for queuing a solution taking a much-less-invasive
approach, let's revert them.
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.
* jk/lsan-race-with-barrier:
grep: work around LSan threading race with barrier
index-pack: work around LSan threading race with barrier
thread-utils: introduce optional barrier type
Revert "index-pack: spawn threads atomically"
test-lib: use individual lsan dir for --stress runs
An earlier "csum-file checksum does not have to be computed with
sha1dc" topic had a few code paths that had initialized an
implementation of a hash function to be used by an unmatching hash
by mistake, which have been corrected.
* ps/weak-sha1-for-tail-sum-fix:
ci: exercise unsafe OpenSSL backend
builtin/fast-import: fix segfault with unsafe SHA1 backend
bulk-checkin: fix segfault with unsafe SHA1 backend
The custom allocator code in the reftable library did not handle
failing realloc() very well, which has been addressed.
* rs/reftable-realloc-errors:
t-reftable-merged: handle realloc errors
reftable: handle realloc error in parse_names()
reftable: fix allocation count on realloc error
reftable: avoid leaks on realloc error
i18n: expose substitution hint chars in functions and macros to
translators
For example (based on builtin/commit.c and shortened): the "--author"
option takes a name. In source this can be represented as:
OPT_STRING(0, "author", &force_author, N_("author"), N_("override author")),
When the command is run with "-h" (short help) option (git commit -h),
the above definition is displayed as:
--[no-]author <author> override author
Git does not use translated option names so the first part of the
above, "--[no-]author", is given as-is (it is based on the 2nd
argument of OPT_STRING). However the string "author" in the pair of
"<>", and the explanation "override author for commit" may be
translated into user's language.
The user's language may use a convention to mark a replaceable part of
the command line (called a "placeholder string") differently from
enclosing it inside a pair of "<>", but the implementation in
parse-options.c hardcodes "<%s>".
Allow translators to specify the presentation of a placeholder string
for their languages by overriding the "<%s>".
In case the translator's writing system is sufficiently different than
Latin the "<>" characters can be substituted by an empty string thus
effectively skipping them in the output. For example languages with
uppercase versions of characters can use that to deliniate
replaceability.
Alternatively a translator can decide to use characters that are
visually close to "<>" but are not interpreted by the shell.
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of backends from which the user can choose for HTTPS,
SHA1, its unsafe variant as well as SHA256. Provide a summary of the
configured values to make these more discoverable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06c92dafb8 (Makefile: allow specifying a SHA-1 for non-cryptographic
uses, 2024-09-26), we have introduced a cryptographically-insecure
backend for SHA1 that can optionally be used in some contexts where the
processed data is not security relevant. This effort was in-flight with
the effort to introduce Meson, so we don't have an equivalent here.
Wire up a new build option that lets users pick an unsafe SHA1 backend.
Note that for simplicity's sake we have to drop the error condition
around an unhandled SHA1 backend. This should be fine though given that
Meson verifies the value for combo-options for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of our Meson build options end with a trailing dot, but those for
our SHA1 and SHA256 backends don't. Add it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The conditions used to figure out whteher the Security framework or
OpenSSL library is required are a bit convoluted because they can be
pulled in via the HTTPS, SHA1 or SHA256 backends. Refactor them to be
easier to read.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Security framework is required when we use CommonCrypto either as
HTTPS or SHA1 backend, but we only require it in case it is set up as
HTTPS backend. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a couple of repeated calls to `get_option()` for the SHA1 and
SHA256 backend options. While not an issue, it makes the code needlessly
verbose.
Fix this by consistently using a local variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'CommonCrypto' backend can be specified as HTTPS and SHA1 backends,
but the value that one needs to use is inconsistent across those two
build options. Unify it to 'CommonCrypto'.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we have fixed a segfault when using an unsafe
SHA1 backend that is different from the safe one. This segfault only
went by unnoticed because we never set up an unsafe backend in our CI
systems. Fix this ommission by setting `OPENSSL_SHA1_UNSAFE` in our
TEST-vars job.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with the preceding commit, git-fast-import(1) is using the safe
variant to initialize a hashfile checkpoint. This leads to a segfault
when passing the checkpoint into the hashfile subsystem because it would
use the unsafe variants instead:
++ git --git-dir=R/.git fast-import --big-file-threshold=1
AddressSanitizer:DEADLYSIGNAL
=================================================================
==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0)
==577126==The signal is caused by a READ memory access.
==577126==Hint: address points to the zero page.
#0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
#1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
#2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
#3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
#4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2
#5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3
#6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5
#7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4
#8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4
#9 0x555555b1f493 in run_builtin ../git.c:480:11
#10 0x555555b1bfef in handle_builtin ../git.c:740:9
#11 0x555555b1e6f4 in run_argv ../git.c:807:4
#12 0x555555b1b87a in cmd_main ../git.c:947:19
#13 0x5555561649e6 in main ../common-main.c:64:11
#14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#16 0x555555772c84 in _start (git+0x21ec84)
==577126==Register values:
rax = 0x0000511000000cc0 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000
rdi = 0x0000000000000000 rsi = 0x00005070000009c0 rbp = 0x00005070000009c0 rsp = 0x00007fffffff5b30
r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30
r12 = 0x0000000000000000 r13 = 0x00007fffffff6b60 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
==577126==ABORTING
./test-lib.sh: line 1039: 577126 Aborted git --git-dir=R/.git fast-import --big-file-threshold=1 < input
error: last command exited with $?=134
not ok 167 - R: blob bigger than threshold
The segfault is only exposed in case the unsafe and safe backends are
different from one another.
Fix the issue by initializing the context with the unsafe SHA1 variant.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1b9e9be8b4 (csum-file.c: use unsafe SHA-1 implementation when
available, 2024-09-26) we have converted our `struct hashfile` to use
the unsafe SHA1 backend, which results in a significant speedup. One
needs to be careful with how to use that structure now though because
callers need to consistently use either the safe or unsafe variants of
SHA1, as otherwise one can easily trigger corruption.
As it turns out, we have one inconsistent usage in our tree because we
directly initialize `struct hashfile_checkpoint::ctx` with the safe
variant of SHA1, but end up writing to that context with the unsafe
ones. This went unnoticed so far because our CI systems do not exercise
different hash functions for these two backends, and consequently safe
and unsafe variants are equivalent. But when using SHA1DC as safe and
OpenSSL as unsafe backend this leads to a crash an t1050:
++ git -c core.compression=0 add large1
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1367==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x507000000db0 sp 0x7fffffff5690 T0)
==1367==The signal is caused by a READ memory access.
==1367==Hint: address points to the zero page.
#0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
#1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
#2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
#3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
#4 0x555555b9905d in deflate_blob_to_pack ../bulk-checkin.c:286:4
#5 0x555555b98ae9 in index_blob_bulk_checkin ../bulk-checkin.c:362:15
#6 0x555555ddab62 in index_blob_stream ../object-file.c:2756:9
#7 0x555555dda420 in index_fd ../object-file.c:2778:9
#8 0x555555ddad76 in index_path ../object-file.c:2796:7
#9 0x555555e947f3 in add_to_index ../read-cache.c:771:7
#10 0x555555e954a4 in add_file_to_index ../read-cache.c:804:9
#11 0x5555558b5c39 in add_files ../builtin/add.c:355:7
#12 0x5555558b412e in cmd_add ../builtin/add.c:578:18
#13 0x555555b1f493 in run_builtin ../git.c:480:11
#14 0x555555b1bfef in handle_builtin ../git.c:740:9
#15 0x555555b1e6f4 in run_argv ../git.c:807:4
#16 0x555555b1b87a in cmd_main ../git.c:947:19
#17 0x5555561649e6 in main ../common-main.c:64:11
#18 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#19 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#20 0x555555772c84 in _start (git+0x21ec84)
==1367==Register values:
rax = 0x0000511000001080 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000
rdi = 0x0000000000000000 rsi = 0x0000507000000db0 rbp = 0x0000507000000db0 rsp = 0x00007fffffff5690
r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30
r12 = 0x0000000000000000 r13 = 0x00007fffffff6b38 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
==1367==ABORTING
./test-lib.sh: line 1023: 1367 Aborted git $config add large1
error: last command exited with $?=134
not ok 4 - add with -c core.compression=0
Fix the issue by using the unsafe variant instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the tests in t5616 asserts that git-fetch(1) with `--refetch`
triggers repository maintenance with the correct set of arguments. This
test is flaky and causes us to fail sometimes:
++ git -c protocol.version=0 -c gc.autoPackLimit=0 -c maintenance.incremental-repack.auto=1234 -C pc1 fetch --refetch origin
error: unable to open .git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack: No such file or directory
fatal: unable to rename temporary file to '.git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack'
fatal: could not finish pack-objects to repack local links
fatal: index-pack failed
error: last command exited with $?=128
The error message is quite confusing as it talks about trying to rename
a temporary packfile. A first hunch would thus be that this packfile
gets written by git-fetch(1), but removed by git-maintenance(1) while it
hasn't yet been finalized, which shouldn't ever happen. And indeed, when
looking closer one notices that the file that is supposedly of temporary
nature does not have the typical `tmp_pack_` prefix.
As it turns out, the "unable to rename temporary file" fatal error is a
red herring and the real error is "unable to open". That error is raised
by `check_collision()`, which is called by `finalize_object_file()` when
moving the new packfile into place. Because t5616 re-fetches objects, we
end up with the exact same pack as we already have in the repository. So
when the concurrent git-maintenance(1) process rewrites the preexisting
pack and unlinks it exactly at the point in time where git-fetch(1)
wants to check the old and new packfiles for equality we will see ENOENT
and thus `check_collision()` returns an error, which gets bubbled up by
`finalize_object_file()` and is then handled by `rename_tmp_packfile()`.
That function does not know about the exact root cause of the error and
instead just claims that the rename has failed.
This race is thus caused by b1b8dfde69 (finalize_object_file():
implement collision check, 2024-09-26), where we have newly introduced
the collision check.
By definition, two files cannot collide with each other when one of them
has been removed. We can thus trivially fix the issue by ignoring ENOENT
when opening either of the files we're about to check for collision.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a race with LSan when spawning threads and one of the threads
calls die(). We worked around one such problem with index-pack in the
previous commit, but it exists in git-grep, too. You can see it with:
make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
cd t
./t0003-attributes.sh --stress
which fails pretty quickly with:
==git==4096424==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f906de14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
#1 0x7f906dc9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
#2 0x7f906de2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
#3 0x7f906de25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
#4 0x7f906de17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
#5 0x7f906de143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
#6 0x7f906dc9bf51 in start_thread nptl/pthread_create.c:447
#7 0x7f906dd1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
As with the previous commit, we can fix this by inserting a barrier that
makes sure all threads have finished their setup before continuing. But
there's one twist in this case: the thread which calls die() is not one
of the worker threads, but the main thread itself!
So we need the main thread to wait in the barrier, too, until all
threads have gotten to it. And thus we initialize the barrier for
num_threads+1, to account for all of the worker threads plus the main
one.
If we then test as above, t0003 should run indefinitely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes get false positives from our linux-leaks CI job because of
a race in LSan itself. The problem is that one thread is still
initializing its stack in LSan's code (and allocating memory to do so)
while anothe thread calls die(), taking down the whole process and
triggering a leak check.
The problem is described in more detail in 993d38a066 (index-pack: spawn
threads atomically, 2024-01-05), which tried to fix it by pausing worker
threads until all calls to pthread_create() had completed. But that's
not enough to fix the problem, because the LSan setup code runs in the
threads themselves. So even though pthread_create() has returned, we
have no idea if all threads actually finished their setup before letting
any of them do real work.
We can fix that by using a barrier inside the threads themselves,
waiting for all of them to hit the start of their main function before
any of them proceed.
You can test for the race by running:
make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
cd t
./t5309-pack-delta-cycles.sh --stress
which fails quickly before this patch, and should run indefinitely
without it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One thread primitive we don't yet support is a barrier: it waits for all
threads to reach a synchronization point before letting any of them
continue. This would be useful for avoiding the LSan race we see in
index-pack (and other places) by having all threads complete their
initialization before any of them start to do real work.
POSIX introduced a pthread_barrier_t in 2004, which does what we want.
But if we want to rely on it:
1. Our Windows pthread emulation would need a new set of wrapper
functions. There's a Synchronization Barrier primitive there, which
was introduced in Windows 8 (which is old enough for us to depend
on).
2. macOS (and possibly other systems) has pthreads but not
pthread_barrier_t. So there we'd have to implement our own barrier
based on the mutex and cond primitives.
Those are do-able, but since we only care about avoiding races in our
LSan builds, there's an easier way: make it a noop on systems without a
native pthread barrier.
This patch introduces a "maybe_thread_barrier" API. The clunky name
(rather than just using pthread_barrier directly) should hopefully clue
people in that on some systems it will do nothing. It's wired to a
Makefile knob which has to be triggered manually, and we enable it for
the linux-leaks CI jobs (since we know we'll have it there).
There are some other possible options:
- we could turn it on all the time for Linux systems based on uname.
But we really only care about it for LSan builds, and there is no
need to add extra code to regular builds.
- we could turn it on only for LSan builds. But that would break
builds on non-Linux platforms (like macOS) that otherwise should
support sanitizers.
- we could trigger only on the combination of Linux and LSan together.
This isn't too hard to do, but the uname check isn't completely
accurate. It is really about what your libc supports, and non-glibc
systems might not have it (though at least musl seems to).
So we'd risk breaking builds on those systems, which would need to
add a new knob. Though the upside would be that running local "make
SANITIZE=leak test" would be protected automatically.
And of course none of this protects LSan runs from races on systems
without pthread barriers. It's probably OK in practice to protect only
our CI jobs, though. The race is rare-ish and most leak-checking happens
through CI.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 993d38a0669a8056d496797516e743e26b6b8b54.
That commit was trying to solve a race between LSan setting up the
threads stack and another thread calling exit(), by making sure that all
pthread_create() calls have finished before doing any work that might
trigger the exit().
But that isn't sufficient. The setup code actually runs in the
individual threads themselves, not in the spawning thread's call to
pthread_create(). So while it may have improved the race a bit, you can
still trigger it pretty quickly with:
make SANITIZE=leak
cd t
./t5309-pack-delta-cycles.sh --stress
Let's back out that failed attempt so we can try again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When storing output in test-results/, we usually give each numbered run
in a --stress set its own output file. But we don't do that for storing
LSan logs, so something like:
./t0003-attributes.sh --stress
will have many scripts simultaneously creating, writing to, and deleting
the test-results/t0003-attributes.leak directory. This can cause logs
from one run to be attributed to another, spurious failures when
creation and deletion race, and so on.
This has always been broken, but nobody noticed because it's rare to do
a --stress run with LSan (since the point is for the code to run quickly
many times in order to hit races). But if you're trying to find a race
in the leak sanitizing code, it makes sense to use these together.
We can fix it by using $TEST_RESULTS_BASE, which already incorporates
the stress job suffix.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The developer documentation has been updated to give the latest
info on gitk and git-gui maintainer.
* as/gitk-git-gui-repo-update:
Update the official repo of gitk
Check reallocation errors in unit tests, like everywhere else.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check the final reallocation for adding the terminating NULL and handle
it just like those in the loop. Simply use REFTABLE_ALLOC_GROW instead
of keeping the REFTABLE_REALLOC_ARRAY call and adding code to preserve
the original pointer value around it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() avoids the leak by keeping the original pointer if
reallocation fails, but still increase the allocation count in such a
case as if it succeeded. That's OK, because the error handling code
just frees everything and doesn't look at names_cap anymore.
reftable_buf_add() does the same, but here it is a problem as it leaves
the reftable_buf in a broken state, with ->alloc being roughly twice as
big as the actually allocated memory, allowing out-of-bounds writes in
subsequent calls.
Reimplement REFTABLE_ALLOC_GROW to avoid leaks, keep allocation counts
in sync and still signal failures to callers while avoiding code
duplication in callers. Make it an expression that evaluates to 0 if no
reallocation is needed or it succeeded and 1 on failure while keeping
the original pointer and allocation counter values.
Adjust REFTABLE_ALLOC_GROW_OR_NULL to the new calling convention for
REFTABLE_ALLOC_GROW, but keep its support for non-size_t alloc variables
for now.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() and reftable_buf_add() avoid leaking by restoring the
original pointer value on failure, but all other callers seem to be OK
with losing the old allocation. Add a new variant of the macro,
REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the
allocation counter. Use it for those callers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up sanity checks for Meson to verify that no man pages are missing.
This check is similar to the same check we already have for our tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "check-meson" target uses process substitution to check whether
extracted contents from "meson.build" match expected contents. Process
substitution is unportable though and thus the target will fail when
using for example Dash.
Fix this by writing data into a temporary directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we generate man pages, articles and user manual with Meson the
only thing that is still missing in an installation of HTML documents is
a couple of static files. Wire these up to finalize Meson's support for
generating HTML documentation.
Diffing an installation that uses our Makefile with an installation that
uses Meson only surfaces a couple of discepancies now:
- Meson doesn't install "everyday.html" and "git-remote-helpers.html".
These files are marked as obsolete and don't contain any useful
information anymore: they simply point to their modern equivalents.
- Meson doesn't install "*.txt" files when asking for HTML docs. I'm
not sure why our Makefiles do this in the first place, and it does
seem like the resulting installation is fully functional even
without those files.
Other than that, both layout and file contents are the exact same.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the Meson build system already knows to generate man pages and our
user manual, it does not yet generate the random assortment of articles
that we have. Plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "howto-index.sh" is used to generate an index of our how-to docs. It
receives as input the paths to these documents, which would typically be
relative to the "Documentation/" directory in Makefile-based builds. In
an out-of-tree build though it will get relative that may be rooted
somewhere else entirely.
The file paths do end up in the generated index, and the expectation is
that they should always start with "howto/". But for out-of-tree builds
we would populate it with the paths relative to the build directory,
which is wrong.
Fix the issue by using `$(basename "$file")` to generate the path. While
at it, move the script into "howto/" to align it with the location of
the comparable "api-index.sh" script.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "api-index.sh" script generates an index of API-related
documentation. The script does not handle out-of-tree builds and thus
cannot be used easily by Meson.
Refactor it to be independent of locations by both accepting a source
directory where the API docs live as well as a path to an output file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation contains a user manual that gives people a short
introduction to Git. Our Makefile knows to generate the manual into
three different formats: an HTML page, a PDF and an info page. The Meson
build instructions don't yet generate any of these.
While wiring up all these formats I hit a couple of road blocks with how
we generate our info pages. Even though I eventually resolved these, it
made me question whether anybody actually uses info pages in the first
place. Checking through a couple of downstream consumers I couldn't find
a single user of either the info pages nor of our PDF manual in Arch
Linux, Debian, Fedora, Ubuntu, FreeBSD or OpenBSDFedora. So it's rather
safe to assume that there aren't really any users out there, and thus
the added complexity does not seem worth it.
Wire up support for building the user manual in HTML format and
conciously skip over the other two formats. This is basically a form of
silent deprecation: if people out there use the other two formats they
will eventually complain about them missing in Meson, which means we can
wire them up at a later point. If they don't we can phase out these
formats eventually.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating our user manual we set up a bit of extra configuration
compared to our normal configuration. This is done by having an extra
"user-manual.conf" file that Asciidoc seems to pull in automatically due
to matching filenames with "user-manual.txt". This dependency is quite
hidden though and thus easy to miss. Furthermore, it seems that Asciidoc
does not know to pull it in for out-of-tree builds where we use relative
paths.
The setup in AsciiDoctor is somewhat different: instead of having two
sets of configuration, we condition the use of manual-specific configs
based on whether the document type is "book". And as we only build our
user manual with that type this is sufficient.
Use the same trick for our user manual by inlining the configuration
into "asciidoc.conf.in" and making it conditional on whether or not
"doctype-book" is defined.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating HTML pages for our man pages we only generate them for
category 1 in Meson, which are the pages corresponding to our built-in
commands. I cannot tell why I added this filter though: our Makefile
installs all man pages, so a Meson-based build misses out on many of
them.
Fix this by removing the filter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our buildsystems generate a list of diff and merge tools that ultimately
end up in our documentation. And while Meson does wire up the logic, it
tries to use the TOOL_MODE environment variable to set up the mode. This
is wrong though: the mode is set via an argument that we have fixed to
'diff' mode by accident.
Fix this such that merge tools are properly generated.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of Meson documentation targets use `meson.current_source_dir()`
to resolve inputs. This has the downside that it does not automagically
make Meson track these inputs as a dependency. After all, string
arguments really can be anything, even if they happen to match an actual
filesystem path.
Adapt these build targets to instead use inputs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While our Makefile supports both Asciidoc and AsciiDoctor, our Meson
build instructions only support the former. Wire up support for the
latter, as well.
Our Makefile always favors Asciidoc, but Meson will automatically figure
out which of both to use based on whether they are installed or not. To
keep compatibility with our Makefile it favors Asciidoc over Asciidoctor
in case both are available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7d549fe317 (meson: skip gitweb build when Perl is disabled,
2024-12-20) we have started to conditionally enable "gitweb" based on
whether or not Perl is enabled. By accident though that change causes us
to not build gitweb in case its feature flag is set to "auto" even if
autoconfiguration determines that it could be built. This is because we
use "gitweb_option.enabled()", which only checks whether the feature has
been explicitly enabled.
Fix the issue by using `gitweb_option.allowed()` instead, which returns
true in case it is either explicitly enabled or set to "auto". This also
works for the case where the feature becomes auto-disabled due to Perl
not being present because we use `disable_auto_if(not perl.found())`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Building our "gitweb" interface is optional in our Makefile and in Meson
and not wired up at all with CMake, but disabling it causes a couple of
tests in the t950* range that pull in "t/lib-gitweb.sh". This is because
the test library knows to execute gitweb-tests based on whether or not
Perl is available, but we may have Perl available and still end up not
building gitweb e.g. with `make test NO_GITWEB=YesPlease`.
Fix this issue by wiring up a new "NO_GITWEB" build option so that we
can skip these tests in case gitweb is not built.
Note that this new build option requires us to move the configuration of
GIT-BUILD-OPTIONS to a later point in our Meson build instructions. But
as that file is only consumed by our tests at runtime this change does
not cause any issues.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variables declared and substituted in GIT-BUILD-OPTIONS are not
ordered in any obvious way. Sort them alphabetically so that it becomes
obvious where new variables should go.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace `test -f` and `test ! -f` with `test_path_is_file` and
`test_path_is_missing` for better debuggability.
While `test -f` ensures that the file exists and is a regular file,
`test_path_is_file` provides clearer error messages on failure.
On the other hand, `test ! -f` checks either the absence of a regular
file or the presence of any other filesystem object, but looking at
them in the test individually, all of them should've said `test ! -e`,
i.e. "there shouldn't be anything at given path on filesystem."
Replace these cases with `test_path_is_missing` for better
debuggability.
Helped-by: karthik nayak <karthik.188@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Meet Soni <meetsoni3017@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `repo_get_merge_bases_many()` and friends accepts an array
of commits as well as a parameter that indicates how large that array
is. This parameter is using a signed integer, which leads to a couple of
warnings with -Wsign-compare.
Refactor the code to use `size_t` to track indices instead and adapt
callers accordingly. While most callers are trivial, there are two
callers that require a bit more scrutiny:
- builtin/merge-base.c:show_merge_base() subtracts `1` from the
`rev_nr` before calling `repo_get_merge_bases_many_dirty()`, so if
the variable was `0` it would wrap. This code is fine though because
its only caller will execute that code only when `argc >= 2`, and it
follows that `rev_nr >= 2`, as well.
- bisect.ccheck_merge_bases() similarly subtracts `1` from `rev_nr`.
Again, there is only a single caller that populates `rev_nr` with
`good_revs.nr`. And because a bisection always requires at least one
good revision it follws that `rev_nr >= 1`.
Mark the file as -Wsign-compare-clean.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a couple of -Wsign-compare issues in "shallow.c" and mark the file
as -Wsign-compare-clean. This change prepares the code for a refactoring
of `repo_in_merge_bases_many()`, which will be adapted to accept the
number of commits as `size_t` instead of `int`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix remaining -Wsign-compare warnings in "builtin/log.c" and mark the
file as -Wsign-compare-clean. While most of the fixes are obvious, one
fix requires us to use `cast_size_t_to_int()`, which will cause us to
die in case the `size_t` cannot be represented as `int`. This should be
fine though, as the data would typically be set either via a config key
or via the command line, neither of which should ever exceed a couple of
kilobytes of data.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, adapt "builtin/log.c" so that it
tracks array indices via `size_t` instead of using signed integers. This
fixes a couple of -Wsign-compare warnings and prepares the code for
a similar refactoring of `repo_get_merge_bases_many()` in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, adapt `get_reachable_subset()` so
that it tracks array indices via `size_t` instead of using signed
integers to fix a couple of -Wsign-compare warnings. Adapt callers
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `remove_redundant()` gets as input an array of commits as
well as the size of that array and then drops redundant commits from
that array. It then returns either `-1` in case an error occurred, or
the new number of items in the array.
The function receives and returns these sizes with a signed integer,
which causes several warnings with -Wsign-compare. Fix this issue by
consistently using `size_t` to track array indices and splitting up
the returned value into a returned error code and a separate out pointer
for the new computed size.
Note that `get_merge_bases_many()` and related functions still track
array sizes as a signed integer. This will be fixed in a subsequent
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `can_all_from_reach_with_flag()` function accepts a parameter that
allows callers to cut off traversal at a specified commit date. This
parameter is of type `time_t`, which is a signed type, while we end up
comparing it to a commit's `date` field, which is of the unsigned type
`timestamp_t`.
Fix the parameter to be of type `timestamp_t`. There is only a single
caller in "upload-pack.c" that sets this parameter, and that caller
knows to pass in a `timestamp_t` already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 62e745ced2 (prio-queue: use size_t rather than int for size,
2024-12-20), we refactored `struct prio_queue` to track the number of
contained entries via a `size_t`. While the refactoring adapted one of
the users of that variable, it forgot to also adapt "commit-reach.c"
accordingly. This was missed because that file has -Wsign-conversion
disabled.
Fix the issue by using a `size_t` to iterate through entries.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 62e745ced2 (prio-queue: use size_t rather than int for size,
2024-12-20), we have converted `struct prio_queue` to use `size_t` to
track the number of entries in the queue as well as the allocated size
of the underlying array. There is one more counter though, namely the
insertion counter, that is still using an `unsigned` instead of a
`size_t`. This is unlikely to ever be a problem, but it makes one wonder
why some indices use `size_t` while others use `unsigned`. Furthermore,
the mentioned commit stated the intent to also adapt these variables,
but seemingly forgot to do so.
Fix the issue by converting those counters to use `size_t`, as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix compiler warings from msvc in date.c for value truncation from 64
bit to 32 bit integers.
Also switch from int to size_t for all variables with result of strlen()
which cannot become negative.
Signed-off-by: Sören Krecker <soekkle@freenet.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Point out:
- current maintaner
- contribution flow is via the mailing list
Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's wait for git-gui, gitk, and possibly po/ and delay the tagging
of the -rc1. Many people are already offline for the end-of-year
holidays and it is a slow week, and 'master' front has too many new
things graduated from 'next' a bit too early for me to feel
comfortable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git refs migrate" learned to also migrate the reflog data across
backends.
* kn/reflog-migration:
refs: mark invalid refname message for translation
refs: add support for migrating reflogs
refs: allow multiple reflog entries for the same refname
refs: introduce the `ref_transaction_update_reflog` function
refs: add `committer_info` to `ref_transaction_add_update()`
refs: extract out refname verification in transactions
refs/files: add count field to ref_lock
refs: add `index` field to `struct ref_udpate`
refs: include committer info in `ref_update` struct
A topic to optionally build with meson, which has graduated to
'master' recently, broke Documentation pipeline with asciidoctor
for the normal Makefile build as well as meson-based one, which
have been corrected.
* ma/asciidoctor-build-fixes:
asciidoctor-extensions.rb.in: inject GIT_DATE
asciidoctor-extensions.rb.in: add missing word
asciidoctor-extensions.rb.in: delete existing <refmiscinfo/>
A topic to optionally build with meson, which has graduated to
'master' recently, has regressed the normal Makefile build, which
is being corrected.
* ps/build-hotfix:
meson: add options to override build information
GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
GIT-VERSION-GEN: fix overriding GIT_VERSION
Makefile: introduce template for GIT-VERSION-GEN
Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Makefile: stop including "GIT-VERSION-FILE" in docs
The meson-build procedure is integrated into CI to catch and
prevent bitrotting.
* ps/ci-meson:
ci: wire up Meson builds
t: introduce compatibility options to clar-based tests
t: fix out-of-tree tests for some git-p4 tests
Makefile: detect missing Meson tests
meson: detect missing tests at configure time
t/unit-tests: rename clar-based unit tests to have a common prefix
Makefile: drop -DSUPPRESS_ANNOTATED_LEAKS
ci/lib: support custom output directories when creating test artifacts
Code to reuse objects based on bitmap contents have been tightened
to avoid race condition even when multiple packs are involved.
* tb/bitmap-fix-pack-reuse:
pack-bitmap.c: ensure pack validity for all reuse packs
meson-based build still tried to build and install gitweb even when
Perl is disabled, which has been corrected.
* ps/build-meson-gitweb:
meson: skip gitweb build when Perl is disabled
"git range-diff" learned to optionally show and compare merge
commits in the ranges being compared, with the --diff-merges
option.
* js/range-diff-diff-merges:
range-diff: introduce the convenience option `--remerge-diff`
range-diff: optionally include merge commits' diffs in the analysis
Update the way rename() emulation on Windows handle directories to
correct an earlier attempt to do the same.
* js/mingw-rename-fix:
mingw_rename: do support directory renames
Revert recent changes to the way windows environment is set up for
GitHub CI.
* js/github-windows-setup-fix:
GitHub ci(windows): speed up initializing Git for Windows' minimal SDK again
Build fixes for Windows.
* js/ps-build-cmake-fixup:
cmake/vcxproj: stop special-casing `remote-ext`
cmake: put the Perl modules into the correct location again
cmake: use the correct file name for the Perl header
cmake(mergetools): better support for out-of-tree builds
cmake: better support for out-of-tree builds follow-up
Regression fix for 'show-index' when run outside of a repository.
* as/show-index-uninitialized-hash:
t5300: add test for 'show-index --object-format'
show-index: fix uninitialized hash function
Start working to make the codebase buildable with -Wsign-compare.
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
Reftable backend adds check for upper limit of log's update_index.
* kn/reftable-writer-log-write-verify:
reftable/writer: ensure valid range for log's update_index
GitLab CI updates.
* ps/ci-gitlab-update:
ci/lib: fix "CI setup" sections with GitLab CI
ci/lib: do not interpret escape sequences in `group ()` arguments
ci/lib: remove duplicate trap to end "CI setup" group
gitlab-ci: update macOS images to Sonoma
Recent reftable updates mistook a NULL return from a request for
0-byte allocation as OOM and died unnecessarily, which has been
corrected.
* ps/reftable-alloc-failures-zalloc-fix:
reftable/basics: return NULL on zero-sized allocations
reftable/stack: fix zero-sized allocation when there are no readers
reftable/merged: fix zero-sized allocation when there are no readers
reftable/stack: don't perform auto-compaction with less than two tables
In the preceding commits we have fixed a couple of issues when
allocating zero-sized objects. These issues were masked by
implementation-defined behaviour. Quoting malloc(3p):
If size is 0, either:
* A null pointer shall be returned and errno may be set to an
implementation-defined value, or
* A pointer to the allocated space shall be returned. The
application shall ensure that the pointer is not used to access an
object.
So it is perfectly valid that implementations of this function may or
may not return a NULL pointer in such a case.
Adapt both `reftable_malloc()` and `reftable_realloc()` so that they
return NULL pointers on zero-sized allocations. This should remove any
implementation-defined behaviour in our allocators and thus allows us to
detect such platform-specific issues more easily going forward.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as the preceding commit, we may try to do a zero-sized
allocation when reloading a reftable stack that ain't got any tables.
It is implementation-defined whether malloc(3p) returns a NULL pointer
in that case or a zero-sized object. In case it does return a NULL
pointer though it causes us to think we have run into an out-of-memory
situation, and thus we return an error.
Fix this by only allocating arrays when they have at least one entry.
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported [1] that Git started to fail with an out-of-memory error
when initializing repositories with the reftable backend on NonStop
platforms. A bisect led to 802c0646ac (reftable/merged: handle
allocation failures in `merged_table_init_iter()`, 2024-10-02), which
changed how we allocate memory when initializing a merged table.
The root cause of this seems to be that NonStop returns a `NULL` pointer
when doing a zero-sized allocation. This would've already happened
before the above change, but we never noticed because we did not check
the result. Now we do notice and thus return an out-of-memory error to
the caller.
Fix the issue by skipping the allocation altogether in case there are no
readers.
[1]: <00ad01db5017$aa9ce340$ffd6a9c0$@nexbridge.com>
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to compact tables we need at least two tables. Bail out early
from `reftable_stack_auto_compact()` in case we have less than two
tables.
In the original, `stack_table_sizes_for_compaction()` yields an array
that has the same length as the number of tables. This array is then
passed on to `suggest_compaction_segment()`, which returns an empty
segment in case we have less than two tables. The segment is then passed
to `segment_size()`, which will return `0` because both start and end of
the segment are `0`. And because we only call `stack_compact_range()` in
case we have a positive segment size we don't perform auto-compaction at
all. Consequently, this change does not result in a user-visible change
in behaviour when called with a single table.
But when called with no tables this protects us against a potential
out-of-memory error: `stack_table_sizes_for_compaction()` would try to
allocate a zero-byte object when there aren't any tables, and that may
lead to a `NULL` pointer on some platforms like NonStop which causes us
to bail out with an out-of-memory error.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06), we no longer inject GIT_DATE when building with
Asciidoctor.
Replace the <date/> tag in the XML to inject the value of GIT_DATE.
Unlike <refmiscinfo/> as handled in a recent commit, we have no reason
to expect that this tag might be missing, so there's no need for "maybe
remove, then add" and we can just outright replace the one that
Asciidoctor has generated based on the mtime of the source file.
Compared to pre-a38edab7c8, we now end up injecting this also in the
build of Git.3pm, which until now has been using the mtime of Git.pm.
That is arguably even a good change since it results in more
reproducible builds.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06) stopped providing an attribute value "Git $(GIT_VERSION)" to
asciidoc/Asciidoctor over the command line. Instead, we now provide the
attribute to asciidoc through a generated asciidoc.conf, where the value
is generated as "Git @GIT_VERSION@".
In the similar mechanism for Asciidoctor, we forgot the "Git" prefix.
Restore it.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After the recent a38edab7c8 (Makefile: generate doc versions via
GIT-VERSION-GEN, 2024-12-06), building with Asciidoctor results in
manpages where the headers no longer contain "Git Manual" and the
footers no longer identify the built Git version.
Before a38edab7c8, we used to just provide a few attributes to
Asciidoctor (and asciidoc). Commit 7a30134358 (asciidoctor-extensions:
provide `<refmiscinfo/>`, 2019-09-16) noted that older versions of
Asciidoctor didn't propagate those attributes into the built XML files,
so we started injecting them ourselves from this script. With newer
versions of Asciidoctor, we'd end up with some harmless duplication
among the tags in the final XML.
Post-a38edab7c8, we don't provide these attributes and Asciidoctor
inserts empty-ish values. After our additions from 7a30134358, we get
<refmiscinfo class="source"> </refmiscinfo>
<refmiscinfo class="manual"> </refmiscinfo>
<refmiscinfo class="source">2.47.1.[...]</refmiscinfo>
<refmiscinfo class="manual">Git Manual</refmiscinfo>
When these are handled, it appears to be first come first served,
meaning that our additions have no effect and we regress as described in
the first paragraph.
Remove existing "source" or "manual" <refmiscinfo/> tags before adding
ours. I considered removing all <refmiscinfo/> to get a nice clean
slate, instead of just those two that we want to replace to be a bit
more precise. I opted for the latter. Maybe one day, Asciidoctor learns
to insert something useful there which `xmlto` can pick up and make good
use of -- let's not interfere.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/build-hotfix:
meson: add options to override build information
GIT-VERSION-GEN: fix overriding GIT_BUILT_FROM_COMMIT and GIT_DATE
GIT-VERSION-GEN: fix overriding GIT_VERSION
Makefile: introduce template for GIT-VERSION-GEN
Makefile: drop unneeded indirection for GIT-VERSION-GEN outputs
Makefile: stop including "GIT-VERSION-FILE" in docs
The short help text given by "git show-index -h" says
$ git show-index -h
usage: git show-index [--object-format=<hash-algorithm>]
--[no-]object-format <hash-algorithm>
specify the hash algorithm to use
The command takes a pack .idx file from its standard input. The
user has to _know_ this, as there is no indication from this output.
Give a hint that the data to work on is fed from its standard input.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We inject various different kinds of build information into build
artifacts, like the version string or the commit from which Git was
built. Add options to let users explicitly override this information
with Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with the preceding commit, neither GIT_BUILT_FROM_COMMIT nor
GIT_DATE can be overridden via the environment. Especially the latter is
of importance given that we set it in our own "Documentation/doc-diff"
script.
Make the values of both variables overridable. Luckily we don't pull in
these values via any included Makefiles, so the fix is trivial compared
to the fix for GIT_VERSON.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GIT-VERSION-GEN tries to derive the version that Git is being built from
via multiple different sources in the following order:
1. A file called "version" in the source tree's root directory, if it
exists.
2. The current commit in case Git is built from a Git repository.
3. Otherwise, we use a fallback version stored in a variable which is
bumped whenever a new Git version is getting tagged.
It used to be possible to override the version by overriding the
`GIT_VERSION` Makefile variable (e.g. `make GIT_VERSION=foo`). This
worked somewhat by chance, only: `GIT-VERSION-GEN` would write the
actual Git version into `GIT-VERSION-FILE`, not the overridden value,
but when including the file into our Makefile we would not override the
`GIT_VERSION` variable because it has already been set by the user. And
because our Makefile used the variable to propagate the version to our
build tools instead of using `GIT-VERSION-FILE` the resulting build
artifacts used the overridden version.
But that subtle mechanism broke with 4838deab65 (Makefile: refactor
GIT-VERSION-GEN to be reusable, 2024-12-06) and subsequent commits
because the version information is not propagated via the Makefile
variable anymore, but instead via the files that `GIT-VERSION-GEN`
started to write. And as the script never knew about the `GIT_VERSION`
environment variable in the first place it uses one of the values listed
above instead of the overridden value.
Fix this issue by making `GIT-VERSION-GEN` handle the case where
`GIT_VERSION` has been set via the environment.
Note that this requires us to introduce a new GIT_VERSION_OVERRIDE
variable that stores a potential user-provided value, either via the
environment or via "config.mak". Ideally we wouldn't need it and could
just continue to use GIT_VERSION for this. But unfortunately, Makefiles
will first include all sub-Makefiles before figuring out whether it
needs to re-make any of them [1]. Consequently, if there already is a
GIT-VERSION-FILE, we would have slurped in its value of GIT_VERSION
before we call GIT-VERSION-GEN, and because GIT-VERSION-GEN now uses
that value as an override it would mean that the first generated value
for GIT_VERSION will remain unchanged.
Furthermore we have to move the include for "GIT-VERSION-FILE" after the
includes for "config.mak" and related so that GIT_VERSION_OVERRIDE can
be set to the value provided by "config.mak".
[1]: https://www.gnu.org/software/make/manual/html_node/Remaking-Makefiles.html
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new template to call GIT-VERSION-GEN. This will allow us to
iterate on how exactly the script is called in subsequent commits
without having to adapt all call sites every time.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the callsites of GIT-VERSION-GEN generate the target file with a
"+" suffix first and then move the file into place when the new contents
are different compared to the old contents. This allows us to avoid a
needless rebuild by not updating timestamps of the target file when its
contents will remain unchanged anyway.
In fact though, this exact logic is already handled in GIT-VERSION-GEN,
so doing this manually is pointless. This is a leftover from an earlier
version of 4838deab65 (Makefile: refactor GIT-VERSION-GEN to be
reusable, 2024-12-06), where the script didn't handle that logic for us.
Drop the needless indirection.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include "GIT-VERSION-FILE" in our docs Makefile, but don't actually
use the "GIT_VERSION" variable that it provides. This is a leftover from
the conversion to make "GIT-VERSION-GEN" generate version information
in-place by substituting placeholders in 4838deab65 (Makefile: refactor
GIT-VERSION-GEN to be reusable, 2024-12-06) and subsequent commits,
where all usages of the variable were removed.
Stop including the file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Fix an occurrence of "dpuis" to "depuis".
* Add some entries in the translation index at the beginning of the
file.
* Harmonize the spelling of various items based on how common each
spelling or translation is throughout the file.
* superproject -> super-projet
* patch -> rustine
* regex / regexp -> regex
* regular expression -> expression régulière
* loose object -> objet esseulé
* directory -> répertoire
* Fix various typos (e.g.: trailing ".<" or ".", "mêm" -> "même")
* Fix minor grammatical errors (e.g: "le valeur" -> "la valeur")
* Remove old translations
Signed-off-by: Florian Sabourin <ethiraric@gmail.com>
It is possible to configure a Git build without Perl when disabling both
our test suite and all Perl-based features. In Meson, this can be
achieved with `meson setup -Dperl=disabled -Dtests=false`.
It was reported by a user that this breaks the Meson build because
gitweb gets built even if Perl was not discovered in such a build:
$ meson setup .. -Dtests=false -Dperl=disabled
...
../gitweb/meson.build:2:43: ERROR: Unable to get the path of a not-found external program
Fix this issue by introducing a new feature-option that allows the user
to configure whether or not to build Gitweb. The feature is set to
'auto' by default and will be disabled automatically in case Perl was
not found on the system.
Reported-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The path-walk API currently uses a stack-based approach to recursing
through the list of paths within the repository. This guarantees that
after a tree path is explored, all paths contained within that tree path
will be explored before continuing to explore siblings of that tree
path.
The initial motivation of this depth-first approach was to minimize
memory pressure while exploring the repository. A breadth-first approach
would have too many "active" paths being stored in the paths_to_lists
map.
We can take this approach one step further by making sure that blob
paths are visited before tree paths. This allows the API to free the
memory for these blob objects before continuing to perform the
depth-first search. This modifies the order in which we visit siblings,
but does not change the fact that we are performing depth-first search.
To achieve this goal, use a priority queue with a custom sorting method.
The sort needs to handle tags, blobs, and trees (commits are handled
slightly differently). When objects share a type, we can sort by path
name. This will keep children of the latest path to leave the stack be
preferred over the rest of the paths in the stack, since they agree in
prefix up to and including a directory separator. When the types are
different, we can prefer tags over other types and blobs over trees.
This causes significant adjustments to t6601-path-walk.sh to rearrange
the order of the visited paths.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the input rev_info has UNINTERESTING starting points, we want to be
sure that the UNINTERESTING flag is passed appropriately through the
objects. To match how this is done in places such as 'git pack-objects', we
use the mark_edges_uninteresting() method.
This method has an option for using the "sparse" walk, which is similar in
spirit to the path-walk API's walk. To be sure to keep it independent, add a
new 'prune_all_uninteresting' option to the path_walk_info struct.
To check how the UNINTERSTING flag is spread through our objects, extend the
'test-tool path-walk' command to output whether or not an object has that
flag. This changes our tests significantly, including the removal of some
objects that were previously visited due to the incomplete implementation.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rev_info that is specified for a path-walk traversal may specify
visiting tag refs (both lightweight and annotated) and also may specify
indexed objects (blobs and trees). Update the path-walk API to walk
these objects as well.
When walking tags, we need to peel the annotated objects until reaching
a non-tag object. If we reach a commit, then we can add it to the
pending objects to make sure we visit in the commit walk portion. If we
reach a tree, then we will assume that it is a root tree. If we reach a
blob, then we have no good path name and so add it to a new list of
"tagged blobs".
When the rev_info includes the "--indexed-objects" flag, then the
pending set includes blobs and trees found in the cache entries and
cache-tree. The cache entries are usually blobs, though they could be
trees in the case of a sparse index. The cache-tree stores
previously-hashed tree objects but these are cleared out when staging
objects below those paths. We add tests that demonstrate this.
The indexed objects come with a non-NULL 'path' value in the pending
item. This allows us to prepopulate the 'path_to_lists' strmap with
lists for these paths.
The tricky thing about this walk is that we will want to combine the
indexed objects walk with the commit walk, especially in the future case
of walking objects during a command like 'git repack'.
Whenever possible, we want the objects from the index to be grouped with
similar objects in history. We don't want to miss any paths that appear
only in the index and not in the commit history.
Thus, we need to be careful to let the path stack be populated initially
with only the root tree path (and possibly tags and tagged blobs) and go
through the normal depth-first search. Afterwards, if there are other
paths that are remaining in the paths_to_lists strmap, we should then
iterate through the stack and visit those objects recursively.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We add the ability to filter the object types in the path-walk API so
the callback function is called fewer times.
This adds the ability to ask for the commits in a list, as well. We
re-use the empty string for this set of objects because these are passed
directly to the callback function instead of being part of the
'path_stack'.
Future changes will add the ability to visit annotated tags.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add some tests based on the current behavior, doing interesting checks
for different sets of branches, ranges, and the --boundary option. This
sets a baseline for the behavior and we can extend it as new options are
introduced.
Store and output a 'batch_nr' value so we can demonstrate that the paths are
grouped together in a batch and not following some other ordering. This
allows us to test the depth-first behavior of the path-walk API. However, we
purposefully do not test the order of the objects in the batch, so the
output is compared to the expected output through a sort.
It is important to mention that the behavior of the API will change soon as
we start to handle UNINTERESTING objects differently, but these tests will
demonstrate the change in behavior.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test helper will be helpful to reduce repeated logic in
t6601-path-walk.sh, but may be helpful elsewhere, too.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of a few planned applications, introduce the most basic form
of a path-walk API. It currently assumes that there are no UNINTERESTING
objects, and does not include any complicated filters. It calls a function
pointer on groups of tree and blob objects as grouped by path. This only
includes objects the first time they are discovered, so an object that
appears at multiple paths will not be included in two batches.
These batches are collected in 'struct type_and_oid_list' objects, which
store an object type and an oid_array of objects.
The data structures are documented in 'struct path_walk_context', but in
summary the most important are:
* 'paths_to_lists' is a strmap that connects a path to a
type_and_oid_list for that path. To avoid conflicts in path names,
we make sure that tree paths end in "/" (except the root path with
is an empty string) and blob paths do not end in "/".
* 'path_stack' is a string list that is added to in an append-only
way. This stores the stack of our depth-first search on the heap
instead of using recursion.
* 'path_stack_pushed' is a strmap that stores path names that were
already added to 'path_stack', to avoid repeating paths in the
stack. Mostly, this saves us from quadratic lookups from doing
unsorted checks into the string_list.
The coupling of 'path_stack' and 'path_stack_pushed' is protected by the
push_to_stack() method. Call this instead of inserting into these
structures directly.
The walk_objects_by_path() method initializes these structures and
starts walking commits from the given rev_info struct. The commits are
used to find the list of root trees which populate the start of our
depth-first search.
The core of our depth-first search is in a while loop that continues
while we have not indicated an early exit and our 'path_stack' still has
entries in it. The loop body pops a path off of the stack and "visits"
the path via the walk_path() method.
The walk_path() method gets the list of OIDs from the 'path_to_lists'
strmap and executes the callback method on that list with the given path
and type. If the OIDs correspond to tree objects, then iterate over all
trees in the list and run add_children() to add the child objects to
their own lists, adding new entries to the stack if necessary.
In testing, this depth-first search approach was the one that used the
least memory while iterating over the object lists. There is still a
chance that repositories with too-wide path patterns could cause memory
pressure issues. Limiting the stack size could be done in the future by
limiting how many objects are being considered in-progress, or by
visiting blob paths earlier than trees.
There are many future adaptations that could be made, but they are left for
future updates when consumers are ready to take advantage of those features.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message produced by `transaction_refname_valid()` changes based
on whether the update is a ref update or a reflog update, with the use
of a ternary operator. This breaks translation since the sub-msg is not
marked for translation. Fix this by setting the entire message using a
`if {} else {}` block and marking each message for translation.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The alloc and nr fields of a prio-queue tell us how much memory is
allocated and used in the array. So the natural type for them is size_t,
which prevents overflow on 64-bit systems where "int" is still 32 bits.
This is unlikely to happen in practice, as we typically use it for
storing commits, and having 2^31 of those is rather a lot. But it's good
to keep our generic data structures as flexible as possible. And as we
start to enforce -Wsign-compare, it means that callers need to use
"int", too, and the problem proliferates. Let's fix it at the source.
The changes here can be put into a few groups:
1. Changing the alloc/nr fields in the struct to size_t. This requires
swapping out int for size_t in negotiator/skipping.c, as well as
in prio_queue_get(), because those all iterate over the array.
Building with -Wsign-compare complains about these.
2. Other code that assigns or passes around indexes into the array
(e.g., the swap() and compare() functions) won't trigger
-Wsign-compare because we are simply truncating the values. These
are caught by -Wconversion, but I've adjusted them here to
future-proof us.
3. In prio_queue_reverse() we compute "queue->nr - 1" without checking
if anything is in the queue, which underflows now that nr is
unsigned. We can fix that by returning early when the queue is
empty (there is nothing to reverse).
4. The insertion_ctr variable is currently unsigned, but can likewise
grow (it is actually worse, because adding and removing an element
many times will keep increasing the counter, even though "nr" does
not). I've bumped that to size_t here, as well.
But -Wconversion notes that computing the "cmp" result by
subtracting the counters and assigning to "int" is a potential
problem. And that's true even before this patch, since we use an
unsigned counter (imagine comparing "2^32-1" and "0", which should
be a high positive value, but instead is "-1" as a signed int).
Since we only care about the sign (and not the magnitude) of the
result, we could fix this by swapping out the subtraction for a
ternary comparison. Probably the performance impact would be
negligible, since we just called into a custom compare function and
branched on its result anyway. But it's easy enough to do a
branchless version by subtracting the comparison results.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git bundle create" with an annotated tag on the positive end of
the revision range had a workaround code for older limitation in
the revision walker, which has become unnecessary.
* tc/bundle-with-tag-remove-workaround:
bundle: remove unneeded code
"git fetch" honors "remote.<remote>.followRemoteHEAD" settings to
tweak the remote-tracking HEAD in "refs/remotes/<remote>/HEAD".
* bf/fetch-set-head-config:
remote set-head: set followRemoteHEAD to "warn" if "always"
fetch set_head: add warn-if-not-$branch option
fetch set_head: move warn advice into advise_if_enabled
fetch: add configuration for set_head behaviour
"git fetch" from a configured remote learned to update a missing
remote-tracking HEAD but it asked the remote about their HEAD even
when it did not need to, which has been corrected. Incidentally,
this also corrects "git fetch --tags $URL" which was broken by the
new feature in an unspecified way.
* jc/set-head-symref-fix:
fetch: do not ask for HEAD unnecessarily
When "git fetch $remote" notices that refs/remotes/$remote/HEAD is
missing and discovers what branch the other side points with its
HEAD, refs/remotes/$remote/HEAD is updated to point to it.
* bf/set-head-symref:
fetch set_head: handle mirrored bare repositories
fetch: set remote/HEAD if it does not exist
refs: add create_only option to refs_update_symref_extended
refs: add TRANSACTION_CREATE_EXISTS error
remote set-head: better output for --auto
remote set-head: refactor for readability
refs: atomically record overwritten ref in update_symref
refs: standardize output of refs_read_symbolic_ref
t/t5505-remote: test failure of set-head
t/t5505-remote: set default branch to main
Stop using `the_repository` in the "match-trees" subsystem by passing
down the already-available repository parameters to internal functions
as required.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "graph" subsystem by reusing the
repository we already have available via `struct rev_info`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "add-interactive" subsystem by
reusing the repository we already have available via parameters or in
the `add_i_state` structure.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "tmp-objdir" subsystem by passing
in the repostiroy when creating a new temporary object directory.
While we could trivially update the caller to pass in the hash algorithm
used by the index itself, we instead pass in `the_hash_algo`. This is
mostly done to stay consistent with the rest of the code in that file,
which isn't prepared to handle arbitrary repositories, either.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "resolve-undo" subsystem by passing
in the hash algorithm when reading or writing resolve-undo information.
While we could trivially update the caller to pass in the hash algorithm
used by the index itself, we instead pass in `the_hash_algo`. This is
mostly done to stay consistent with the rest of the code in that file,
which isn't prepared to handle arbitrary repositories, either.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "credential" subsystem by passing in
a repository when filling, approving or rejecting credentials.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "mailinfo" subsystem by passing in
a repository when setting up the mailinfo structure.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "diagnose" subsystem by passing in a
repository when generating a diagnostics archive.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "server-info" subsystem by passing in
a repository when updating server info and storing the repository in the
`update_info_ctx` structure to make it accessible to other functions.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "send-pack" subsystem by passing in a
repository when sending a packfile.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "serve" subsystem by passing in a
repository when advertising capabilities or serving requests.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "trace" subsystem by passing in a
repository when setting up tracing.
Adjust the only caller accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "pager" subsystem by passing in a
repository when setting up the pager and when configuring it.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop using `the_repository` in the "progress" subsystem by passing in a
repository when initializing `struct progress`. Furthermore, store a
pointer to the repository in that struct so that we can pass it to the
trace2 API when logging information.
Adjust callers accordingly by using `the_repository`. While there may be
some callers that have a repository available in their context, this
trivial conversion allows for easier verification and bubbles up the use
of `the_repository` by one level.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/build-sign-compare:
t/helper: don't depend on implicit wraparound
scalar: address -Wsign-compare warnings
builtin/patch-id: fix type of `get_one_patchid()`
builtin/blame: fix type of `length` variable when emitting object ID
gpg-interface: address -Wsign-comparison warnings
daemon: fix type of `max_connections`
daemon: fix loops that have mismatching integer types
global: trivial conversions to fix `-Wsign-compare` warnings
pkt-line: fix -Wsign-compare warning on 32 bit platform
csum-file: fix -Wsign-compare warning on 32-bit platform
diff.h: fix index used to loop through unsigned integer
config.mak.dev: drop `-Wno-sign-compare`
global: mark code units that generate warnings with `-Wsign-compare`
compat/win32: fix -Wsign-compare warning in "wWinMain()"
compat/regex: explicitly ignore "-Wsign-compare" warnings
git-compat-util: introduce macros to disable "-Wsign-compare" warnings
Commit 44f9fd6496 (pack-bitmap.c: check preferred pack validity when
opening MIDX bitmap, 2022-05-24) prevents a race condition whereby the
preferred pack disappears between opening the MIDX bitmap and attempting
verbatim reuse out of its packs.
That commit forces open_midx_bitmap_1() to ensure the validity of the
MIDX's preferred pack, meaning that we have an open file handle on the
*.pack, ensuring that we can reuse bytes out of verbatim later on in the
process[^1].
But 44f9fd6496 was not extended to cover multi-pack reuse, meaning that
this same race condition exists for non-preferred packs during verbatim
reuse. Work around that race in the same way by only marking valid packs
as reuse-able. For packs that aren't reusable, skip over them but
include the number of objects they have to ensure we allocate a large
enough 'reuse' bitmap (e.g. if a pack in the middle of the MIDX
disappeared but we still want to reuse later packs).
Since we're ensuring the validity of these packs within the verbatim
reuse code, we no longer have to special-case the preferred pack and
open it within the open_midx_bitmap_1() function.
An alternative approach to the one taken here would be to open all
MIDX'd packs from within open_midx_bitmap_1(). But that would be both
slower and make the bitmaps less useful, since we can still perform some
pack reuse among the packs that still exist when the *.bitmap is opened.
After applying this patch, we can simulate the new behavior after
instrumenting Git like so:
diff --git a/packfile.c b/packfile.c
index 9560f0a33c..aedce72524 100644
--- a/packfile.c
+++ b/packfile.c
@@ -557,6 +557,11 @@ static int open_packed_git_1(struct packed_git *p)
; /* nothing */
p->pack_fd = git_open(p->pack_name);
+ {
+ const char *delete = getenv("GIT_RACILY_DELETE");
+ if (delete && !strcmp(delete, pack_basename(p)))
+ return -1;
+ }
if (p->pack_fd < 0 || fstat(p->pack_fd, &st))
return -1;
pack_open_fds++;
and adding the following test:
test_expect_success 'disappearing packs' '
git init disappearing-packs &&
(
cd disappearing-packs &&
git config pack.allowPackReuse multi &&
test_commit A &&
test_commit B &&
test_commit C &&
A="$(echo "A" | git pack-objects --revs $packdir/pack-A)" &&
B="$(echo "A..B" | git pack-objects --revs $packdir/pack-B)" &&
C="$(echo "B..C" | git pack-objects --revs $packdir/pack-C)" &&
git multi-pack-index write --bitmap --preferred-pack=pack-A-$A.idx &&
test_pack_objects_reused_all 9 3 &&
test_env GIT_RACILY_DELETE=pack-A-$A.pack \
test_pack_objects_reused_all 6 2 &&
test_env GIT_RACILY_DELETE=pack-B-$B.pack \
test_pack_objects_reused_all 6 2 &&
test_env GIT_RACILY_DELETE=pack-C-$C.pack \
test_pack_objects_reused_all 6 2
)
'
Note that we could relax the single-pack version of this which was most
recently addressed in dc1daacdcc (pack-bitmap: check pack validity when
opening bitmap, 2021-07-23), but only partially. Because we still need
to know the object count in the pack, we'd still have to open the pack's
*.idx, so the savings there are marginal.
Note likewise that we add a new "if (!packs_nr)" early return in the
pack reuse code to avoid a potentially expensive allocation on the
'reuse' bitmap in the case that no packs are available for reuse.
[^1]: Unless we run out of open file handles. If that happens and we are
forced to close the only open file handle of a file that has been
removed from underneath us, there is nothing we can do.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a38edab7c8 (Makefile: generate doc versions via GIT-VERSION-GEN,
2024-12-06) moved these variables from the Makefile to asciidoc.conf.in.
When doing so, some extraneous quotes were added; these are visible in
the generated .xml files, at least, and possibly in other locations:
--- a/tmp/orig-git-bisect.xml
+++ b/Documentation/git-bisect.xml
@@ -5,14 +5,14 @@
<refentry lang="en">
<refentryinfo>
<title>git-bisect(1)</title>
- <date>2024-12-06</date>
-<revhistory><revision><date>2024-12-06</date></revision></revhistory>
+ <date>'2024-12-06'</date>^M
+<revhistory><revision><date>'2024-12-06'</date></revision></revhistory>^M
</refentryinfo>
<refmeta>
<refentrytitle>git-bisect</refentrytitle>
<manvolnum>1</manvolnum>
-<refmiscinfo class="source">Git 2.47.1.409.g9bb10d27e7</refmiscinfo>
-<refmiscinfo class="manual">Git Manual</refmiscinfo>
+<refmiscinfo class="source">'Git 2.47.1.410.ga38edab7c8'</refmiscinfo>^M
+<refmiscinfo class="manual">'Git Manual'</refmiscinfo>^M
</refmeta>
<refnamediv>
<refname>git-bisect</refname>
Signed-off-by: Kyle Lippincott <spectral@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'master' of https://github.com/j6t/gitk:
gitk: offer "Copy commit ID to X11 selection" only on X11
gitk: support auto-copy comit ID to primary clipboard
gitk: prefs dialog: refine Auto-select UI
gitk: UI text: change "SHA1 ID" to "Commit ID"
gitk: add text wrapping preferences
gitk: make headings of preferences bold
gitk: check main window visibility before waiting for it to show
gitk: sv.po: Update Swedish translation (323t)
* ah/commit-id-to-clipboard:
gitk: offer "Copy commit ID to X11 selection" only on X11
gitk: support auto-copy comit ID to primary clipboard
gitk: prefs dialog: refine Auto-select UI
gitk: UI text: change "SHA1 ID" to "Commit ID"
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
When the `vcxproj` target was introduced in `config.mak.uname` to allow
building Git with the Visual C toolchain, the `git remote-ext` command
was always executed in its dashed form. Therefore, it was impossible to
pass the test suite unless that command existed in its dashed form, and
we had to special-case this.
Later, when the `vcxproj` target got out of fashion because Visual
Studio gained native support for CMake builds, this special-casing was
copied without questioning it.
But as of 675df192c5f (transport-helper: do not run git-remote-ext etc.
in dashed form, 2020-08-26), the reason for this special-casing no
longer exists. So let's just drop it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ccfba9e0c45 (Makefile: use "generate-perl.sh" to massage Perl
library, 2024-12-06), the previous strategy (which avoided spawning a
shell script to transform the files) was replaced by the same
`generate-perl.sh` invocation as for the Makefile-based build.
The only difference is that now the transformation tries to handle the
Perl modules in-place (which ends up in empty files because the same
file is used as input and output via stdin/stdout redirection), and the
Perl script cannot find them anymore because they are not in the
expected place.
Let's put them into the expected place again, i.e. into
`perl/build/lib/` instead of `perl/`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In e4b488049a5 (Makefile: extract script to massage Perl scripts,
2024-12-06), the code was refactored that is used to transform the Perl
scripts/modules to their final form.
Even the CMake-based build was adjusted, but the change used the file
name `PERL-HEADER` instead of the file name used by the Makefile-based
build (same name but with the `GIT-` prefix). Let's adjust the former to
the latter.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7e0730c8baa (t: better support for out-of-tree builds, 2024-12-06)
the strategy was changed from letting `t7609-mergetool--lib.sh`
hard-code the directory where it expects to find the merge tools to
hard-coding that value in the placeholder `@GIT_TEST_MERGE_TOOLS_DIR@`
that is replaced during the build.
However, likely due to a copy/paste mistake (and reviewers missed this,
too), the CMake-based build was adjusted incorrectly, replacing that
placeholder not with the path to the merge tools, but with a Boolean
indicating whether to use a runtime-generated path prefix or not.
Let's fix that, addressing this CMake-build's symptom:
Initialized empty Git repository in D:/a/git/git/t/trash directory.t7609-mergetool--lib/.git/
++ . true/vimdiff
./test-lib.sh: line 1021: true/vimdiff: No such file or directory
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7e0730c8baa (t: better support for out-of-tree builds, 2024-12-06),
the `bin-wrappers/` strategy was changed so that it no longer hard-codes
the template directory to be `@BUILD_DIR@/templates/blt`, but instead
interpolates the `@TEMPLATE_DIR@` placeholder during the build.
However, this commit only adjusted the `Makefile`-based build.
Let's adjust the CMake-based build as well. This fixes t0000.15 which
would otherwise fail with:
++ echo ''\''t1234-verbose/err'\'' is not empty, it contains:'
't1234-verbose/err' is not empty, it contains:
++ cat t1234-verbose/err
warning: templates not found in @TEMPLATE_DIR@
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It used to be the case that initializing the minimal SDK (i.e. a
radically slimmed-down subset of Git for Windows' development
environment intended to perform the CI builds and little else) took
a bit over one minute, would then be cached, and subsequent jobs would
take at most half a dozen seconds to initialize said minimal SDK.
It is important that this step is fast because we have to run the test
suite in parallel, in a set of matrix jobs, to offset the slowness of
the shell-based test suite, and each and every job has to initialize the
very same minimal SDK.
While it may sound as if parallelizing the jobs might only waste the
generously-provided build minutes but at least the _wallclock_ time
would pass quick, in reality it matters a lot: Frequently Git for
Windows' or GitGitGadget PRs get stuck waiting for quite a while before
CI builds start because other PRs' builds still spend substantial
amounts of time to run, blocking due to the concurrency limit being
reached.
Since 91839a88277 (ci: create script to set up Git for Windows SDK,
2024-10-09), the situation has worsened: every job that requires the
minimal Git for Windows SDK spends roughly two-and-a-half minutes doing
so.
With the switch away from the GitHub Action `setup-git-for-windows-sdk`,
we incurred more downsides:
- It is no longer possible for said Action to fix problems independently
from the Git repository, e.g. when new rules about GitHub Actions
require changes in the way the minimal SDK is initialized.
- The minimal SDK was installed specifically outside of the worktree so
as not to clutter it nor incur an additional cost to verify that the
worktree is clean.
Therefore, even if it would be nice to have a shared process between
GitHub and GitLab based CI builds, let's switch the GitHub-based CI back
to the tried-and-tested `setup-git-for-windows-sdk` Action.
This commit partially reverts 91839a88277 (ci: create script to set up
Git for Windows SDK, 2024-10-09).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 391bceae435 (compat/mingw: support POSIX semantics for atomic
renames, 2024-10-27), we taught the `mingw_rename()` function to respect
POSIX semantics, but we did so only as a fallback after `_wrename()`
fails.
This hid a bug in the implementation that was not caught by Git's test
suite: The `CreateFileW()` function _can_ open handles to directories,
but not when asked to use the `FILE_ATTRIBUTE_NORMAL` flag, as that flag
only is allowed for files.
Let's fix this by using the common `FILE_FLAG_BACKUP_SEMANTICS` flag
that can be used for opening handles to directories, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git refs migrate` command was introduced in
25a0023f28 (builtin/refs: new command to migrate ref storage formats,
2024-06-06) to support migrating from one reference backend to another.
One limitation of the command was that it didn't support migrating
repositories which contained reflogs. A previous commit, added support
for adding reflog updates in ref transactions. Using the added
functionality bake in reflog support for `git refs migrate`.
To ensure that the order of the reflogs is maintained during the
migration, we add the index for each reflog update as we iterate over
the reflogs from the old reference backend. This is to ensure that the
order is maintained in the new backend.
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference transaction only allows a single update for a given
reference to avoid conflicts. This, however, isn't an issue for reflogs.
There are no conflicts to be resolved in reflogs and when migrating
reflogs between backends we'd have multiple reflog entries for the same
refname.
So allow multiple reflog updates within a single transaction. Also the
reflog creation logic isn't exposed to the end user. While this might
change in the future, currently, this reduces the scope of issues to
think about.
In the reftable backend, the writer sorts all updates based on the
update_index before writing to the block. When there are multiple
reflogs for a given refname, it is essential that the order of the
reflogs is maintained. So add the `index` value to the `update_index`.
The `index` field is only set when multiple reflog entries for a given
refname are added and as such in most scenarios the old behavior
remains.
This is required to add reflog migration support to `git refs migrate`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `ref_transaction_update_reflog`, for clients to
add a reflog update to a transaction. While the existing function
`ref_transaction_update` also allows clients to add a reflog entry, this
function does a few things more, It:
- Enforces that only a reflog entry is added and does not update the
ref itself.
- Allows the users to also provide the committer information. This
means clients can add reflog entries with custom committer
information.
The `transaction_refname_valid()` function also modifies the error
message selectively based on the type of the update. This change also
affects reflog updates which go through `ref_transaction_update()`.
A follow up commit will utilize this function to add reflog support to
`git refs migrate`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `ref_transaction_add_update()` creates the `ref_update` struct. To
facilitate addition of reflogs in the next commit, the function needs to
accommodate setting the `committer_info` field in the struct. So modify
the function to also take `committer_info` as an argument and set it
accordingly.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unless the `REF_SKIP_REFNAME_VERIFICATION` flag is set for an update,
the refname of the update is verified for:
- Ensuring it is not a pseudoref.
- Checking the refname format.
These checks will also be needed in a following commit where the
function to add reflog updates to the transaction is introduced. Extract
the code out into a new static function.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When refs are updated in the files-backend, a lock is obtained for the
corresponding file path. This is the case even for reflogs, i.e. a lock
is obtained on the reference path instead of the reflog path. This
works, since generally, reflogs are updated alongside the ref.
The upcoming patches will add support for reflog updates in ref
transaction. This means, in a particular transaction we want to have ref
updates and reflog updates. For a given ref in a given transaction there
can be at most one update. But we can theoretically have multiple reflog
updates for a given ref in a given transaction. A great example of this
would be when migrating reflogs from one backend to another. There we
would batch all the reflog updates for a given reference in a single
transaction.
The current flow does not support this, because currently refs & reflogs
are treated as a single entity and capture the lock together. To
separate this, add a count field to ref_lock. With this, multiple
updates can hold onto a single ref_lock and the lock will only be
released when all of them release the lock.
This patch only adds the `count` field to `ref_lock` and adds the logic
to increment and decrement the lock. In a follow up commit, we'll
separate the reflog update logic from ref updates and utilize this
functionality.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable backend, sorts its updates by refname before applying them,
this ensures that the references are stored sorted. When migrating
reflogs from one backend to another, the order of the reflogs must be
maintained. Add a new `index` field to the `ref_update` struct to
facilitate this.
This field is used in the reftable backend's sort comparison function
`transaction_update_cmp`, to ensure that indexed fields maintain their
order.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference backends obtain the committer information from
`git_committer_info(0)` when adding a reflog. The upcoming patches
introduce support for migrating reflogs between the reference backends.
This requires an interface to creating reflogs, including custom
committer information.
Add a new field `committer_info` to the `ref_update` struct, which is
then used by the reference backends. If there is no `committer_info`
provided, the reference backends default to using
`git_committer_info(0)`. The field itself cannot be set to
`git_committer_info(0)` since the values are dynamic and must be
obtained right when the reflog is being committed.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like `git log`, now also `git range-diff` has that option as a
shortcut for the common operation that would otherwise require the quite
unwieldy (if theoretically "more correct") `--diff-mode=remerge` option.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=<format>` option.
Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between commit ranges that contain
merges.
This is especially true in scenarios with non-trivial merges, i.e.
merges introducing changes other than, or in addition to, what merge ORT
would have produced. Merging a topic branch that changes a function
signature into a branch that added a caller of that function, for
example, would require the merge commit itself to adjust that caller to
the modified signature.
In my code reviews, I found the `--diff-merges=remerge` option
particularly useful.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build procedure update plus introduction of Meson based builds.
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
Fix performance regression of a recent "fatten promisor pack with
local objects" protection against an unwanted gc.
* jt/fix-fattening-promisor-fetch:
index-pack --promisor: also check commits' trees
index-pack --promisor: don't check blobs
index-pack --promisor: dedup before checking links
The syntax ":/<text>" to name the latest commit with the matching
text was broken with a recent change, which has been corrected.
* ps/commit-with-message-syntax-fix:
object-name: fix reversed ordering with ":/<text>" revisions
The advice messages now tell the newer 'git config set' command to
set the advice.token configuration variable to squelch a message.
* bf/explicit-config-set-in-advice-messages:
advice: suggest using subcommand "git config set"
"git tag" has been taught to refuse to create refs/tags/HEAD
as such a tag will be confusing in the context of UI provided by
the Git Porcelain commands.
* jc/forbid-head-as-tagname:
tag: "git tag" refuses to use HEAD as a tagname
t5604: do not expect that HEAD can be a valid tagname
refs: drop strbuf_ prefix from helpers
refs: move ref name helpers around
"git describe" optimization.
* jk/describe-perf:
describe: split "found all tags" and max_candidates logic
describe: stop traversing when we run out of names
describe: stop digging for max_candidates+1
t/perf: add tests for git-describe
t6120: demonstrate weakness in disjoint-root handling
This option is only useful where a selection clipboard is available, which
is only the case on X11. Do not clutter the UI in other environments.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
The --ancestry-path option is designed to be given a commit that is
on the path, which was not documented, which has been corrected.
* kk/doc-ancestry-path:
doc: mention rev-list --ancestry-path restrictions
Yet another "pass the repository through the callchain" topic.
* kn/midx-wo-the-repository:
midx: inline the `MIDX_MIN_SIZE` definition
midx: pass down `hash_algo` to functions using global variables
midx: pass `repository` to `load_multi_pack_index`
midx: cleanup internal usage of `the_repository` and `the_hash_algo`
midx-write: pass down repository to `write_midx_file[_only]`
write-midx: add repository field to `write_midx_context`
midx-write: use `revs->repo` inside `read_refs_snapshot`
midx-write: pass down repository to static functions
packfile.c: remove unnecessary prepare_packed_git() call
midx: add repository to `multi_pack_index` struct
config: make `packed_git_(limit|window_size)` non-global variables
config: make `delta_base_cache_limit` a non-global variable
packfile: pass down repository to `for_each_packed_object`
packfile: pass down repository to `has_object[_kept]_pack`
packfile: pass down repository to `odb_pack_name`
packfile: pass `repository` to static function in the file
packfile: use `repository` from `packed_git` directly
packfile: add repository to struct `packed_git`
Backport oss-fuzz tests for us to our codebase.
* es/oss-fuzz:
fuzz: port fuzz-url-decode-mem from OSS-Fuzz
fuzz: port fuzz-parse-attr-line from OSS-Fuzz
fuzz: port fuzz-credential-from-url-gently from OSS-Fuzz
"git fast-import" learned to reject paths with ".." and "." as
their components to avoid creating invalid tree objects.
* en/fast-import-verify-path:
t9300: test verification of renamed paths
fast-import: disallow more path components
fast-import: disallow "." and ".." path components
"git bundle --unbundle" and "git clone" running on a bundle file
both learned to trigger fsck over the new objects with configurable
fck check levels.
* jt/bundle-fsck:
transport: propagate fsck configuration during bundle fetch
fetch-pack: split out fsck config parsing
bundle: support fsck message configuration
bundle: add bundle verification options type
To show a remerge diff, the merge needs to be recreated. For that to
work, the merge base(s) need to be found, which means that the commits'
parents have to be traversed until common ancestors are found (if any).
However, one optimization that hails all the way back to cb115748ec0d
(Some more memory leak avoidance, 2006-06-17) is to release the commit's
list of parents immediately after showing it _and to set that parent
list to `NULL`_. This can break the merge base computation.
This problem is most obvious when traversing the commits in reverse: In
that instance, if a parent of a merge commit has been shown as part of
the `git log` command, by the time the merge commit's diff needs to be
computed, that parent commit's list of parent commits will have been set
to `NULL` and as a result no merge base will be found (even if one
should be found).
Traversing commits in reverse is far from the only circumstance in which
this problem occurs, though. There are many avenues to traversing at
least one commit in the revision walk that will later be part of a merge
base computation, for example when not even walking any revisions in
`git show <merge1> <merge2>` where `<merge1>` is part of the commit
graph between the parents of `<merge2>`.
Another way to force a scenario where a commit is traversed before it
has to be traversed again as part of a merge base computation is to
start with two revisions (where the first one is reachable from the
second but not in a first-parent ancestry) and show the commit log with
`--topo-order` and `--first-parent`.
Let's fix this by special-casing the `remerge_diff` mode, similar to
what we did with reflogs in f35650dff6a4 (log: do not free parents when
walking reflog, 2017-07-07).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up CI builds for both GitLab and GitHub that use the Meson build
system.
While the setup is mostly trivial, one gotcha is the test output
directory used to be in "t/", but now it is contained in the build
directory. To unify the logic across Makefile- and Meson-based builds we
explicitly set up the `TEST_OUTPUT_DIRECTORY` variable so that it is the
same for both build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our unit tests that don't yet use the clar unit testing framework ignore
any option that they do not understand. It is thus fine to just pass
test options we set up globally to those unit tests as they are simply
ignored. This makes our life easier because we don't have to special
case those options with Meson, where test options are set up globally
via `meson test --test-args=`.
But our clar-based unit testing framework is way stricter here and will
fail in case it is passed an unknown option. Stub out these options with
no-ops to make our life a bit easier.
Note that this also requires us to remove the `-x` short option for
`--exclude`. This is because `-x` has another meaning in our integration
tests, as it enables shell tracing. I doubt there are a lot of people
out there using it as we only got a small hand full of clar tests in the
first place. So better change it now so that we can in the long run
improve compatibility between the two different test drivers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both t9835 and t9836 exercise git-p4, but one exercises Python 2 whereas
the other one uses Python 3. These tests do not exercise "git p4", but
instead they use "git p4.py". This calls the unbuilt version of
"git-p4.py" that still has the "#!/usr/bin/env python" shebang, which
allows the test to modify which Python version comes first in $PATH,
making it possible to force a Python version.
But "git-p4.py" is not in our PATH during out-of-tree builds, and thus
we cannot locate "git-p4.py". The tests thus break with CMake and Meson.
Fix this by instead manually setting up script wrappers that invoke the
respective Python interpreter directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit, we have introduced consistency checks to Meson
to detect any discrepancies with missing or extraneous tests in its
build instructions. These checks only get executed in Meson though, so
any users of our Makefiles wouldn't be alerted of the fact that they
have to modify the Meson build instructions in case they add or remove
any tests.
Add a comparable test target to our Makefile to plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is quite easy for the list of integration tests to go out-of-sync
without anybody noticing. Introduce a new configure-time check that
verifies that all tests are wired up properly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the code files for unit tests using the self-grown unit testing
framework have a "t-" prefix to their name. This makes it easy to
identify them and use globbing in our Makefile and in other places. On
the other hand though, our clar-based unit tests have no prefix at all
and thus cannot easily be discerned from other files in the unit test
directory.
Introduce a new "u-" prefix for clar-based unit tests. This prefix will
be used in a subsequent commit to easily identify such tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -DSUPPRESS_ANNOTATED_LEAKS preprocessor directive was used to enable
our `UNLEAK()` macro in the past, which marks memory as still-reachable
so that the leak sanitizer does not complain. Starting with 52c7dbd036
(git-compat-util: drop now-unused `UNLEAK()` macro, 2024-11-20) this
macro has been removed, and thus the preprocessor directive is not
required anymore, either.
Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update `create_failed_test_artifacts ()` so that it can handle arbitrary
test output directories. This fixes creation of these artifacts for
macOS on GitLab CI, which uses a separate output directory already. This
will also be used by our out-of-tree builds with Meson.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Auto-select ("Copy commit ID to X11 selection") is useful when a
selection cliboard exists, but otherwise generally meaningless, for
instance on Windows.
Add a similar pref and behavior which copies the commit ID to the
primary clipboard - for platforms without a selection clipboard, but
which can also be useful additionally on platforms with selection.
Note that while autoselect is enabled by default, autocopy isn't.
That's because the selection clipboard is typically dispensable, while
the primary clipboard can be considered a more precious resource,
which we don't want to (clear and) overwrite by default.
Signed-off-by: Avi Halachmi (:avih) <avihpit@yahoo.com>
Tl;DR: change Auto-select text, move the length input to a new line.
The Auto-select preference auto-selects [part of] the commit ID text
at the respective widget on startup, and when the current commit at
the graph changes.
Its real premise, however, is to populate the selection clipboard
with the commit ID. Consider, for instance, how meaningless it is on
platforms without a selection clipboard - like Windows or macOS (on
Windows the selection is not even visible with the default Tk theme,
because it's only visible in focused widgets - which the commit ID
widget is not during normal application of this selection).
So rename the Auto-select label to "Copy commit ID to X11 selection",
to reflect better the ultimate outcome of its application
Note that there exists other, non-X11 platforms with a selection
clipboard, like Wayland, and if a native Tk client exists on such
platforms, then the description will not be accurate, but hopefully
it's not too misleading either.
Additionally, move the length input widget to a new line, because:
- This length applies to both Auto-select and "Copy commit reference"
context menu item, so it's not exclusive to the selection length.
- The next commit will add support for primary clipboard as well,
where this length will also be used.
Also, move the "Hide remotes" item above these selection prefs, to
keep the selection prefs semi-grouped before the spacing of the
following title "Diff display options".
Signed-off-by: Avi Halachmi (:avih) <avihpit@yahoo.com>
SHA1 might not stay forever, and plans to use SHA256 already exist,
so use the official name for it - "Commit ID".
Only visible UI texts are modified to reduce the noise when using
git-blame, while comments and variable names still contain SHA1/sha1.
Signed-off-by: Avi Halachmi (:avih) <avihpit@yahoo.com>
The changes in commit c06793a4ed (allow git-bundle to create bottomless
bundle, 2007-08-08) ensure annotated tags are properly preserved when
creating a bundle using a revision range operation.
At the time the range notation would peel the ends to their
corresponding commit, meaning ref v2.0 would point to the v2.0^0 commit.
So the above workaround was introduced. This code looks up the ref
before it's written to the bundle, and if the ref doesn't point to the
object we expect (for tags this would be a tag object), we skip the ref
from the bundle. Instead, when the ref is a tag that's the positive end
of the range (e.g. v2.0 from the range "v1.0..v2.0"), then that ref is
written to the bundle instead.
Later, in 895c5ba3c1 (revision: do not peel tags used in range notation,
2013-09-19), the behavior of parsing ranges was changed and the problem
was fixed at the cause. But the workaround in bundle.c was not reverted.
Now it seems this workaround can cause a race condition. git-bundle(1)
uses setup_revisions() to parse the input into `struct rev_info`. Later,
in write_bundle_refs(), it uses this info to write refs to the bundle.
As mentioned at this point each ref is looked up again and checked
whether it points to the object we expect. If not, the ref is not
written to the bundle. But, when creating a bundle in a heavy traffic
repository (a repo with many references, and frequent ref updates) it's
possible a branch ref was updated between setup_revisions() and
write_bundle_refs() and thus the extra check causes the ref to be
skipped.
The workaround was originally added to deal with tags, but the code path
also gets hit by non-tag refs, causing this race condition. Because it's
no longer needed, remove it and fix the possible race condition.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we source "ci/lib.sh" we wrap the directives in a separate
group so that they can easily be collapsed in the web UI. And as we
source the script multiple times during a single CI run we thus end up
with the same section name reused multiple times, as well.
This is broken on GitLab CI though, where reusing the same group name is
not supported. The consequence is that only the last of these sections
can be collapsed.
Fix this issue by including the name of the sourcing script in the
group's name.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use printf to set up sections with GitLab CI, which requires us to
print a bunch of escape sequences via printf. The group name is
controlled by the user and is expanded directly into the formatting
string, which may cause problems in case the argument contains escape
sequences or formatting directives.
Fix this potential issue by using formatting directives to pass variable
data.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We exlicitly trap on EXIT in order to end the "CI setup" group. This
isn't necessary though given that `begin_group ()` already sets up the
trap for us.
Remove the duplicate trap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The macOS Ventura images we use for GitLab CI runners have been
deprecated. Update them to macOS 14, aka Sonoma.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
* ps/build: (24 commits)
Introduce support for the Meson build system
Documentation: add comparison of build systems
t: allow overriding build dir
t: better support for out-of-tree builds
Documentation: extract script to generate a list of mergetools
Documentation: teach "cmd-list.perl" about out-of-tree builds
Documentation: allow sourcing generated includes from separate dir
Makefile: simplify building of templates
Makefile: write absolute program path into bin-wrappers
Makefile: allow "bin-wrappers/" directory to exist
Makefile: refactor generators to be PWD-independent
Makefile: extract script to generate gitweb.js
Makefile: extract script to generate gitweb.cgi
Makefile: extract script to massage Python scripts
Makefile: extract script to massage Shell scripts
Makefile: use "generate-perl.sh" to massage Perl library
Makefile: extract script to massage Perl scripts
Makefile: consistently use PERL_PATH
Makefile: generate doc versions via GIT-VERSION-GEN
Makefile: generate "git.rc" via GIT-VERSION-GEN
...
Every switch and option which is passed to git-submodule.sh has a
corresponding variable which is set accordingly; by convention, the name
of the variable is the option name (for example, "--jobs" and "$jobs").
Rename "$custom_name", "$deinit_all" and "$nofetch", for consistency.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git-submodule.sh parses various options and switches, it sets some
variables to values; the variables in turn affect the options given to
git-submodule--helper.
Currently, variables which correspond to switches have boolean values
(for example, whenever "--force" is passed, force=1), while variables
which correspond to options which take arguments have string values that
sometimes contain the option name and sometimes only the option value.
Set all of the variables to strings which contain the option name (e.g.
force="--force" rather than force=1); this has a couple of advantages:
it improves consistency, readability and debuggability.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a couple of comments in a few functions where they were missing.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the variable "$diff_cmd" which is no longer used.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's entirely unnecessary to check whether the argument given to an
option (i.e. --summary-limit) is valid in the shell wrapper, since it's
already done when parsing the various options in git-submodule--helper.
Remove this check from the script; this both improves consistency
throughout the script, and the error message shown to the user in case
some invalid non-numeric argument was passed to "--summary-limit" is
more informative as well.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some command-line options have a short form which takes an argument; for
example, "--jobs" has the form "-j", and it takes a numerical argument.
When parsing short options, support the case where there is no space
between the flag and the option argument, in order to improve
consistency with the rest of the builtin git commands.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some command-line options have a long form which takes an argument. In
this case, the argument can be given right after `='; for example,
"--depth" takes a numerical argument, which can be given as "--depth=X".
Support the case where the argument is given right after `=' for all
long options, in order to improve consistency throughout the script.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize reading random references out of the reftable backend by
allowing reuse of iterator objects.
* ps/reftable-iterator-reuse:
refs/reftable: reuse iterators when reading refs
reftable/merged: drain priority queue on reseek
reftable/stack: add mechanism to notify callers on reload
refs/reftable: refactor reflog expiry to use reftable backend
refs/reftable: refactor reading symbolic refs to use reftable backend
refs/reftable: read references via `struct reftable_backend`
refs/reftable: figure out hash via `reftable_stack`
reftable/stack: add accessor for the hash ID
refs/reftable: handle reloading stacks in the reftable backend
refs/reftable: encapsulate reftable stack
Isolates the reftable subsystem from the rest of Git's codebase by
using fewer pieces of Git's infrastructure.
* ps/reftable-detach:
reftable/system: provide thin wrapper for lockfile subsystem
reftable/stack: drop only use of `get_locked_file_path()`
reftable/system: provide thin wrapper for tempfile subsystem
reftable/stack: stop using `fsync_component()` directly
reftable/system: stop depending on "hash.h"
reftable: explicitly handle hash format IDs
reftable/system: move "dir.h" to its only user
Loosen overly strict ownership check introduced in the recent past,
to keep the promise "cloning a suspicious repository is a safe
first step to inspect it".
* bc/allow-upload-pack-from-other-people:
Allow cloning from repositories owned by another user
End-user experience of "git mergetool" when the command errors out
has been improved.
* pb/mergetool-errors:
git-difftool--helper.sh: exit upon initialize_merge_tool errors
git-mergetool--lib.sh: add error message for unknown tool variant
git-mergetool--lib.sh: add error message if 'setup_user_tool' fails
git-mergetool--lib.sh: use TOOL_MODE when erroring about unknown tool
completion: complete '--tool-help' in 'git mergetool'
Describe a case where an option value needs to be spelled as a
separate argument, i.e. "--opt val", not "--opt=val".
* jc/doc-opt-tilde-expand:
doc: option value may be separate for valid reasons
Drop support for ancient environments in various CI jobs.
* bc/ancient-ci:
Add additional CI jobs to avoid accidental breakage
ci: remove clause for Ubuntu 16.04
gitlab-ci: switch from Ubuntu 16.04 to 20.04
We use a singleton empty array to initialize a `struct strvec`;
similar to the empty string singleton we use to initialize a `struct
strbuf`.
Note that an empty strvec instance (with zero elements) does not
necessarily need to be an instance initialized with the singleton.
Let's refer to strvec instances initialized with the singleton as
"empty-singleton" instances.
As a side note, this is the current `strvec_pop()`:
void strvec_pop(struct strvec *array)
{
if (!array->nr)
return;
free((char *)array->v[array->nr - 1]);
array->v[array->nr - 1] = NULL;
array->nr--;
}
So, with `strvec_pop()` an instance can become empty but it does
not going to be the an "empty-singleton".
This "empty-singleton" circumstance requires us to be careful when
adding elements to instances. Specifically, when adding the first
element: when we detach the strvec instance from the singleton and
set the internal pointer in the instance to NULL. After this point we
apply `realloc()` on the pointer. We do this in
`strvec_push_nodup()`, for example.
The recently introduced `strvec_splice()` API is expected to be
normally used with non-empty strvec's. However, it can also end up
being used with "empty-singleton" strvec's:
struct strvec arr = STRVEC_INIT;
int a = 0, b = 0;
... no modification to arr, a or b ...
const char *rep[] = { "foo" };
strvec_splice(&arr, a, b, rep, ARRAY_SIZE(rep));
So, we'll try to add elements to an "empty-singleton" strvec instance.
Avoid misapplying `realloc()` to the singleton in `strvec_splice()` by
adding a special case for strvec's initialized with the singleton.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) seems to contain an oversight in that the tree of a commit
is not checked. Teach git to check these trees.
The fix slows down a fetch from a certain repo at $DAYJOB from 2m2.127s
to 2m45.052s, but in order to make the fetch correct, it seems worth it.
In order to test this, we could create server and client repos as
follows...
C S
\ /
O
(O and C are commits both on the client and server. S is a commit
only on the server. C and S have the same tree but different commit
messages. The diff between O and C is non-zero.)
...and then, from the client, fetch S from the server.
In theory, the client declares "have C" and the server can use this
information to exclude S's tree (since it knows that the client has C's
tree, which is the same as S's tree). However, it is also possible for
the server to compute that it needs to send S and not O, and proceed
from there; therefore the objects of C are not considered at all when
determining what to send in the packfile. In order to prevent a test of
client functionality from having such a dependence on server behavior, I
have not included such a test.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a follow-up to the parent of this commit, it was found that not
checking for the existence of blobs linked from trees sped up the fetch
from 24m47.815s to 2m2.127s. Teach Git to do that.
The tradeoff of not checking blobs is documented in a code comment.
(Blobs may also be linked from tag objects, but it is impossible to know
the type of an object linked from a tag object without looking it up in
the object database, so the code for that is untouched.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit c08589efdc (index-pack: repack local links into promisor packs,
2024-11-01) fixed a bug with what was believed to be a negligible
decrease in performance [1] [2]. But at $DAYJOB, with at least one repo,
it was found that the decrease in performance was very significant.
Looking at the patch, whenever we parse an object in the packfile to
be indexed, we check the targets of all its outgoing links for its
existence. However, this could be optimized by first collecting all such
targets into an oidset (thus deduplicating them) before checking. Teach
Git to do that.
On a certain fetch from the aforementioned repo, this improved
performance from approximately 7 hours to 24m47.815s. This number will
be further reduced in a subsequent patch.
[1] https://lore.kernel.org/git/CAG1j3zGiNMbri8rZNaF0w+yP+6OdMz0T8+8_Wgd1R_p1HzVasg@mail.gmail.com/
[2] https://lore.kernel.org/git/20241105212849.3759572-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git documentation refers to $HOME and $XDG_CONFIG_HOME often, but does
not specify how or where these values come from on Windows where neither
is set by default. The new documentation reflects the behavior of
setup_windows_environment() in compat/mingw.c.
Signed-off-by: Alejandro Barreto <alejandro.barreto@ni.com>
Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new preference "wrapdefault" which allows enabling char/word wrap.
Impacts all text in the ctext widget for which no other preference exists.
Also make the (existing) preference "wrapcomment" configurable graphically.
Its setting impacts only the "comment" part of the ctext widget.
Signed-off-by: Christoph Sommer <sommer@cms-labs.org>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Make preference groups like "Diff display options" stand out more.
Signed-off-by: Christoph Sommer <sommer@cms-labs.org>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This change makes non-ascii console output (eg server messages in the
`git push` command output) properly render in the git gui windows.
Fixes: https://github.com/prati0100/git-gui/issues/68
Signed-off-by: Yuri Konotopov <ykonotopov@gnome.org>
Recently it was reported [1] that "look for the youngest commit
reachable from any ref with log message that match the given
pattern" syntax (i.e. ':/<text>') started to return results in
reverse recency order. This regression was introduced in Git v2.47.0
and is caused by a memory leak fix done in 57fb139b5e (object-name:
fix leaking commit list items, 2024-08-01).
The intent of the identified commit is to stop modifying the commit list
provided by the caller such that the caller can properly free all commit
list items, including those that the called function might potentially
remove from the list. This was done by creating a copy of the passed-in
commit list and modifying this copy instead of the caller-provided list.
We already knew to create such a copy beforehand with the `backup` list,
which was used to clear the `ONELINE_SEEN` commit mark after we were
done. So the refactoring simply renamed that list to `copy` and started
to operate on that list instead. There is a gotcha though: the backup
list, and thus now also the copied list, is always being prepended to,
so the resulting list is in reverse order! The end result is that we
pop commits from the wrong end of the commit list, returning commits in
reverse recency order.
Fix the bug by appending to the list instead.
[1]: <CAKOEJdcPYn3O01p29rVa+xv=Qr504FQyKJeSB-Moze04ViCGGg@mail.gmail.com>
Reported-by: Aarni Koskela <aarni@valohai.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 3f763ddf28 (fetch: set remote/HEAD if it does not exist,
2024-11-22), git-fetch learned to opportunistically set $REMOTE/HEAD
when fetching by always asking for remote HEAD, in the hope that it
will help setting refs/remotes/<name>/HEAD if missing.
But it is not needed to always ask for remote HEAD. When we are
fetching from a remote, for which we have remote-tracking branches,
we do need to know about HEAD. But if we are doing one-shot fetch,
e.g.,
$ git fetch --tags https://github.com/git/git
we do not even know what sub-hierarchy of refs/remotes/<remote>/
we need to adjust the remote HEAD for. There is no need to ask for
HEAD in such a case.
Incidentally, because the unconditional request to list "HEAD"
affected the number of ref-prefixes requested in the ls-remote
request, this affected how the requests for tags are added to the
same ls-remote request, breaking "git fetch --tags $URL" performed
against a URL that is not configured as a remote.
Reported-by: Josh Steadmon <steadmon@google.com>
[jc: tests are also borrowed from Josh's patch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each reftable addition has an associated update_index. While writing
refs, the update_index is verified to be within the range of the
reftable writer, i.e. `writer.min_update_index <= ref.update_index` and
`writer.max_update_index => ref.update_index`.
The corresponding check for reflogs in `reftable_writer_add_log` is
however missing. Add a similar check, but only check for the upper
limit. This is because reflogs are treated a bit differently than refs.
Each reflog entry in reftable has an associated update_index and we also
allow expiring entries in the middle, which is done by simply writing a
new reflog entry with the same update_index. This means, writing reflog
entries with update_index lesser than the writer's update_index is an
expected scenario.
Add a new unit test to check for the limits and fix some of the existing
tests, which were setting arbitrary values for the update_index by
ensuring they stay within the now checked limits.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce support for the Meson build system, a "modern" meta build
system that supports many different platforms, including Linux, macOS,
Windows and BSDs. Meson supports different backends, including Ninja,
Xcode and Microsoft Visual Studio. Several common IDEs provide an
integration with it.
The biggest contender compared to Meson is probably CMake as outlined in
our "Documentation/technical/build-systems.txt" file. Based on my own
personal experience from working with both build systems extensively I
strongly favor Meson over CMake. In my opinion, it feels significantly
easier to use with a syntax that feels more like a "real" programming
language. The second big reason is that Meson supports Rust natively,
which may prove to be important given that the project may pick up Rust
as another language eventually.
Using Meson is rather straight-forward. An example:
```
# Meson uses out-of-tree builds. You can set up multiple build
# directories, how you name them is completely up to you.
$ mkdir build
$ cd build
$ meson setup .. -Dprefix=/tmp/git-installation
# Build the project. This also provides several other targets like
e.g. `install` or `test`.
$ ninja
# Meson has been wired up to support execution of our test suites.
# Both our unit tests and our integration tests are supported.
# Running `meson test` without any arguments will execute all tests,
# but the syntax supports globbing to select only some tests.
$ meson test 't-*'
# Execute single test interactively to allow for debugging.
$ meson test 't0000-*' --interactive --test-args=-ix
```
The build instructions have been successfully tested on the following
systems, tests are passing:
- Apple macOS 10.15.
- FreeBSD 14.1.
- NixOS 24.11.
- OpenBSD 7.6.
- Ubuntu 24.04.
- Windows 10 with Cygwin.
- Windows 10 with MinGW64, except for t9700, which is also broken with
our Makefile.
- Windows 10 with Visual Studio 2022 toolchain, using the Native Tools
Command Prompt with `meson setup --vsenv`. Tests pass, except for
t9700.
- Windows 10 with Visual Studio 2022 solution, using the Native Tools
Command Prompt with `meson setup --backend vs2022`. Tests pass,
except for t9700.
- Windows 10 with VS Code, using the Meson plug-in.
It is expected that there will still be rough edges in the current
version. If this patch lands the expectation is that it will coexist
with our other build systems for a while. Like this, distributions can
slowly migrate over to Meson and report any findings they have to us
such that we can continue to iterate. A potential cutoff date for other
build systems may be Git 3.0.
Some notes:
- The installed distribution is structured somewhat differently than
how it used to be the case. All of our binaries are installed into
`$libexec/git-core`, while all binaries part of `$bindir` are now
symbolic links pointing to the former. This rule is consistent in
itself and thus easier to reason about.
- We do not install dashed binaries into `$libexec/git-core` anymore,
so there won't e.g. be a symlink for git-add(1). These are not
required by modern Git and there isn't really much of a use case for
those anymore. By not installing those symlinks we thus start the
deprecation of this layout.
- We're targeting Meson 1.3.0, which has been released relatively
recently November 2023. The only feature we use from that version is
`fs.relative_to()`, which we could replace if necessary. If so, we
could start to target Meson 1.0.0 and newer, released in December
2022.
- The whole build instructions count around 3300 lines, half of which
is listing all of our code and test files. Our Makefiles are around
5000 lines, autoconf adds another 1300 lines. CMake in comparison
has only 1200 linescode, but it avoids listing individual files and
does not wire up auto-configuration as extensively as the Meson
instructions do.
- We bundle a set of subproject wrappers for curl, expat, openssl,
pcre2 and zlib. This allows developers to build Git without these
dependencies preinstalled, and Meson will fetch and build them
automatically. This is especially helpful on Windows.
Helped-by: Eli Schwartz <eschwartz@gentoo.org>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're contemplating whether to eventually replace our build systems with
a build system that is easier to use. Add a comparison of build systems
to our technical documentation as a baseline for discussion.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "test-lib.sh" assumes that our build directory is the parent
directory of "t/". While true when using our Makefile, it's not when
using build systems that support out-of-tree builds.
In commit ee9e66e4e7 (cmake: avoid editing t/test-lib.sh, 2022-10-18),
we have introduce support for overriding the GIT_BUILD_DIR by creating
the file "$GIT_BUILD_DIR/GIT-BUILD-DIR" with its contents pointing to
the location of the build directory. The intent was to stop modifying
"t/test-lib.sh" with the CMake build systems while allowing out-of-tree
builds. But "$GIT_BUILD_DIR" is somewhat misleadingly named, as it in
fact points to the _source_ directory. So while that commit solved part
of the problem for out-of-tree builds, CMake still has to write files
into the source tree.
Solve the second part of the problem, namely not having to write any
data into the source directory at all, by also supporting an environment
variable that allows us to point to a different build directory. This
allows us to perform properly self-contained out-of-tree builds.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our in-tree builds used by the Makefile use various different build
directories scattered around different locations. The paths to those
build directories have to be propagated to our tests such that they can
find the contained files. This is done via a mixture of hardcoded paths
in our test library and injected variables in our bin-wrappers or
"GIT-BUILD-OPTIONS".
The latter two mechanisms are preferable over using hardcoded paths. For
one, we have all paths which are subject to change stored in a small set
of central files instead of having the knowledge of build paths in many
files. And second, it allows build systems which build files elsewhere
to adapt those paths based on their own needs. This is especially nice
in the context of build systems that use out-of-tree builds like CMake
or Meson.
Remove hardcoded knowledge of build paths from our test library and move
it into our bin-wrappers and "GIT-BUILD-OPTIONS".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include the list of available mergetools into our manpages. Extract
the script that performs this logic such that we can reuse it in other
build systems.
While at it, refactor the Makefile targets such that we don't create
"mergetools-list.made" anymore. It shouldn't be necessary, as we can
instead have other targets depend on "mergetools-{diff,merge}.txt"
directly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "cmd-list.perl" script generates a list of commands that can be
included into our manpages. The script doesn't know about out-of-tree
builds and instead writes resulting files into the source directory.
Adapt it such that we can read data from the source directory and write
data into the build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation uses "include::" directives to include parts that are
either reused across multiple documents or parts that we generate at
build time. Unfortunately, top-level includes are only ever resolved
relative to the base directory, which is typically the directory of the
including document. Most importantly, it is not possible to have either
asciidoc or asciidoctor search multiple directories.
It follows that both kinds of includes must live in the same directory.
This is of course a bummer for out-of-tree builds, because here the
dynamically-built includes live in the build directory whereas the
static includes live in the source directory.
Introduce a `build_dir` attribute and prepend it to all of our includes
for dynamically-built files. This attribute gets set to the build
directory and thus converts the include path to an absolute path, which
asciidoc and asciidoctor know how to resolve.
Note that this change also requires us to update "build-docdep.perl",
which tries to figure out included files such our Makefile can set up
proper build-time dependencies. This script simply scans through the
source files for any lines that match "^include::" and treats the
remainder of the line as included file path. But given that those may
now contain the "{build_dir}" variable we have to teach the script to
replace that attribute with the actual build directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we install Git we also install a set of default templates that both
git-init(1) and git-clone(1) populate into our build directories. The
way the pristine templates are laid out in our source directory is
somewhat weird though: instead of reconstructing the actual directory
hierarchy in "templates/", we represent directory separators with "--".
The only reason I could come up with for why we have this is the
"branches/" directory, which is supposed to be empty when installing it.
And as Git famously doesn't store empty directories at all we have to
work around this limitation.
Now the thing is that the "branches/" directory is a leftover to how
branches used to be stored in the dark ages. gitrepository-layout(5)
lists this directory as "slightly deprecated", which I would claim is a
strong understatement. I have never encountered anybody using it today
and would be surprised if it even works as expected. So having the "--"
hack in place for an item that is basically unused, unmaintained and
deprecated doesn't only feel unreasonable, but installing that entry by
default may also cause confusion for users that do not know what this is
supposed to be in the first place.
Remove this directory from our templates and, now that we do not require
the workaround anymore, restructure the templates to form a proper
hierarchy. This makes it way easier for build systems to install these
templates into place.
We should likely think about removing support for "branch/" altogether,
but that is outside of the scope of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write the absolute program path into our bin-wrappers. This allows us to
simplify the Meson build instructions we are about to introduce a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "bin-wrappers/" directory gets created by our build system and is
populated with one script for each of our binaries. There isn't anything
inherently wrong with the current layout, but it is somewhat hard to
adapt for out-of-tree build systems.
Adapt the layout such that our "bin-wrappers/" directory always exists
and contains our "wrap-for-bin.sh" script to make things a little bit
easier for subsequent steps.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple scripts that generate headers from other data. All of
these scripts have the assumption built-in that they are executed in the
current source directory, which makes them a bit unwieldy to use during
out-of-tree builds.
Refactor them to instead take the source directory as well as the output
file as arguments.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, also extract the script to generate the
"gitweb.js" file. While the logic itself is trivial, it helps us avoid
duplication of logic across build systems and ensures that the build
systems will remain in sync with each other in case the logic ever needs
to change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to generate "gitweb.cgi" we have to replace various different
placeholders. This is done ad-hoc and is thus not easily reusable across
different build systems.
Introduce a new GITWEB-BUILD-OPTIONS.in template that we populate at
configuration time with the expected options. This script is then used
as input for a new "generate-gitweb.sh" script that generates the final
"gitweb.cgi" file. While this requires us to repeat the options multiple
times, it is in line to how we generate other build options like our
GIT-BUILD-OPTIONS file.
While at it, refactor how we replace the GITWEB_PROJECT_MAXDEPTH. Even
though this variable is supposed to be an integer, the source file has
the value quoted. The quotes are eventually stripped via sed(1), which
replaces `"@GITWEB_PROJECT_MAXDEPTH@"` with the actual value, which is
rather nonsensical. This is made clearer by just dropping the quotes in
the source file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract a script that massages Python scripts. This provides a couple of
benefits:
- The build logic is deduplicated across Make, CMake and Meson.
- CMake learns to rewrite scripts as-needed at build time instead of
only writing them at configure time.
Furthermore, we will use this script when introducing Meson to
deduplicate the logic across build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as in the preceding commits, extract a script that allows us to
unify how we massage shell scripts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend "generate-perl.sh" such that it knows to also massage the Perl
library files. There are two major differences:
- We do not read in the Perl header. This is handled by matching on
whether or not we have a Perl shebang.
- We substitute some more variables, which we read in via our
GIT-BUILD-OPTIONS.
Adapt both our Makefile and the CMake build instructions to use this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation we generate embeds information for the exact Git
version used as well as the date of the commit. This information is
injected by injecting attributes into the build process via command line
argument.
Refactor the logic so that we write the information into "asciidoc.conf"
and "asciidoctor-extensions.rb" via `GIT-VERSION-GEN` for AsciiDoc and
AsciiDoctor, respectively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the script to inject various build-time parameters into our Perl
scripts into a standalone script. This is done such that we can reuse it
in other build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git.rc" is used on Windows to embed information like the project
name and version into the resulting executables. As such we need to
inject the version information, which we do by using preprocessor
defines. The logic to do so is non-trivial and needs to be kept in sync
with the different build systems.
Refactor the logic so that we generate "git.rc" via `GIT-VERSION-GEN`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When injecting the Perl path into our scripts we sometimes use '@PERL@'
while we othertimes use '@PERL_PATH@'. Refactor the code use the latter
consistently, which makes it easier to reuse the same logic for multiple
scripts.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set up a couple of preprocessor macros when compiling Git that
propagate the version that Git was built from to `git version` et al.
The way this is set up makes it harder than necessary to reuse the
infrastructure across the different build systems.
Refactor this such that we generate a "version-def.h" header via
`GIT-VERSION-GEN` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our "GIT-VERSION-GEN" script always writes the "GIT-VERSION-FILE" into
the current directory, where the expectation is that it should exist in
the source directory. But other build systems that support out-of-tree
builds may not want to do that to keep the source directory pristine,
even though CMake currently doesn't care.
Refactor the script such that it won't write the "GIT-VERSION-FILE"
directly anymore, but instead knows to replace @PLACEHOLDERS@ in an
arbitrary input file. This allows us to simplify the logic in CMake to
determine the project version, but can also be reused later on in order
to generate other files that need to contain version information like
our "git.rc" file.
While at it, change the format of the version file by removing the
spaces around the equals sign. Like this we can continue to include the
file in our Makefiles, but can also start to source it in shell scripts
in subsequent steps.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of placeholders in our scripts that we replace at build
time, for example by using sed(1). These placeholders come in three
different formats: @PLACEHOLDER@, @@PLACEHOLDER@@ and ++PLACEHOLDER++.
Next to being inconsistent it also creates a bit of a problem with
CMake, which only supports the first syntax in its `configure_file()`
function. To work around that we instead manually replace placeholders
via string operations, which is a hassle and removes safeguards that
CMake has to verify that we didn't forget to replace any placeholders.
Besides that, other build systems like Meson also support the CMake
syntax.
Unify our codebase to consistently use the syntax supported by such
build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "GIT-BUILD-OPTIONS" file is generated by our build systems to
propagate built-in features and paths to our tests. The generation is
done ad-hoc, where both our Makefile and the CMake build instructions
simply echo a bunch of strings into the file. This makes it very hard to
figure out what variables are expected to exist and what format they
have, and the written variables can easily get out of sync between build
systems.
Introduce a new "GIT-BUILD-OPTIONS.in" template to address this issue.
This has multiple advantages:
- It demonstrates which built options exist in the first place.
- It can serve as a spot to document the build options.
- Some build systems complain when not all variables could be
substituted, alerting us of mismatches. Others don't, but if we
forgot to substitute such variables we now have a bogus string that
will likely cause our tests to fail, if they have any meaning in the
first place.
Backfill values that we didn't yet set in our CMake build instructions.
While at it, remove the `SUPPORTS_SIMPLE_IPC` variable that we only set
up in CMake as it isn't used anywhere.
This change requires us to adapt the setup of TEST_OUTPUT_DIRECTORY in
"test-lib.sh" such that it does not get overwritten after sourcing when
it has been set up via the environment. This is the only instance I
could find where we rely on ordering on variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our test helpers we have two cases where we assign -1 to an `unsigned
long`. The intent is to essentially mean "unbounded output", which is
achieved via implicit wraparound of the value.
This pattern causes warnings with -Wsign-compare though. Adapt it and
instead use `ULONG_MAX` explicitly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two -Wsign-compare warnings in "scalar.c", both of which are
trivial:
- We mistakenly use a signed integer to loop towards an upper unsigned
bound in `cmd_reconfigure()`.
- We subtract `path_sep - enlistment->buf`, which results in a signed
integer, and use the value in a ternary expression where second
value is unsigned. But as `path_sep` is being assigned the result of
`find_last_dir_sep(enlistment->buf + offset)` we know that it must
always be bigger than or equal to `enlistment->buf`, and thus the
result will be positive.
Address both of these warnings.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `get_one_patchid()` we assign either the result of `strlen()` or
`remove_space()` to `len`. But while the former correctly returns a
`size_t`, the latter returns an `int` to indicate the length of the
stripped string even though it cannot ever return a negative value. This
causes a warning with "-Wsign-conversion".
In fact, even `get_one_patchid()` itself is also using an integer as
return value even though it always returns the length of the patch, and
this bubbles up to other callers.
Adapt the function and its helpers to use `size_t` for string lengths
consistently.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `length` variable is used to store how many bytes we wish to emit
from an object ID. This value will either be the full hash algorithm's
length, or the abbreviated hash that can be set via `--abbrev` or the
"core.abbrev" option. The former is of type `size_t`, whereas the latter
is of type `int`, which causes a warning with "-Wsign-compare".
The reason why `abbrev` is using a signed type is mostly that it is
initialized with `-1` to indicate that we have to compute the minimum
abbreviation length. This length is computed via `find_alignment()`,
which always gets called before `emit_other()`, and thus we can assume
that the value would never be negative in `emit_other()`.
In fact, we can even assume that the value will always be at least
`MINIMUM_ABBREV`, which is enforced by both `git_default_core_config()`
and `parse_opt_abbrev_cb()`. We implicitly rely on this by subtracting
up to 3 without checking for whether the value becomes negative. We then
pass the value to printf(3p) to print the prefix of our object's ID, so
if that assumption was violated we may end up with undefined behaviour.
Squelch the warning by asserting this invariant and casting the value of
`abbrev` to `size_t`. This allows us to store the whole length as an
unsigned integer, which we can then pass to `fwrite()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a couple of -Wsign-comparison warnings in "gpg-interface.c".
Most of them are trivial and simply using signed integers to loop
towards an upper unsigned bound.
But in `parse_signed_buffer()` we have one case where the different
signedness of the two values of a ternary expression results in a
warning. Given that:
- `size` will always be bigger than `len` due to the loop condition.
- `eol` will always be after `buf + len` because it is found via
memchr(3p) starting from `buf + len`.
We know that both values will always be natural integers.
Squelch the warning by casting the left-hand side to `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `max_connections` type tracks how many children git-daemon(1) would
spawn at the same time. This value can be controlled via a command line
switch: if given a positive value we'll set that up as the limit. But
when given either zero or a negative value we don't enforce any limit at
all.
But even when being passed a negative value we won't actually store it,
but normalize it to 0. Still, the variable used to store the config is
using a signed integer, which causes warnings when comparing the number
of accepted connections (`max_connections`) with the number of current
connections being handled (`live_children`).
Adapt the type of `max_connections` such that the types of both
variables match.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several loops in "daemon.c" that use a signed integer to loop
through a `size_t`. Adapt them to instead use a `size_t` as counter
value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a bunch of loops which iterate up to an unsigned boundary using
a signed index, which generates warnigs because we compare a signed and
unsigned value in the loop condition. Address these sites for trivial
cases and enable `-Wsign-compare` warnings for these code units.
This patch only adapts those code units where we can drop the
`DISABLE_SIGN_COMPARE_WARNINGS` macro in the same step.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the preceding commit, we get a warning in `get_packet_data()`
on 32 bit platforms due to our lenient use of `ssize_t`. This function
is kind of curious though: we accept an `unsigned size` of bytes to
read, then store the actual number of bytes read in an `ssize_t` and
return it as an `int`. This is a whole lot of integer conversions, and
in theory these can cause us to overflow when the passed-in size is
larger than `ssize_t`, which on 32 bit platforms is implemented as an
`int`.
None of the callers of that function even care about the number of bytes
we have read, so returning that number is moot anyway. Refactor the
function such that it only returns an error code, which plugs the
potential overflow. While at it, convert the passed-in size parameter to
be of type `size_t`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On 32-bit platforms, ssize_t may be "int" while size_t may be
"unsigned int". At times we compare the number of bytes we read
stored in a ssize_t variable with "unsigned int", but that is done
after we check that we did not get an error return (which is
negative---and that is the whole reason why we used ssize_t and not
size_t), so these comparisons are safe.
But compilers may not realize that. Cast these to size_t to work
around the false positives. On platforms with size_t/ssize_t wider
than a normal int, this won't be an issue.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct diff_flags` structure is essentially an array of flags, all
of which have the same type. We can thus use `sizeof()` to iterate
through all of the flags, which we do in `diff_flags_or()`. But while
the statement returns an unsigned integer, we used a signed integer to
iterate through the flags, which generates a warning.
Fix this by using `size_t` for the index instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need anymore to disable `-Wsign-compare` now that all files
that cause warnings have been marked accordingly. Drop the option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark code units that generate warnings with `-Wsign-compare`. This
allows for a structured approach to get rid of all such warnings over
time in a way that can be easily measured.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GCC generates a warning in "headless.c" because we compare `slash` with
`size`, where the former is an `int` and the latter is a `size_t`. Fix
the warning by storing `slash` as a `size_t`, as well.
This commit is being singled out because the file does not include the
"git-compat-util.h" header, and consequently, we cannot easily mark it
with the `DISABLE_SIGN_COMPARE_WARNING` macro.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly ignore "-Wsign-compare" warnings in our bundled copy of the
regcomp implementation. We don't use the macro introduced in the
preceding commit because this code does not include "git-compat-util.h"
in the first place.
Note that we already directly use "#pragma GCC diagnostic ignored" in
"regcomp.c", so it shouldn't be an issue to use it directly in the new
spot, either.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compiling with DEVELOPER=YesPlease, we explicitly disable the
"-Wsign-compare" warning. This is mostly because our code base is full
of cases where we don't bother at all whether something should be signed
or unsigned, and enabling the warning would thus cause tons of warnings
to pop up.
Unfortunately, disabling this warning also masks real issues. There have
been multiple CVEs in the Git project that would have been flagged by
this warning (e.g. CVE-2022-39260, CVE-2022-41903 and several fixes in
the vicinity of these CVEs). Furthermore, the final audit report by
X41 D-Sec, who are the ones who have discovered some of the CVEs, hinted
that it might be a good idea to become more strict in this context.
Now simply enabling the warning globally does not fly due to the stated
reason above that we simply have too many sites where we use the wrong
integer types. Instead, introduce a new set of macros that allow us to
mark a file as being free of warnings with "-Wsign-compare". The
mechanism is similar to what we do with `USE_THE_REPOSITORY_VARIABLE`:
every file that is not marked with `DISABLE_SIGN_COMPARE_WARNINGS` will
be compiled with those warnings enabled.
These new markings will be wired up in the subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit a30154187a (describe: stop traversing when we run out of names,
2024-10-31) taught git-describe to automatically reduce the
max_candidates setting to match the total number of possible names. This
lets us break out of the traversal rather than fruitlessly searching for
more candidates when there are no more to be found.
However, setting max_candidates to 0 (e.g., if the repo has no tags)
overlaps with the --exact-match option, which explicitly uses the same
value. And this causes a regression with --always, which is ignored in
exact-match mode. We used to get this in a repo with no tags:
$ git describe --always HEAD
b2f0a7f
and now we get:
$ git describe --always HEAD
fatal: no tag exactly matches 'b2f0a7f47f5f2aebe1e7fceff19a57de20a78c06'
The reason is that we bail early in describe_commit() when
max_candidates is set to 0. This logic goes all the way back to
2c33f75754 (Teach git-describe --exact-match to avoid expensive tag
searches, 2008-02-24).
We should obviously fix this regression, but there are two paths,
depending on what you think:
$ git describe --always --exact-match
and
$ git describe --always --candidates=0
should do. Since the "--always" option was added, it has always been
ignored in --exact-match (or --candidates=0) mode. I.e., we treat
--exact-match as a true exact match of a tag, and never fall back to
using --always, even if it was requested.
If we think that's a bug (or at least a misfeature), then the right
solution is to fix it by removing the early bail-out from 2c33f75754,
letting the noop algorithm run and then hitting the --always fallback
output. And then our regression naturally goes away, because it follows
the same path.
If we think that the current "--exact-match --always" behavior is the
right thing, then we have to differentiate the case where we
automatically reduced max_candidates to 0 from the case where the user
asked for it specifically. That's possible to do with a flag, but we can
also just reimplement the logic from a30154187a to explicitly break out
of the traversal when we run out of candidates (rather than relying on
the existing max_candidates check).
My gut feeling is along the lines of option 1 (it's a bug, and people
would be happy for "--exact-match --always" to give the fallback rather
than ignoring "--always"). But the documentation can be interpreted in
the other direction, and we've certainly lived with the existing
behavior for many years. So it's possible that changing it now is the
wrong thing.
So this patch fixes the regression by taking the second option,
retaining the "--exact-match" behavior as-is. There are two new tests.
The first shows that the regression is fixed (we don't even need a new
repo without tags; a restrictive --match is enough to create the
situation that there are no candidate names).
The second test confirms that the "--exact-match --always" behavior
remains unchanged and continues to die when there is no tag pointing at
the specified commit. It's possible we may reconsider this in the
future, but this shows that the approach described above is implemented
faithfully.
We can also run the perf tests in p6100 to see that we've retained the
speedup that a30154187a was going for:
Test HEAD^ HEAD
--------------------------------------------------------------------------------------
6100.2: describe HEAD 0.72(0.64+0.07) 0.72(0.66+0.06) +0.0%
6100.3: describe HEAD with one max candidate 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0%
6100.4: describe HEAD with one tag 0.01(0.01+0.00) 0.01(0.01+0.00) +0.0%
Reported-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A double-free that may not trigger in practice by luck has been
corrected in the reference resolution code.
* sj/refs-symref-referent-fix:
ref-cache: fix invalid free operation in `free_ref_entry`
* bf/set-head-symref:
fetch set_head: handle mirrored bare repositories
fetch: set remote/HEAD if it does not exist
refs: add create_only option to refs_update_symref_extended
refs: add TRANSACTION_CREATE_EXISTS error
remote set-head: better output for --auto
remote set-head: refactor for readability
refs: atomically record overwritten ref in update_symref
refs: standardize output of refs_read_symbolic_ref
t/t5505-remote: test failure of set-head
t/t5505-remote: set default branch to main
The advice message currently suggests using "git config advice..." to
disable advice messages, but since
00bbdde141 (builtin/config: introduce "set" subcommand, 2024-05-06)
we have the "set" subcommand for config. Since using the subcommand is
more in-line with the modern interface, any advice should be promoting
its usage. Change the disable advice message to use the subcommand
instead. Change all uses of "git config advice" in the tests to use the
subcommand.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "remote set-head" manually it is unlikely, that the user
would actually like to have "fetch" always update the remote/HEAD. On
the contrary, it is more likely, that the user would expect remote/HEAD
to stay the way they manually set it, and just forgot about having
"followRemoteHEAD" set to "always".
When "followRemoteHEAD" is set to "always" make running "remote
set-head" change the config to "warn".
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently if we want to have a remote/HEAD locally that is different
from the one on the remote, but we still want to get a warning if remote
changes HEAD, our only option is to have an indiscriminate warning with
"follow_remote_head" set to "warn". Add a new option
"warn-if-not-$branch", where $branch is a branch name we do not wish to
get a warning about. If the remote HEAD is $branch do not warn,
otherwise, behave as "warn".
E.g. let's assume, that our remote origin has HEAD
set to "master", but locally we have "git remote set-head origin seen".
Setting 'remote.origin.followRemoteHEAD = "warn"' will always print
a warning, even though the remote has not changed HEAD from "master".
Setting 'remote.origin.followRemoteHEAD = "warn-if-not-master" will
squelch the warning message, unless the remote changes HEAD from
"master". Note, that should the remote change HEAD to "seen" (which we
have locally), there will still be no warning.
Improve the advice message in report_set_head to also include silencing
the warning message with "warn-if-not-$branch".
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Advice about what to do when getting a warning is typed out explicitly
twice and is printed as regular output. The output is also tested for.
Extract the advice message into a single place and use a wrapper
function, so if later the advice is made more chatty the signature only
needs to be changed in once place. Remove the testing for the advice
output in the tests.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `MIDX_MIN_SIZE` definition is used to check the midx_size in
`local_multi_pack_index_one`. This definition relies on the
`the_hash_algo` global variable. Inline this and remove the global
variable usage.
With this, remove `USE_THE_REPOSITORY_VARIABLE` usage from `midx.c`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `get_split_midx_filename_ext()`, `get_midx_filename()` and
`get_midx_filename_ext()` use `hash_to_hex()` which internally uses the
`the_hash_algo` global variable.
Remove this dependency on global variables by passing down the
`hash_algo` through to the functions mentioned and instead calling
`hash_to_hex_algop()` along with the obtained `hash_algo`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `load_multi_pack_index` function in midx uses `the_repository`
variable to access the `repository` struct. Modify the function and its
callee's to send the `repository` field.
This moves usage of `the_repository` to the `test-read-midx.c` file.
While that is not optimal, it is okay, since the upcoming commits will
slowly move the usage of `the_repository` up the layers and remove it
eventually.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the `midx.c` file, there are multiple usages of `the_repository` and
`the_hash_algo` within static functions of the file. Some of the usages
can be simply swapped out with the available `repository` struct. While
some of them can be swapped out by passing the repository to the
required functions.
This leaves out only some other usages of `the_repository` and
`the_hash_algo` in the file in non-static functions, which we'll tackle
in upcoming commits.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a previous commit, we passed the repository field to all
subcommands in the `builtin/` directory. Utilize this to pass the
repository field down to the `write_midx_file[_only]` functions to
remove the usage of `the_repository` global variables.
With this, all usage of global variables in `midx-write.c` is removed,
hence, remove the `USE_THE_REPOSITORY_VARIABLE` guard from the file.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The struct `write_midx_context` is used to pass context for creating
MIDX files. Add the repository field here to ensure that most functions
within `midx-write.c` have access to the field and can use that instead
of the global `the_repository` variable.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `read_refs_snapshot()` uses `parse_oid_hex()`, which relies
on the global `the_hash_algo` variable. Let's instead use
`parse_oid_hex_algop()` and provide the hash algo via `revs->repo`.
Also, while here, fix a missing newline after the function's definition.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 'midx-write.c' there are a lot of static functions which use global
variables `the_repository` or `the_hash_algo`. In a follow up commit,
the repository variable will be added to `write_midx_context`, which
some of the functions can use. But for functions which do not have
access to this struct, pass down the required information from
non-static functions `write_midx_file` and `write_midx_file_only`.
This requires that the function `hash_to_hex` is also replaced with
`hash_to_hex_algop` since the former internally accesses the
`the_hash_algo` global variable.
This ensures that the usage of global variables is limited to these
non-static functions, which will be cleaned up in a follow up commit.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kn/pass-repo-to-builtin-sub-sub-commands:
builtin: pass repository to sub commands
Git 2.47.1
Makefile(s): avoid recipe prefix in conditional statements
doc: switch links to https
doc: update links to current pages
The eleventh batch
pack-objects: only perform verbatim reuse on the preferred pack
t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
test-lib: move malloc-debug setup after $PATH setup
builtin/difftool: intialize some hashmap variables
refspec: store raw refspecs inside refspec_item
refspec: drop separate raw_nr count
fetch: adjust refspec->raw_nr when filtering prefetch refspecs
test-lib: check malloc debug LD_PRELOAD before using
* kn/the-repository:
packfile.c: remove unnecessary prepare_packed_git() call
midx: add repository to `multi_pack_index` struct
config: make `packed_git_(limit|window_size)` non-global variables
config: make `delta_base_cache_limit` a non-global variable
packfile: pass down repository to `for_each_packed_object`
packfile: pass down repository to `has_object[_kept]_pack`
packfile: pass down repository to `odb_pack_name`
packfile: pass `repository` to static function in the file
packfile: use `repository` from `packed_git` directly
packfile: add repository to struct `packed_git`
Documentation mark-up updates.
* ja/git-diff-doc-markup:
doc: git-diff: apply format changes to config part
doc: git-diff: apply format changes to diff-generate-patch
doc: git-diff: apply format changes to diff-format
doc: git-diff: apply format changes to diff-options
doc: git-diff: apply new documentation guidelines
Drop support for older libcURL and Perl.
* bc/drop-ancient-libcurl-and-perl:
gitweb: make use of s///r
Require Perl 5.26.0
INSTALL: document requirement for libcurl 7.61.0
git-curl-compat: remove check for curl 7.56.0
git-curl-compat: remove check for curl 7.53.0
git-curl-compat: remove check for curl 7.52.0
git-curl-compat: remove check for curl 7.44.0
git-curl-compat: remove check for curl 7.43.0
git-curl-compat: remove check for curl 7.39.0
git-curl-compat: remove check for curl 7.34.0
git-curl-compat: remove check for curl 7.25.0
git-curl-compat: remove check for curl 7.21.5
Built-in Git subcommands are supplied the repository object to work
with; they learned to do the same when they invoke sub-subcommands.
* kn/pass-repo-to-builtin-sub-sub-commands:
builtin: pass repository to sub commands
Work around Coverity warning that would not trigger in practice.
* ps/bisect-double-free-fix:
bisect: address Coverity warning about potential double free
"git fsck" learned to issue warnings on "curiously formatted" ref
contents that have always been taken valid but something Git
wouldn't have written itself (e.g., missing terminating end-of-line
after the full object name).
* sj/ref-contents-check:
ref: add symlink ref content check for files backend
ref: check whether the target of the symref is a ref
ref: add basic symref content check for files backend
ref: add more strict checks for regular refs
ref: port git-fsck(1) regular refs check for files backend
ref: support multiple worktrees check for refs
ref: initialize ref name outside of check functions
ref: check the full refname instead of basename
ref: initialize "fsck_ref_report" with zero
The migration procedure between two ref backends has been optimized.
* ps/ref-backend-migration-optim:
reftable: rename scratch buffer
refs: adapt `initial_transaction` flag to be unsigned
reftable/block: optimize allocations by using scratch buffer
reftable/block: rename `block_writer::buf` variable
reftable/writer: optimize allocations by using a scratch buffer
refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
refs: skip collision checks in initial transactions
refs: use "initial" transaction semantics to migrate refs
refs/files: support symbolic and root refs in initial transaction
refs: introduce "initial" transaction flag
refs/files: move logic to commit initial transaction
refs: allow passing flags when setting up a transaction
Leakfixes.
* ps/leakfixes-part-10: (27 commits)
t: remove TEST_PASSES_SANITIZE_LEAK annotations
test-lib: unconditionally enable leak checking
t: remove unneeded !SANITIZE_LEAK prerequisites
t: mark some tests as leak free
t5601: work around leak sanitizer issue
git-compat-util: drop now-unused `UNLEAK()` macro
global: drop `UNLEAK()` annotation
t/helper: fix leaking commit graph in "read-graph" subcommand
builtin/branch: fix leaking sorting options
builtin/init-db: fix leaking directory paths
builtin/help: fix leaks in `check_git_cmd()`
help: fix leaking return value from `help_unknown_cmd()`
help: fix leaking `struct cmdnames`
help: refactor to not use globals for reading config
builtin/sparse-checkout: fix leaking sanitized patterns
split-index: fix memory leak in `move_cache_to_base_index()`
git: refactor builtin handling to use a `struct strvec`
git: refactor alias handling to use a `struct strvec`
strvec: introduce new `strvec_splice()` function
line-log: fix leak when rewriting commit parents
...
Give a bit of advice/hint message when "git maintenance" stops finding a
lock file left by another instance that still is potentially running.
* ps/gc-stale-lock-warning:
t7900: fix host-dependent behaviour when testing git-maintenance(1)
builtin/gc: provide hint when maintenance hits a stale schedule lock
Commit da91a90c2f (fast-import: disallow more path components,
2024-11-30) added two separate verify_path() calls (one for
added/modified files, and one for renames/copies). But our tests only
exercise the first one. Let's protect ourselves against regressions by
tweaking one of the tests to rename into the bad path. There are
adjacent tests that will stay as additions, so now both calls are
covered.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rev-list documentation doesn't mention that the given
commit must be in the specified commit range, leading
to unexpected results.
Signed-off-by: Kai Koponen <kaikopone@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 454ea2e4d7 (treewide: use get_all_packs, 2018-08-20) we converted
existing calls to both:
- get_packed_git(), as well as
- the_repository->objects->packed_git
, to instead use the new get_all_packs() function.
In the instance that this commit addresses, there was a preceding call
to prepare_packed_git(), which dates all the way back to 660c889e46
(sha1_file: add for_each iterators for loose and packed objects,
2014-10-15) when its caller (for_each_packed_object()) was first
introduced.
This call could have been removed in 454ea2e4d7, since get_all_packs()
itself calls prepare_packed_git(). But the translation in 454ea2e4d7 was
(to the best of my knowledge) a find-and-replace rather than inspecting
each individual caller.
Having an extra prepare_packed_git() call here is harmless, since it
will notice that we have already set the 'packed_git_initialized' field
and the call will be a noop. So we're only talking about a few dozen CPU
cycles to set up and tear down the stack frame.
But having a lone prepare_packed_git() call immediately before a call to
get_all_packs() confused me, so let's remove it as redundant to avoid
more confusion in the future.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `multi_pack_index` struct represents the MIDX for a repository.
Here, we add a pointer to the repository in this struct, allowing direct
use of the repository variable without relying on the global
`the_repository` struct.
With this addition, we can determine the repository associated with a
`bitmap_index` struct. A `bitmap_index` points to either a `packed_git`
or a `multi_pack_index`, both of which have direct repository
references. To support this, we introduce a static helper function,
`bitmap_repo`, in `pack-bitmap.c`, which retrieves a repository given a
`bitmap_index`.
With this, we clear up all usages of `the_repository` within
`pack-bitmap.c` and also remove the `USE_THE_REPOSITORY_VARIABLE`
definition. Bringing us another step closer to remove all global
variable usage.
Although this change also opens up the potential to clean up `midx.c`,
doing so would require additional refactoring to pass the repository
struct to functions where the MIDX struct is created: a task better
suited for future patches.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variables `packed_git_window_size` and `packed_git_limit` are global
config variables used in the `packfile.c` file. Since it is only used in
this file, let's change it from being a global config variable to a
local variable for the subsystem.
With this, we rid `packfile.c` from all global variable usage and this
means we can also remove the `USE_THE_REPOSITORY_VARIABLE` guard from
the file.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `delta_base_cache_limit` variable is a global config variable used
by multiple subsystems. Let's make this non-global, by adding this
variable independently to the subsystems where it is used.
First, add the setting to the `repo_settings` struct, this provides
access to the config in places where the repository is available. Use
this in `packfile.c`.
In `index-pack.c` we add it to the `pack_idx_option` struct and its
constructor. While the repository struct is available here, it may not
be set because `git index-pack` can be used without a repository.
In `gc.c` add it to the `gc_config` struct and also the constructor
function. The gc functions currently do not have direct access to a
repository struct.
These changes are made to remove the usage of `delta_base_cache_limit`
as a global variable in `packfile.c`. This brings us one step closer to
removing the `USE_THE_REPOSITORY_VARIABLE` definition in `packfile.c`
which we complete in the next patch.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `for_each_packed_object` currently relies on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from this
function and closely related function `is_promisor_object`.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `has_object[_kept]_pack` currently rely on the global
variable `the_repository`. To eliminate global variable usage in
`packfile.c`, we should progressively shift the dependency on
the_repository to higher layers. Let's remove its usage from these
functions and any related ones.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `odb_pack_name` currently relies on the global variable
`the_repository`. To eliminate global variable usage in `packfile.c`, we
should progressively shift the dependency on the_repository to higher
layers.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the static functions in the `packfile.c` access global
variables, which can simply be avoided by passing the `repository`
struct down to them. Let's do that.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, we introduced the `repository` structure inside
`packed_git`. This provides an alternative route instead of using the
global `the_repository` variable. Let's modify `packfile.c` now to use
this field wherever possible instead of relying on the global state.
There are still a few instances of `the_repository` usage in the file,
where there is no struct `packed_git` locally available, which will be
fixed in the following commits.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The struct `packed_git` holds information regarding a packed object
file. Let's add the repository variable to this object, to represent the
repository that this packfile belongs to. This helps remove dependency
on the global `the_repository` object in `packfile.c` by simply using
repository information now readily available in the struct.
We do need to consider that a packfile could be part of the alternates
of a repository, but considering that we only have one repository struct
and also that we currently anyways use 'the_repository', we should be
OK with this change.
We also modify `alloc_packed_git` to ensure that the repository is added
to newly created `packed_git` structs. This requires modifying the
function and all its callee to pass the repository object down the
levels.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though the plumbing level allows you to create refs/tags/HEAD
and refs/heads/HEAD, doing so makes it confusing within the context
of the UI Git Porcelain commands provides. Just like we prevent a
branch from getting called "HEAD" at the Porcelain layer (i.e. "git
branch" command), teach "git tag" to refuse to create a tag "HEAD".
With a few new tests, we make sure that
- "git tag HEAD" and "git tag -a HEAD" are rejected
- "git update-ref refs/tags/HEAD" is still allowed (this is a
deliberate design decision to allow others to create their own UI
on top of Git infrastructure that may be different from our UI).
- "git tag -d HEAD" can remove refs/tags/HEAD to recover from an
mistake.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
09116a1c (refs: loosen over-strict "format" check, 2011-11-16)
introduced a test piece (originally in t5700) that expects to be
able to create a tag named "HEAD" and then a local clone using the
repository as its own reference works correctly. Later, another
test piece started using this tag starting at acede2eb (t5700:
document a failure of alternates to affect fetch, 2012-02-11).
But the breakage 09116a1c fixed was not specific to the tagname
HEAD. It would have failed exactly the same way if the tag used
were foo instead of HEAD.
Before forbidding "git tag" from creating "refs/tags/HEAD", update
these tests to use 'foo', not 'HEAD', as the name of the test tag.
Note that the test piece that uses the tag learned the value of the
tag in unnecessarily inefficient and convoluted way with for-each-ref.
Just use "rev-parse" instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper functions (strbuf_branchname, strbuf_check_branch_ref,
and strbuf_check_tag_ref) are about handling branch and tag names,
and it is a non-essential fact that these functions use strbuf to
hold these names. Rename them to make it clarify that these are
more about "ref".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_branchname(), strbuf_check_{branch,tag}_ref() are helper
functions to deal with branch and tag names, and the fact that they
happen to use strbuf to hold the name of a branch or a tag is not
essential. These functions fit better in the refs API than strbuf
API, the latter of which is about string manipulations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/leakfixes-part-10: (49 commits)
t: remove TEST_PASSES_SANITIZE_LEAK annotations
test-lib: unconditionally enable leak checking
t: remove unneeded !SANITIZE_LEAK prerequisites
t: mark some tests as leak free
t5601: work around leak sanitizer issue
git-compat-util: drop now-unused `UNLEAK()` macro
global: drop `UNLEAK()` annotation
t/helper: fix leaking commit graph in "read-graph" subcommand
builtin/branch: fix leaking sorting options
builtin/init-db: fix leaking directory paths
builtin/help: fix leaks in `check_git_cmd()`
help: fix leaking return value from `help_unknown_cmd()`
help: fix leaking `struct cmdnames`
help: refactor to not use globals for reading config
builtin/sparse-checkout: fix leaking sanitized patterns
split-index: fix memory leak in `move_cache_to_base_index()`
git: refactor builtin handling to use a `struct strvec`
git: refactor alias handling to use a `struct strvec`
strvec: introduce new `strvec_splice()` function
line-log: fix leak when rewriting commit parents
...
Add missing word “that” in the phrase “after verifying that”, like
what was done in 1b2dfb70504 (Documentation/git-update-ref.txt: drop
“flag”, 2024-10-21)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of just disallowing '.' and '..', make use of verify_path() to
ensure that fast-import will disallow anything we wouldn't allow into
the index, such as anything under .git/, .gitmodules as a symlink, or
a dos drive prefix on Windows.
Since a few fast-export and fast-import tests that tried to stress-test
the correct handling of quoting relied on filenames that fail
is_valid_win32_path(), such as spaces or periods at the end of filenames
or backslashes within the filename, turn off core.protectNTFS for those
tests to ensure they keep passing.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the current implementation, if refs/remotes/$remote/HEAD does not
exist, running fetch will create it, but if it does exist it will not do
anything, which is a somewhat safe and minimal approach. Unfortunately,
for users who wish to NOT have refs/remotes/$remote/HEAD set for any
reason (e.g. so that `git rev-parse origin` doesn't accidentally point
them somewhere they do not want to), there is no way to remove this
behaviour. On the other side of the spectrum, users may want fetch to
automatically update HEAD or at least give them a warning if something
changed on the remote.
Introduce a new setting, remote.$remote.followRemoteHEAD with four
options:
- "never": do not ever do anything, not even create
- "create": the current behaviour, now the default behaviour
- "warn": print a message if remote and local HEAD is different
- "always": silently update HEAD on every change
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This refactors `repair_worktree_after_gitdir_move()` to use the new
`write_worktree_linking_files` function. It also preserves the
relativity of the linking files; e.g., if an existing worktree used
absolute paths then the repaired paths will be absolute (and visa-versa).
`repair_worktree_after_gitdir_move()` is used to repair both sets of
worktree linking files if the `.git` directory is moved during a
re-initialization using `git init`.
This also adds a test case for reinitializing a repository that has
relative worktrees.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches the `worktree repair` command to respect the
`--[no-]relative-paths` CLI option and `worktree.useRelativePaths`
config setting. If an existing worktree with an absolute path is repaired
with `--relative-paths`, the links will be replaced with relative paths,
even if the original path was correct. This allows a user to covert
existing worktrees between absolute/relative as desired.
To simplify things, both linking files are written when one of the files
needs to be repaired. In some cases, this fixes the other file before it
is checked, in other cases this results in a correct file being written
with the same contents.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches the `worktree move` command to respect the
`--[no-]relative-paths` CLI option and `worktree.useRelativePaths`
config setting. If an existing worktree is moved with `--relative-paths`
the new path will be relative (and visa-versa).
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces the `--[no-]relative-paths` CLI option and
`worktree.useRelativePaths` configuration setting to the `worktree add`
command. When enabled these options allow worktrees to be linked using
relative paths, enhancing portability across environments where absolute
paths may differ (e.g., containerized setups, shared network drives).
Git still creates absolute paths by default, but these options allow
users to opt-in to relative paths if desired.
The t2408 test file is removed and more comprehensive tests are
written for the various worktree operations in their own files.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new helper function, `write_worktree_linking_files()`, centralizes
the logic for computing and writing either relative or absolute
paths, based on the provided configuration. This function accepts
`strbuf` pointers to both the worktree’s `.git` link and the
repository’s `gitdir`, and then writes the appropriate path to each.
The `relativeWorktrees` extension is automatically set when a worktree
is linked with relative paths.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous round[1] was merged a bit early before reviewer feedback
could be applied. This correctly indents a code block and updates the
`infer_backlink` function to return `-1` on failure and strbuf.len on
success.
[1]: https://lore.kernel.org/git/20241007-wt_relative_paths-v3-0-622cf18c45eb@pm.me
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new extension, `relativeWorktrees`, is added to indicate that at least
one worktree in the repository has been linked with relative paths.
This ensures older Git versions do not attempt to automatically prune
worktrees with relative paths, as they would not not recognize the
paths as being valid.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reinitializing a repository, Git does not account for extensions
other than `objectformat` and `refstorage` when determining the
repository version. This can lead to a repository being downgraded to
version 0 if extensions are set, causing Git future operations to fail.
This patch teaches Git to check if other extensions are defined in the
config to ensure that the repository version is set correctly.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is more efficient to have something in the coding guidelines
document to point at, when we want to review and comment on a new
message in the codebase to make sure it "fits" in the set of
existing messages.
Let's write down established best practice we are aware of.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching directly from a bundle, fsck message severity
configuration is not propagated to the underlying git-index-pack(1). It
is only capable of enabling or disabling fsck checks entirely. This does
not align with the fsck behavior for fetches through git-fetch-pack(1).
Use the fsck config parsing from fetch-pack to populate fsck message
severity configuration and wire it through to `unbundle()` to enable the
same fsck verification as done through fetch-pack.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `fetch_pack_config()` is invoked, fetch-pack configuration is
parsed from the config. As part of this operation, fsck message severity
configuration is assigned to the `fsck_msg_types` global variable. This
is optionally used to configure the downstream git-index-pack(1) when
the `--strict` option is specified.
The same parsed fsck message severity configuration is also needed
outside of fetch-pack. Instead of exposing/relying on the existing
global state, split out the fsck config parsing logic into
`fetch_pack_fsck_config()` and expose it. In a subsequent commit, this
is used to provide fsck configuration when invoking `unbundle()`.
For `fetch_pack_fsck_config()` to discern between errors and unhandled
config variables, the return code when `git_config_path()` errors is
changed to a different value also indicating success. This frees up the
previous return code to now indicate the provided config variable
was unhandled. The behavior remains functionally the same.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the `VERIFY_BUNDLE_FLAG` is set during `unbundle()`, the
git-index-pack(1) spawned is configured with the `--fsck-options` flag
to perform fsck verification. With this flag enabled, there is not a way
to configure fsck message severity though.
Extend the `unbundle_opts` type to store fsck message severity
configuration and update `unbundle()` to conditionally append it to the
`--fsck-objects` flag if provided. This enables `unbundle()` call sites
to support optionally setting the severity for specific fsck messages.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `unbundle()` is invoked, fsck verification may be configured by
passing the `VERIFY_BUNDLE_FSCK` flag. This mechanism allows fsck checks
on the bundle to be enabled or disabled entirely. To facilitate more
fine-grained fsck configuration, additional context must be provided to
`unbundle()`.
Introduce the `unbundle_opts` type, which wraps the existing
`verify_bundle_flags`, to facilitate future extension of `unbundle()`
configuration. Also update `unbundle()` and its call sites to accept
this new options type instead of the flags directly. The end behavior is
functionally the same, but allows for the set of configurable options to
be extended. This is leveraged in a subsequent commit to enable fsck
message severity configuration.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bf/set-head-symref:
fetch set_head: handle mirrored bare repositories
fetch: set remote/HEAD if it does not exist
refs: add create_only option to refs_update_symref_extended
refs: add TRANSACTION_CREATE_EXISTS error
remote set-head: better output for --auto
remote set-head: refactor for readability
refs: atomically record overwritten ref in update_symref
refs: standardize output of refs_read_symbolic_ref
t/t5505-remote: test failure of set-head
t/t5505-remote: set default branch to main
The ref-transaction hook triggered for reflog updates, which has
been corrected.
* kn/ref-transaction-hook-with-reflog:
refs: don't invoke reference-transaction hook for reflogs
We now ensure "index-pack" is used with the "--promisor" option
only during a "git fetch".
* jt/index-pack-allow-promisor-only-while-fetching:
index-pack: teach --promisor to forbid pack name
"git fast-import" can be tricked into a replace ref that maps an
object to itself, which is a useless thing to do.
* en/fast-import-avoid-self-replace:
fast-import: avoid making replace refs point to themselves
GCC 15 compatibility updates.
* jk/gcc15:
object-file: inline empty tree and blob literals
object-file: treat cached_object values as const
object-file: drop oid field from find_cached_object() return value
object-file: move empty_tree struct into find_cached_object()
object-file: drop confusing oid initializer of empty_tree struct
object-file: prefer array-of-bytes initializer for hash literals
Fix for clar unit tests to support CMake build.
* ps/clar-build-improvement:
Makefile: let clar header targets depend on their scripts
cmake: use verbatim arguments when invoking clar commands
cmake: use SH_EXE to execute clar scripts
t/unit-tests: convert "clar-generate.awk" into a shell script
Documentation for "git bundle" saw improvements to more prominently
call out the use of '--all' when creating bundles.
* kh/bundle-docs:
Documentation/git-bundle.txt: discuss naïve backups
Documentation/git-bundle.txt: mention --all in spec. refs
Documentation/git-bundle.txt: remove old `--all` example
Documentation/git-bundle.txt: mention full backup example
* maint-2.46:
Git 2.46.3
Git 2.45.3
Git 2.44.3
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.45:
Git 2.45.3
Git 2.44.3
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.44:
Git 2.44.3
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.43:
Git 2.43.6
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.42:
Git 2.42.4
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.41:
Git 2.41.3
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
* maint-2.40:
Git 2.40.4
credential: disallow Carriage Returns in the protocol by default
credential: sanitize the user prompt
credential_format(): also encode <host>[:<port>]
t7300: work around platform-specific behaviour with long paths on MinGW
compat/regex: fix argument order to calloc(3)
mingw: drop bogus (and unneeded) declaration of `_pgmptr`
ci: remove 'Upload failed tests' directories' step from linux32 jobs
This addresses two vulnerabilities:
- CVE-2024-50349:
Printing unsanitized URLs when asking for credentials made the
user susceptible to crafted URLs (e.g. in recursive clones) that
mislead the user into typing in passwords for trusted sites that
would then be sent to untrusted sites instead.
- CVE-2024-52006
Git may pass on Carriage Returns via the credential protocol to
credential helpers which use line-reading functions that
interpret said Carriage Returns as line endings, even though Git
did not intend that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In cfd971520e (refs: keep track of unresolved reference value in
iterators, 2024-08-09), we added a new field "referent" into the "struct
ref" structure. In order to free the "referent", we unconditionally
freed the "referent" by simply adding a "free" statement.
However, this is a bad usage. Because when ref entry is either directory
or loose ref, we will always execute the following statement:
free(entry->u.value.referent);
This does not make sense. We should never access the "entry->u.value"
field when "entry" is a directory. However, the change obviously doesn't
break the tests. Let's analysis why.
The anonymous union in the "ref_entry" has two members: one is "struct
ref_value", another is "struct ref_dir". On a 64-bit machine, the size
of "struct ref_dir" is 32 bytes, which is smaller than the 48-byte size
of "struct ref_value". And the offset of "referent" field in "struct
ref_value" is 40 bytes. So, whenever we create a new "ref_entry" for a
directory, we will leave the offset from 40 bytes to 48 bytes untouched,
which means the value for this memory is zero (NULL). It's OK to free a
NULL pointer, but this is merely a coincidence of memory layout.
To fix this issue, we now ensure that "free(entry->u.value.referent)" is
only called when "entry->flag" indicates that it represents a loose
reference and not a directory to avoid the invalid memory operation.
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While Git has documented that the credential protocol is line-based,
with newlines as terminators, the exact shape of a newline has not been
documented.
From Git's perspective, which is firmly rooted in the Linux ecosystem,
it is clear that "a newline" means a Line Feed character.
However, even Git's credential protocol respects Windows line endings
(a Carriage Return character followed by a Line Feed character, "CR/LF")
by virtue of using `strbuf_getline()`.
There is a third category of line endings that has been used originally
by MacOS, and that is respected by the default line readers of .NET and
node.js: bare Carriage Returns.
Git cannot handle those, and what is worse: Git's remedy against
CVE-2020-5260 does not catch when credential helpers are used that
interpret bare Carriage Returns as newlines.
Git Credential Manager addressed this as CVE-2024-50338, but other
credential helpers may still be vulnerable. So let's not only disallow
Line Feed characters as part of the values in the credential protocol,
but also disallow Carriage Return characters.
In the unlikely event that a credential helper relies on Carriage
Returns in the protocol, introduce an escape hatch via the
`credential.protectProtocol` config setting.
This addresses CVE-2024-52006.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When asking the user interactively for credentials, we want to avoid
misleading them e.g. via control sequences that pretend that the URL
targets a trusted host when it does not.
While Git learned, over the course of the preceding commits, to disallow
URLs containing URL-encoded control characters by default, credential
helpers are still allowed to specify values very freely (apart from Line
Feed and NUL characters, anything is allowed), and this would allow,
say, a username containing control characters to be specified that would
then be displayed in the interactive terminal prompt asking the user for
the password, potentially sending those control characters directly to
the terminal. This is undesirable because control characters can be used
to mislead users to divulge secret information to untrusted sites.
To prevent such an attack vector, let's add a `git_prompt()` that forces
the displayed text to be sanitized, i.e. displaying question marks
instead of control characters.
Note: While this commit's diff changes a lot of `user@host` strings to
`user%40host`, which may look suspicious on the surface, there is a good
reason for that: this string specifies a user name, not a
<username>@<hostname> combination! In the context of t5541, the actual
combination looks like this: `user%40@127.0.0.1:5541`. Therefore, these
string replacements document a net improvement introduced by this
commit, as `user@host@127.0.0.1` could have left readers wondering where
the user name ends and where the host name begins.
Hinted-at-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
An upcoming change wants to sanitize the credential password prompt
where a URL is displayed that may potentially come from a `.gitmodules`
file. To this end, the `credential_format()` function is employed.
To sanitize the host name (and optional port) part of the URL, we need a
new mode of the `strbuf_add_percentencode()` function because the
current mode is both too strict and too lenient: too strict because it
encodes `:`, `[` and `]` (which should be left unencoded in
`<host>:<port>` and in IPv6 addresses), and too lenient because it does
not encode invalid host name characters `/`, `_` and `~`.
So let's introduce and use a new mode specifically to encode the host
name and optional port part of a URI, leaving alpha-numerical
characters, periods, colons and brackets alone and encoding all others.
This only leads to a change of behavior for URLs that contain invalid
host names.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When reading references the reftable backend has to:
1. Create a new ref iterator.
2. Seek the iterator to the record we're searching for.
3. Read the record.
We cannot really avoid the last two steps, but re-creating the iterator
every single time we want to read a reference is kind of expensive and a
waste of resources. We couldn't help it in the past though because it
was not possible to reuse iterators. But starting with 5bf96e0c39
(reftable/generic: move seeking of records into the iterator,
2024-05-13) we have split up the iterator lifecycle such that creating
the iterator and seeking are two different concerns.
Refactor the code such that we cache iterators in the reftable backend.
This cache is invalidated whenever the respective stack is reloaded such
that we know to recreate the iterator in that case. This leads to a
sizeable speedup when creating many refs, which requires a lot of random
reference reads:
Benchmark 1: update-ref: create many refs (refcount = 100000, revision = master)
Time (mean ± σ): 1.793 s ± 0.010 s [User: 0.954 s, System: 0.835 s]
Range (min … max): 1.781 s … 1.811 s 10 runs
Benchmark 2: update-ref: create many refs (refcount = 100000, revision = HEAD)
Time (mean ± σ): 1.680 s ± 0.013 s [User: 0.846 s, System: 0.831 s]
Range (min … max): 1.664 s … 1.702 s 10 runs
Summary
update-ref: create many refs (refcount = 100000, revision = HEAD) ran
1.07 ± 0.01 times faster than update-ref: create many refs (refcount = 100000, revision = master)
While 7% is not a huge win, you have to consider that the benchmark is
_writing_ data, so _reading_ references is only one part of what we do.
Flame graphs show that we spend around 40% of our time reading refs, so
the speedup when reading refs is approximately ~2.5x that. I could not
find better benchmarks where we perform a lot of random ref reads.
You can also see a sizeable impact on memory usage when creating 100k
references. Before this change:
HEAP SUMMARY:
in use at exit: 19,112,538 bytes in 200,170 blocks
total heap usage: 8,400,426 allocs, 8,200,256 frees, 454,367,048 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 674,416 bytes in 169 blocks
total heap usage: 7,929,872 allocs, 7,929,703 frees, 281,509,985 bytes allocated
As an additional factor, this refactoring opens up the possibility for
more performance optimizations in how we re-seek iterators. Any change
that allows us to optimize re-seeking by e.g. reusing data structures
would thus also directly speed up random reads.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 5bf96e0c39 (reftable/generic: move seeking of records into the
iterator, 2024-05-13) we have refactored the reftable codebase such that
iterators can be initialized once and then re-seeked multiple times.
This feature is used by 1869525066 (refs/reftable: wire up support for
exclude patterns, 2024-09-16) in order to skip records based on exclude
patterns provided by the caller.
The logic to re-seek the merged iterator is insufficient though because
we don't drain the priority queue on a re-seek. This means that the
queue may contain stale entries and thus reading the next record in the
queue will return the wrong entry. While this is an obvious bug, it is
harmless in the context of above exclude patterns:
- If the queue contained stale entries that match the pattern then the
caller would already know to filter out such refs. This is because
our codebase is prepared to handle backends that don't have a way to
efficiently implement exclude patterns.
- If the queue contained stale entries that don't match the pattern
we'd eventually filter out any duplicates. This is because the
reftable code discards items with the same ref name and sorts any
remaining entries properly.
So things happen to work in this context regardless of the bug, and
there is no other use case yet where we re-seek iterators. We're about
to introduce a caching mechanism though where iterators are reused by
the reftable backend, and that will expose the bug.
Fix the issue by draining the priority queue when seeking and add a
testcase that surfaces the issue.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reftable stacks are reloaded in two cases:
- When calling `reftable_stack_reload()`, if the stat-cache tells us
that the stack has been modified.
- When committing a reftable addition.
While callers can figure out the second case, they do not have a
mechanism to figure out whether `reftable_stack_reload()` led to an
actual reload of the on-disk data. All they can do is thus to assume
that data is always being reloaded in that case.
Improve the situation by introducing a new `on_reload()` callback to the
reftable options. If provided, the function will be invoked every time
the stack has indeed been reloaded. This allows callers to invalidate
data that depends on the current stack data.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the callback function that expires reflog entries in the
reftable backend to use `reftable_backend_read_ref()` instead of
accessing the reftable stack directly. This ensures that the function
will benefit from the new caching layer that we're about to introduce.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the callback function that reads symbolic references in the
reftable backend to use `reftable_backend_read_ref()` instead of
accessing the reftable stack directly. This ensures that the function
will benefit from the new caching layer that we're about to introduce.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor `read_ref_without_reload()` to accept `struct reftable_backend`
as parameter instead of `struct reftable_stack`. Rename the function to
`reftable_backend_read_ref()` to clarify its scope and move it close to
other functions operating on `struct reftable_backend`.
This change allows us to implement an additional caching layer when
reading refs where we can reuse reftable iterators.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `read_ref_without_reload()` accepts a ref store as input
only so that we can figure out the hash function used by it. This is
duplicate information though because the reftable stack knows about its
hash function, too.
Drop the superfluous parameter to simplify the calling convention a bit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an accessor function that allows callers to access the hash ID of a
reftable stack. This function will be used in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When accessing a stack we almost always have to reload the stack before
reading data from it. This is mostly because Git does not have a
notification mechanism for when underlying data has been changed, and
thus we are forced to opportunistically reload the stack every single
time to account for any changes that may have happened concurrently.
Handle the reload internally in `backend_for()`. For one this forces
callsites to think about whether or not they need to reload the stack.
But second this makes the logic to access stacks more self-contained by
letting the `struct reftable_backend` manage themselves.
Update callsites where we don't reload the stack to document why we
don't. In some cases it's unclear whether it is the right thing to do in
the first place, but fixing that is outside of the scope of this patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable ref store needs to keep track of multiple stacks, one for
the main worktree and an arbitrary number of stacks for worktrees. This
is done by storing pointers to `struct reftable_stack`, which we then
access directly.
Wrap the stack in a new `struct reftable_backend`. This will allow us to
attach more data to each respective stack in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 9b1cb5070f (builtin: add a repository parameter for builtin
functions, 2024-09-13) the repository was passed down to all builtin
commands. This allowed the repository to be passed down to lower layers
without depending on the global `the_repository` variable.
Continue this work by also passing down the repository parameter from
the command to sub-commands. This will help pass down the repository to
other subsystems and cleanup usage of global variables like
'the_repository' and 'the_hash_algo'.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user specified e.g.
M 100644 :1 ../some-file
then fast-import previously would happily create a git history where
there is a tree in the top-level directory named "..", and with a file
inside that directory named "some-file". The top-level ".." directory
causes problems. While git checkout will die with errors and fsck will
report hasDotdot problems, the user is going to have problems trying to
remove the problematic file. Simply avoid creating this bad history in
the first place.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Coverity has started to warn about a potential double-free in
`find_bisection()`. This warning is triggered because we may modify the
list head of the passed-in `commit_list` in case it is an UNINTERESTING
commit, but still call `free_commit_list()` on the original variable
that points to the now-freed head in case where `do_find_bisection()`
returns a `NULL` pointer.
As far as I can see, this double free cannot happen in practice, as
`do_find_bisection()` only returns a `NULL` pointer when it was passed a
`NULL` input. So in order to trigger the double free we would have to
call `find_bisection()` with a commit list that only consists of
UNINTERESTING commits, but I have not been able to construct a case
where that happens.
Drop the `else` branch entirely as it seems to be a no-op anyway.
Another option might be to instead call `free_commit_list()` on `list`,
which is the modified version of `commit_list` and thus wouldn't cause a
double free. But as mentioned, I couldn't come up with any case where a
passed-in non-NULL list becomes empty, so this shouldn't be necessary.
And if it ever does become necessary we'd notice anyway via the leak
sanitizer.
Interestingly enough we did not have a single test exercising this
branch: all tests pass just fine even when replacing it with a call to
`BUG()`. Add a test that exercises it.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/leakfixes-part-10: (27 commits)
t: remove TEST_PASSES_SANITIZE_LEAK annotations
test-lib: unconditionally enable leak checking
t: remove unneeded !SANITIZE_LEAK prerequisites
t: mark some tests as leak free
t5601: work around leak sanitizer issue
git-compat-util: drop now-unused `UNLEAK()` macro
global: drop `UNLEAK()` annotation
t/helper: fix leaking commit graph in "read-graph" subcommand
builtin/branch: fix leaking sorting options
builtin/init-db: fix leaking directory paths
builtin/help: fix leaks in `check_git_cmd()`
help: fix leaking return value from `help_unknown_cmd()`
help: fix leaking `struct cmdnames`
help: refactor to not use globals for reading config
builtin/sparse-checkout: fix leaking sanitized patterns
split-index: fix memory leak in `move_cache_to_base_index()`
git: refactor builtin handling to use a `struct strvec`
git: refactor alias handling to use a `struct strvec`
strvec: introduce new `strvec_splice()` function
line-log: fix leak when rewriting commit parents
...
The rebase todo editor has commands like `fixup -c` which affects
the commit messages of the rebased commits.[1] For example:
pick hash1 <msg>
fixup hash2 <msg>
fixup -c hash3 <msg>
This says that hash2 and hash3 should be squashed into hash1 and
that hash3’s commit message should be used for the resulting commit.
So the user is presented with an editor where the two first commit
messages are commented out and the third is not. However this does
not work if `core.commentChar`/`core.commentString` is in use since
the comment char is hardcoded (#) in this `sequencer.c` function.
As a result the first commit message will not be commented out.
† 1: See 9e3cebd97cb (rebase -i: add fixup [-C | -c] command,
2021-01-29)
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Co-authored-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Reported-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git revert --reference <commit>` leaves behind a comment in the
first line:[1]
# *** SAY WHY WE ARE REVERTING ON THE TITLE LINE ***
Meaning that the commit will just consist of the next line if the user
exits the editor directly:
This reverts commit <--format=reference commit>
But the comment char here is hardcoded (#). Which means that the
comment line will inadvertently be included in the commit message if
`core.commentChar`/`core.commentString` is in use.
† 1: See 43966ab3156 (revert: optionally refer to commit in the
"reference" format, 2022-05-26)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git rebase --update-ref` does not insert commands for dependent/sub-
branches which are checked out.[1] Instead it leaves a comment about
that fact. The comment char is hardcoded (#). In turn the comment
line gets interpreted as an invalid command when `core.commentChar`/
`core.commentString` is in use.
† 1: See 900b50c242 (rebase: add --update-refs option, 2022-07-19)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `struct block_writer` and `struct reftable_writer` have a `buf`
member that is being reused to optimize the number of allocations.
Rename the variable to `scratch` to clarify its intend and provide a
comment explaining why it exists.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `initial_transaction` flag is tracked as a signed integer, but we
typically pass around flags via unsigned integers. Adapt the type
accordingly.
Suggested-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have recently added a new test to t7900 that exercises whether
git-maintenance(1) fails as expected when the "schedule.lock" file
exists. The test depends on whether or not the host has the required
executables present to schedule maintenance tasks in the first place,
like systemd or launchctl -- if not, the test fails with an unrelated
error before even checking for the lock file. This fails for example in
our CI systems, where macOS images do not have launchctl available.
Fix this issue by creating a stub systemctl(1) binary and using the
systemd scheduler.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though `git help cli` recommends users to prefer using
"--option=value" over "--option value", there can be reasons why
giving them separately is a good idea. One reason is that shells do
not perform tilde expansion for `--option=~/path/name` but they
expand `--options ~/path/name` just fine.
This is not a problem for many options whose option parsing is
properly written using OPT_FILENAME(), because the value given to
OPT_FILENAME() is tilde-expanded internally by us, but some commands
take a pathname as a mere string, which needs this trick to have the
shell help us.
I think the reason we originally decided to recommend the stuck form
was because an option that takes an optional value requires you to
use it in the stuck form, and it is one less thing for users to
worry about if they get into the habit to always use the stuck form.
But we should be discouraging ourselves from adding an option with
an optional value in the first place, and we might want to weaken
the current recommendation.
In any case, let's describe this one case where it is necessary to
use the separate form, with an example.
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/output-prefix-cleanup:
diff: store graph prefix buf in git_graph struct
diff: return line_prefix directly when possible
diff: return const char from output_prefix callback
diff: drop line_prefix_length field
line-log: use diff_line_prefix() instead of custom helper
Doc update to clarify how periodical maintenance are scheduled,
spread across time to avoid thundering hurds.
* sk/doc-maintenance-schedule:
doc: add a note about staggering of maintenance
* 'master' of https://github.com/j6t/gitk:
Makefile(s): avoid recipe prefix in conditional statements
doc: switch links to https
doc: update links to current pages
Since the introduction of 'initialize_merge_tool' in de8dafbada
(mergetool: break setup_tool out into separate initialization function,
2021-02-09), any errors from this function are ignored in
git-difftool--helper.sh::launch_merge_tool, which is not the case for
its call in git-mergetool.sh::merge_file.
Despite the in-code comment, initialize_merge_tool (via its call to
setup_tool) does different checks than run_merge_tool, so it makes sense
to abort early if it encounters errors. Add exit calls if
initialize_merge_tool fails.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In setup_tool, we check if the given tool is a known variant of a tool,
and quietly return with an error if not. This leads to the following
invocation quietly failing:
git mergetool --tool=vimdiff4
Add an error message before returning in this case.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-mergetool--lib.sh::setup_tool, we check if the given tool is a
known builtin tool, a known variant, or a user-defined tool by calling
setup_user_tool, and we return with the exit code from setup_user_tool
if it was called. setup_user_tool checks if {diff,merge}tool.$tool.cmd
is set and quietly returns with an error if not.
This leads to the following invocation quietly failing:
git mergetool --tool=unknown
which is not very user-friendly. Adjust setup_tool to output an error
message before returning if setup_user_tool returned with an error.
Note that we do not check the result of the second call to
setup_user_tool in setup_tool, as this call is only meant to allow users
to redefine 'cmd' for a builtin tool; it is not an error if they have
not done so.
Note that this behaviour of quietly failing is a regression dating back
to de8dafbada (mergetool: break setup_tool out into separate
initialization function, 2021-02-09), as before this commit an unknown
mergetool would be diagnosed in get_merge_tool_path when called from
run_merge_tool.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-mergetool--lib.sh::get_merge_tool_path, we check if the chosen
tool is valid via valid_tool and exit with an error message if not. This
error message mentions "Unknown merge tool", even if the command the
user tried was 'git difftool --tool=unknown'. Use the global 'TOOL_MODE'
variable for a more correct error message.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding a remote to bare repository with "git remote add --mirror",
running fetch will fail to update HEAD to the remote's HEAD, since it
does not know how to handle bare repositories. On the other hand HEAD
already has content, since "git init --bare" has already set HEAD to
whatever is the default branch set for the user. Unless this - by chance
- is the same as the remote's HEAD, HEAD will be pointing to a bad
symref. Teach set_head to handle bare repositories, by overwriting HEAD
so it mirrors the remote's HEAD.
Note, that in this case overriding the local HEAD reference is
necessary, since HEAD will exist before fetch can be run, but this
should not be an issue, since the whole purpose of --mirror is to be an
exact mirror of the remote, so following any changes to HEAD makes
sense.
Also note, that although "git remote set-head" also fails when trying to
update the remote's locally tracked HEAD in a mirrored bare repository,
the usage of the command does not make much sense after this patch:
fetch will update the remote HEAD correctly, and setting it manually to
something else is antithetical to the concept of mirroring.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning a repository remote/HEAD is created, but when the user
creates a repository with git init, and later adds a remote, remote/HEAD
is only created if the user explicitly runs a variant of "remote
set-head". Attempt to set remote/HEAD during fetch, if the user does not
have it already set. Silently ignore any errors.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the caller to specify that it only wants to update the symref if
it does not already exist. Silently ignore the error from the
transaction API if the symref already exists.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently there is only one special error for transaction, for when
there is a naming conflict, all other errors are dumped under a generic
error. Add a new special error case for when the caller requests the
reference to be updated only when it does not yet exist and the
reference actually does exist.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, set-head --auto will print a message saying "remote/HEAD set
to branch", which implies something was changed.
Change the output of --auto, so the output actually reflects what was
done: a) set a previously unset HEAD, b) change HEAD because remote
changed or c) no updates. As edge cases, if HEAD is changed from
a previous symbolic reference that was not a remote branch, explicitly
call attention to this fact, and also notify the user if the previous
reference was not a symbolic reference.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make two different readability refactors:
Rename strbufs "buf" and "buf2" to something more explanatory.
Instead of calling get_main_ref_store(the_repository) multiple times,
call it once and store the result in a new refs variable. Although this
change probably offers some performance benefits, the main purpose is to
shorten the line lengths of function calls using this variable.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When updating a symref with update_symref it's currently not possible to
know for sure what was the previous value that was overwritten. Extend
refs_update_symref under a new function name, to record the value after
the ref has been locked if the caller of refs_update_symref_extended
requests it via a new variable in the function call. Make the return
value of the function notify the caller, if the previous value was
actually not a symbolic reference. Keep the original refs_update_symref
function with the same signature, but now as a wrapper around
refs_update_symref_extended.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the symbolic reference we want to read with refs_read_symbolic_ref
is actually not a symbolic reference, the files and the reftable
backends return different values (1 and -1 respectively). Standardize
the returned values so that 0 is success, -1 is a generic error and -2
is that the reference was actually non-symbolic.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test coverage was missing a test for the failure branch of remote
set-head auto's output. Add the missing text and while we are at it,
correct a small grammatical mistake in the error's output ("setup" is
the noun, "set up" is the verb).
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Consider the bare repository called "mirror" in the test. Running `git
remote add --mirror -f origin ../one` will not change HEAD, consequently
if init.defaultBranch is not the same as what HEAD in the remote
("one"), HEAD in "mirror" will be pointing to a non-existent reference.
Hence if "mirror" is used as a remote by yet another repository,
ls-remote will not show HEAD. On the other hand, if init.defaultBranch
happens to match HEAD in "one", then ls-remote will show HEAD.
Since the "ci/run-build-and-tests.sh" script globally exports
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main for some (but not all) jobs,
there may be a drift in some tests between how the test repositories are
set up in the CI and during local testing, if the test itself uses
"master" as default instead of "main". In particular, this happens in
t5505-remote.sh. This issue does not manifest currently, as the test
does not do any remote HEAD manipulation where this would come up, but
should such things be added, a locally passing test would break the CI
and vice-versa.
Set GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main in t5505-remote to be
consistent with the CI.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the main window is already visible when gitk waits for it to
become visible, gitk hangs forever.
This commit adds a check whether the window is already visible.
See https://wiki.tcl-lang.org/page/tkwait+visibility
Signed-off-by: Tobias Pietzsch <pietzsch@mycroft.speedport.ip>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
In GNU Make commit 07fcee35 ([SV 64815] Recipe lines cannot contain
conditional statements, 2023-05-22) and following, conditional
statements may no longer be preceded by a tab character (which Make
refers to as the recipe prefix).
There are a handful of spots in our various Makefile(s) which will break
in a future release of Make containing 07fcee35. For instance, trying to
compile the pre-image of this patch with the tip of make.git results in
the following:
$ make -v | head -1 && make
GNU Make 4.4.90
config.mak.uname:842: *** missing 'endif'. Stop.
The kernel addressed this issue in 82175d1f9430 (kbuild: Replace tabs
with spaces when followed by conditionals, 2024-01-28). Address the
issues in Git's tree by applying the same strategy.
When a conditional word (ifeq, ifneq, ifdef, etc.) is preceded by one or
more tab characters, replace each tab character with 8 space characters
with the following:
find . -type f -not -path './.git/*' -name Makefile -or -name '*.mak' |
xargs perl -i -pe '
s/(\t+)(ifn?eq|ifn?def|else|endif)/" " x (length($1) * 8) . $2/ge unless /\\$/
'
The "unless /\\$/" removes any false-positives (like "\telse \"
appearing within a shell script as part of a recipe).
After doing so, Git compiles on newer versions of Make:
$ make -v | head -1 && make
GNU Make 4.4.90
GIT_VERSION = 2.44.0.414.gfac1dc44ca9
[...]
$ echo $?
0
Reported-by: Dario Gjorgjevski <dario.gjorgjevski@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cherry-picked-from: 728b9ac0c3b93aaa4ea80280c591deb198051785
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
These sites offer https versions of their content.
Using the https versions provides some protection for users.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cherry-picked-from: d05b08cd52cfda627f1d865bdfe6040a2c9521b5
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
It's somewhat traditional to respect sites' self-identification.
Signed-off-by: Josh Soref <jsoref@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cherry-picked-from: 65175d9ea26bebeb9d69977d0e75efc0e88dbced
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Object reuse code based on multi-pack-index sent an unwanted copy
of object.
* tb/multi-pack-reuse-dupfix:
pack-objects: only perform verbatim reuse on the preferred pack
t5332-multi-pack-reuse.sh: demonstrate duplicate packing failure
Double-free fix.
* jk/fetch-prefetch-double-free-fix:
refspec: store raw refspecs inside refspec_item
refspec: drop separate raw_nr count
fetch: adjust refspec->raw_nr when filtering prefetch refspecs
Avoid build/test breakage on a system without working malloc debug
support dynamic library.
* jk/test-malloc-debug-check:
test-lib: move malloc-debug setup after $PATH setup
test-lib: check malloc debug LD_PRELOAD before using
The perf test suite prefers to use test_file_size over 'wc -c' when
inside of a test_size block. One advantage is that accidentally writign
"wc -c file" (instead of "wc -c <file") does not inadvertently break the
tests (since the former will include the filename in the output of wc).
Both of the two uses of test_size use "wc -c", but let's convert those
to the more conventional test_file_size helper instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the boundary-based bitmap traversal, we use the given 'rev_info'
structure to first do a commit-only walk in order to determine the
boundary between interesting and uninteresting objects.
That walk only looks at commit objects, regardless of the state of
revs->blob_objects, revs->tree_objects, and so on. In order to do this,
we store the state of these variables in temporary fields before
setting them back to zero, performing the traversal, and then setting
them back.
But there is a typo here that dates back to b0afdce5da (pack-bitmap.c:
use commit boundary during bitmap traversal, 2023-05-08), where we
incorrectly store the value of the "tags" field as "revs->blob_objects".
This could lead to problems later on if, say, the caller wants tag
objects but *not* blob objects. In the pre-image behavior, we'd set
revs->tag_objects back to the old value of revs->blob_objects, thus
emitting fewer objects than expected back to the caller.
Fix that by correctly assigning the value of 'revs->tag_objects' to the
'tmp_tags' field.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there
is no longer a need to have that variable declared in all of our tests.
Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the last two releases we have plugged a couple hundred of memory
leaks exposed by the Git test suite. With the preceding commits we have
finally fixed the last leak exposed by our test suite, which means that
we are now basically leak free wherever we have branch coverage.
From hereon, the Git test suite should ideally stay free of memory
leaks. Most importantly, any test suite that is being added should
automatically be subject to the leak checker, and if that test does not
pass it is a strong signal that the added code introduced new memory
leaks and should not be accepted without further changes.
Drop the infrastructure around TEST_PASSES_SANITIZE_LEAK to reflect this
new requirement. Like this, all test suites will be subject to the leak
checker by default.
This is being intentionally strict, but we still have an escape hatch:
the SANITIZE_LEAK prerequisite. There is one known case in t5601 where
the leak sanitizer itself is buggy, so adding this prereq in such cases
is acceptable. Another acceptable situation is when a newly added test
uncovers preexisting memory leaks: when fixing that memory leak would be
sufficiently complicated it is fine to annotate and document the leak
accordingly. But in any case, the burden is now on the patch author to
explain why exactly they have to add the SANITIZE_LEAK prerequisite.
The TEST_PASSES_SANITIZE_LEAK annotations will be dropped in the next
patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a couple of !SANITIZE_LEAK prerequisites for tests that used to
fail due to memory leaks. These have all been fixed by now, so let's
drop the prerequisite.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both t5558 and t5601 are leak-free starting with 6dab49b9fb (bundle-uri:
plug leak in unbundle_from_file(), 2024-10-10). Mark them accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running t5601 with the leak checker enabled we can see a hang in
our CI systems. This hang seems to be system-specific, as I cannot
reproduce it on my own machine.
As it turns out, the issue is in those testcases that exercise cloning
of `~repo`-style paths. All of the testcases that hang eventually end up
interpreting "repo" as the username and will call getpwnam(3p) with that
username. That should of course be fine, and getpwnam(3p) should just
return an error. But instead, the leak sanitizer seems to be recursing
while handling a call to `free()` in the NSS modules:
#0 0x00007ffff7fd98d5 in _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:720
#1 0x00007ffff7fd9ac4 in update_get_addr (ti=0x7ffff7a91d80, gen=<optimized out>) at ../elf/dl-tls.c:916
#2 0x00007ffff7fdc85c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#3 0x00007ffff7a27e04 in __lsan::GetAllocatorCache () at ../../../../src/libsanitizer/lsan/lsan_linux.cpp:27
#4 0x00007ffff7a2b33a in __lsan::Deallocate (p=0x0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:127
#5 __lsan::lsan_free (p=0x0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:220
...
#261505 0x00007ffff7fd99f2 in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#261506 _dl_update_slotinfo (req_modid=1, new_gen=2) at ../elf/dl-tls.c:822
#261507 0x00007ffff7fd9ac4 in update_get_addr (ti=0x7ffff7a91d80, gen=<optimized out>) at ../elf/dl-tls.c:916
#261508 0x00007ffff7fdc85c in __tls_get_addr () at ../sysdeps/x86_64/tls_get_addr.S:55
#261509 0x00007ffff7a27e04 in __lsan::GetAllocatorCache () at ../../../../src/libsanitizer/lsan/lsan_linux.cpp:27
#261510 0x00007ffff7a2b33a in __lsan::Deallocate (p=0x5020000001e0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:127
#261511 __lsan::lsan_free (p=0x5020000001e0) at ../../../../src/libsanitizer/lsan/lsan_allocator.cpp:220
#261512 0x00007ffff793da25 in module_load (module=0x515000000280) at ./nss/nss_module.c:188
#261513 0x00007ffff793dee5 in __nss_module_load (module=0x515000000280) at ./nss/nss_module.c:302
#261514 __nss_module_get_function (module=0x515000000280, name=name@entry=0x7ffff79b9128 "getpwnam_r") at ./nss/nss_module.c:328
#261515 0x00007ffff793e741 in __GI___nss_lookup_function (fct_name=<optimized out>, ni=<optimized out>) at ./nss/nsswitch.c:137
#261516 __GI___nss_next2 (ni=ni@entry=0x7fffffffa458, fct_name=fct_name@entry=0x7ffff79b9128 "getpwnam_r", fct2_name=fct2_name@entry=0x0, fctp=fctp@entry=0x7fffffffa460,
status=status@entry=0, all_values=all_values@entry=0) at ./nss/nsswitch.c:120
#261517 0x00007ffff794c6a7 in __getpwnam_r (name=name@entry=0x501000000060 "repo", resbuf=resbuf@entry=0x7ffff79fb320 <resbuf>, buffer=<optimized out>,
buflen=buflen@entry=1024, result=result@entry=0x7fffffffa4b0) at ../nss/getXXbyYY_r.c:343
#261518 0x00007ffff794c4d8 in getpwnam (name=0x501000000060 "repo") at ../nss/getXXbyYY.c:140
#261519 0x00005555557e37ff in getpw_str (username=0x5020000001a1 "repo", len=4) at path.c:613
#261520 0x00005555557e3937 in interpolate_path (path=0x5020000001a0 "~repo", real_home=0) at path.c:654
#261521 0x00005555557e3aea in enter_repo (path=0x501000000040 "~repo", strict=0) at path.c:718
#261522 0x000055555568f0ba in cmd_upload_pack (argc=1, argv=0x502000000100, prefix=0x0, repo=0x0) at builtin/upload-pack.c:57
#261523 0x0000555555575ba8 in run_builtin (p=0x555555a20c98 <commands+3192>, argc=2, argv=0x502000000100, repo=0x555555a53b20 <the_repo>) at git.c:481
#261524 0x0000555555576067 in handle_builtin (args=0x7fffffffaab0) at git.c:742
#261525 0x000055555557678d in cmd_main (argc=2, argv=0x7fffffffac58) at git.c:912
#261526 0x00005555556963cd in main (argc=2, argv=0x7fffffffac58) at common-main.c:64
Note that this stack is more than 260000 function calls deep. Run under
the debugger this will eventually segfault, but in our CI systems it
seems like this just hangs forever.
I assume that this is a bug either in the leak sanitizer or in glibc, as
I cannot reproduce it on my machine. In any case, let's work around the
bug for now by marking those tests with the "!SANITIZE_LEAK" prereq.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `UNLEAK()` macro has been introduced with 0e5bba53af (add UNLEAK
annotation for reducing leak false positives, 2017-09-08) to help us
reduce the amount of reported memory leaks in cases we don't care about,
e.g. when exiting immediately afterwards. We have since removed all of
its users in favor of freeing the memory and thus don't need the macro
anymore.
Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two users of `UNLEAK()` left in our codebase:
- In "builtin/clone.c", annotating the `repo` variable. That leak has
already been fixed though as you can see in the context, where we do
know to free `repo_to_free`.
- In "builtin/diff.c", to unleak entries of the `blob[]` array. That
leak has also been fixed, because the entries we assign to that
array come from `rev.pending.objects`, and we do eventually release
`rev`.
This neatly demonstrates one of the issues with `UNLEAK()`: it is quite
easy for the annotation to become stale. A second issue is that its
whole intent is to paper over leaks. And while that has been a necessary
evil in the past, because Git was leaking left and right, it isn't
really much of an issue nowadays where our test suite has no known leaks
anymore.
Remove the last two users of this macro.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're leaking the commit-graph in the "test-helper read-graph"
subcommand, but as the leak is annotated with `UNLEAK()` the leak
sanitizer doesn't complain.
Fix the leak by calling `free_commit_graph()`. Besides getting rid of
the `UNLEAK()` annotation, it also increases code coverage because we
properly release resources as Git would do it, as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sorting options are leaking, but given that they are marked with
`UNLEAK()` the leak sanitizer doesn't complain.
Fix the leak by creating a common exit path and clearing the vector such
that we can get rid of the `UNLEAK()` annotation entirely.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a couple of leaking directory paths in git-init(1), all of
which are marked with `UNLEAK()`. Fixing them is trivial, so let's do
that instead so that we can get rid of `UNLEAK()` entirely.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `check_git_cmd()` function is declared to return a string constant.
And while it sometimes does return a constant, it may also return an
allocated string in two cases:
- When handling aliases. This case is already marked with `UNLEAK()`
to work around the leak.
- When handling unknown commands in case "help.autocorrect" is
enabled. This one is not marked with `UNLEAK()`.
The function only has a single caller, so let's fix its return type to
be non-constant, consistently return an allocated string and free it at
its callsite to plug the leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While `help_unknown_cmd()` would usually die on an unknown command, it
instead returns an autocorrected command when "help.autocorrect" is set.
But while the function is declared to return a string constant, it
actually returns an allocated string in that case. Callers thus aren't
aware that they have to free the string, leading to a memory leak.
Fix the function return type to be non-constant and free the returned
value at its only callsite.
Note that we cannot simply take ownership of `main_cmds.names[0]->name`
and then eventually free it. This is because the `struct cmdname` is
using a flex array to allocate the name, so the name pointer points into
the middle of the structure and thus cannot be freed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're populating multiple `struct cmdnames`, but don't ever free them.
Plug this memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're reading the "help.autocorrect" and "alias.*" configuration into
global variables, which makes it hard to manage their lifetime
correctly. Refactor the code to use a struct instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `git sparse-checkout add` and `git sparse-checkout set` accept a
list of additional directories or patterns. These get massaged via calls
to `sanitize_paths()`, which may end up modifying the passed-in array by
updating its pointers to be prefixed paths. This allocates memory that
we never free.
Refactor the code to instead use a `struct strvec`, which makes it way
easier for us to track the lifetime correctly. The couple of extra
memory allocations likely do not matter as we only ever populate it with
command line arguments.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `move_cache_to_base_index()` we move the index cache of the main
index into the split index, which is used when writing a shared index.
But we don't release the old split index base in case we already had a
split index before this operation, which can thus leak memory.
Plug the leak by releasing the previous base.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar as with the preceding commit, `handle_builtin()` does not
properly track lifetimes of the `argv` array and its strings. As it may
end up modifying the array this can lead to memory leaks in case it
contains allocated strings.
Refactor the function to use a `struct strvec` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `handle_alias()` we use both `argcp` and `argv` as in-out parameters.
Callers mostly pass through the static array from `main()`, but once we
handle an alias we replace it with an allocated array that may contain
some allocated strings. Callers do not handle this scenario at all and
thus leak memory.
We could in theory handle the lifetime of `argv` in a hacky fashion by
letting callers free it in case they see that an alias was handled. But
while that would likely work, we still wouldn't be able to easily handle
the lifetime of strings referenced by `argv`.
Refactor the code to instead use a `struct strvec`, which effectively
removes the need for us to manually track lifetimes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new `strvec_splice()` function that can replace a range of
strings in the vector with another array of strings. This function will
be used in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `process_ranges_merge_commit()` we try to figure out which of the
parents can be blamed for the given line changes. When we figure out
that none of the files in the line-log have changed we assign the
complete blame to that commit and rewrite the parents of the current
commit to only use that single parent.
This is done via `commit_list_append()`, which is misleadingly _not_
appending to the list of parents. Instead, we overwrite the parents with
the blamed parent. This makes us lose track of the old pointers,
creating a memory leak.
Fix this issue by freeing the parents before we overwrite them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are various cases where we leak commit list items because we
evict items from the list, but don't free them. Plug those.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we free the result commit list at the end of `check_merge_base()`,
we forget to free any items that we have already iterated over. Fix this
by using a separate variable to iterate through them.
This leak is exposed by t6030, but plugging it does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are multiple leaks in `bisect_next_all()`. For one we don't free
the `tried` commit list. Second, one of the branches uses a direct
return instead of jumping to the cleanup code.
Fix these by freeing the commit list and converting the return to a
goto.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading bisect refs we read the reference mapping to the "bad" term
into the global `current_bad_oid` variable. This is an allocated string,
but because it is global we never have to free it. This changes though
when `register_ref()` is being called multiple times, at which point
we'll overwrite the previous pointer and thus make it unreachable.
Fix this issue by freeing the previous value. This leak is exposed by
t6030, but plugging it does not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When handling a bad merge base we print an error, which includes the set
of good revisions joined by spaces. This string is allocated, but never
freed.
Fix this memory leak. Note that the local `bad_hex` varible also looks
like a string that we should free. But in fact, `oid_to_hex()` returns
an address to a static variable even though it is declared to return a
non-constant string. The function signature is thus quite misleading and
really should be fixed, but doing so is outside of the scope of this
patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though `read_bisect_terms()` is declared as assigning string
constants, it in fact assigns allocated strings to the `read_bad` and
`read_good` out parameters. The only callers of this function assign the
result to global variables and thus don't have to free them in order to
be leak-free. But that changes when executing the function multiple
times because we'd then overwrite the previous value and thus make it
unreachable.
Fix the function signature and free the previous values. This leak is
exposed by t0630, but plugging it does not make the whole test suite
pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing `--incremental` to git-blame(1) we exit early by jumping to
the `cleanup` label. But some of the cleanups we perform are handled
between the `goto` and its label, and thus we leak the data.
Move the cleanups after the `cleanup` label. While at it, move the logic
to free the scoreboard's `final_buf` into `cleanup_scoreboard()` and
drop its `const` declaration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Besides the textual symref, we also allow symbolic links as the symref.
So, we should also provide the consistency check as what we have done
for textual symref. And also we consider deprecating writing the
symbolic links. We first need to access whether symbolic links still
be used. So, add a new fsck message "symlinkRef(INFO)" to tell the
user be aware of this information.
We have already introduced "files_fsck_symref_target". We should reuse
this function to handle the symrefs which use legacy symbolic links. We
should not check the trailing garbage for symbolic refs. Add a new
parameter "symbolic_link" to disable some checks which should only be
executed for textual symrefs.
And we need to also generate the "referent" parameter for reusing
"files_fsck_symref_target" by the following steps:
1. Use "strbuf_add_real_path" to resolve the symlink and get the
absolute path "ref_content" which the symlink ref points to.
2. Generate the absolute path "abs_gitdir" of "gitdir" and combine
"ref_content" and "abs_gitdir" to extract the relative path
"relative_referent_path".
3. If "ref_content" is outside of "gitdir", we just set "referent" with
"ref_content". Instead, we set "referent" with
"relative_referent_path".
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally, we want to the users use "git symbolic-ref" to create symrefs
instead of writing raw contents into the filesystem. However, "git
symbolic-ref" is strict with the refname but not strict with the
referent. For example, we can make the "referent" located at the
"$(gitdir)/logs/aaa" and manually write the content into this where we
can still successfully parse this symref by using "git rev-parse".
$ git init repo && cd repo && git commit --allow-empty -mx
$ git symbolic-ref refs/heads/test logs/aaa
$ echo $(git rev-parse HEAD) > .git/logs/aaa
$ git rev-parse test
We may need to add some restrictions for "referent" parameter when using
"git symbolic-ref" to create symrefs because ideally all the
nonpseudo-refs should be located under the "refs" directory and we may
tighten this in the future.
In order to tell the user we may tighten the above situation, create
a new fsck message "symrefTargetIsNotARef" to notify the user that this
may become an error in the future.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have code that checks regular ref contents, but we do not yet check
the contents of symbolic refs. By using "parse_loose_ref_content" for
symbolic refs, we will get the information of the "referent".
We do not need to check the "referent" by opening the file. This is
because if "referent" exists in the file system, we will eventually
check its correctness by inspecting every file in the "refs" directory.
If the "referent" does not exist in the filesystem, this is OK as it is
seen as the dangling symref.
So we just need to check the "referent" string content. A regular ref
could be accepted as a textual symref if it begins with "ref:", followed
by zero or more whitespaces, followed by the full refname, followed only
by whitespace characters. However, we always write a single SP after
"ref:" and a single LF after the refname. It may seem that we should
report a fsck error message when the "referent" does not apply above
rules and we should not be so aggressive because third-party
reimplementations of Git may have taken advantage of the looser syntax.
Put it more specific, we accept the following contents:
1. "ref: refs/heads/master "
2. "ref: refs/heads/master \n \n"
3. "ref: refs/heads/master\n\n"
When introducing the regular ref content checks, we created two fsck
infos "refMissingNewline" and "trailingRefContent" which exactly
represents above situations. So we will reuse these two fsck messages to
write checks to info the user about these situations.
But we do not allow any other trailing garbage. The followings are bad
symref contents which will be reported as fsck error by "git-fsck(1)".
1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage "
And we introduce a new "badReferentName(ERROR)" fsck message to report
above errors by using "is_root_ref" and "check_refname_format" to check
the "referent". Since both "is_root_ref" and "check_refname_format"
don't work with whitespaces, we use the trimmed version of "referent"
with these functions.
In order to add checks, we will do the following things:
1. Record the untrimmed length "orig_len" and untrimmed last byte
"orig_last_byte".
2. Use "strbuf_rtrim" to trim the whitespaces or newlines to make sure
"is_root_ref" and "check_refname_format" won't be failed by them.
3. Use "orig_len" and "orig_last_byte" to check whether the "referent"
misses '\n' at the end or it has trailing whitespaces or newlines.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have already used "parse_loose_ref_contents" function to check
whether the ref content is valid in files backend. However, by
using "parse_loose_ref_contents", we allow the ref's content to end with
garbage or without a newline.
Even though we never create such loose refs ourselves, we have accepted
such loose refs. So, it is entirely possible that some third-party tools
may rely on such loose refs being valid. We should not report an error
fsck message at current. We should notify the users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.
And it's not suitable either to report a warn fsck message to the user.
We don't yet want the "--strict" flag that controls this bit to end up
generating errors for such weirdly-formatted reference contents, as we
first want to assess whether this retroactive tightening will cause
issues for any tools out there. It may cause compatibility issues which
may break the repository. So, we add the following two fsck infos to
represent the situation where the ref content ends without newline or
has trailing garbages:
1. refMissingNewline(INFO): A loose ref that does not end with
newline(LF).
2. trailingRefContent(INFO): A loose ref has trailing content.
It might appear that we can't provide the user with any warnings by
using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert
FSCK_INFO to FSCK_WARN and we can still warn the user about these
situations when using "git refs verify" without introducing
compatibility issues.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git-fsck(1)" implicitly checks the ref content by passing the
callback "fsck_handle_ref" to the "refs.c::refs_for_each_rawref".
Then, it will check whether the ref content (eventually "oid")
is valid. If not, it will report the following error to the user.
error: refs/heads/main: invalid sha1 pointer 0000...
And it will also report above errors when there are dangling symrefs
in the repository wrongly. This does not align with the behavior of
the "git symbolic-ref" command which allows users to create dangling
symrefs.
As we have already introduced the "git refs verify" command, we'd better
check the ref content explicitly in the "git refs verify" command thus
later we could remove these checks in "git-fsck(1)" and launch a
subprocess to call "git refs verify" in "git-fsck(1)" to make the
"git-fsck(1)" more clean.
Following what "git-fsck(1)" does, add a similar check to "git refs
verify". Then add a new fsck error message "badRefContent(ERROR)" to
represent that a ref has an invalid content.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have already set up the infrastructure to check the consistency for
refs, but we do not support multiple worktrees. However, "git-fsck(1)"
will check the refs of worktrees. As we decide to get feature parity
with "git-fsck(1)", we need to set up support for multiple worktrees.
Because each worktree has its own specific refs, instead of just showing
the users "refs/worktree/foo", we need to display the full name such as
"worktrees/<id>/refs/worktree/foo". So we should know the id of the
worktree to get the full name. Add a new parameter "struct worktree *"
for "refs-internal.h::fsck_fn". Then change the related functions to
follow this new interface.
The "packed-refs" only exists in the main worktree, so we should only
check "packed-refs" in the main worktree. Use "is_main_worktree" method
to skip checking "packed-refs" in "packed_fsck" function.
Then, enhance the "files-backend.c::files_fsck_refs_dir" function to add
"worktree/<id>/" prefix when we are not in the main worktree.
Last, add a new test to check the refname when there are multiple
worktrees to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We passes "refs_check_dir" to the "files_fsck_refs_name" function which
allows it to create the checked ref name later. However, when we
introduce a new check function, we have to allocate redundant memory and
re-calculate the ref name. It's bad for us to allocate redundant memory
and duplicate logic. Instead, we should allocate and calculate it only
once and pass the ref name to the check functions.
In order not to do repeat calculation, rename "refs_check_dir" to
"refname". And in "files_fsck_refs_dir", create a new strbuf "refname",
thus whenever we handle a new ref, calculate the name and call the check
functions one by one.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "files-backend.c::files_fsck_refs_name", we validate the refname
format by using "check_refname_format" to check the basename of the
iterator with "REFNAME_ALLOW_ONELEVEL" flag.
However, this is a bad implementation. Although we doesn't allow a
single "@" in ".git" directory, we do allow "refs/heads/@". So, we will
report an error wrongly when there is a "refs/heads/@" ref by using one
level refname "@".
Because we just check one level refname, we either cannot check the
other parts of the full refname. And we will ignore the following
errors:
"refs/heads/ new-feature/test"
"refs/heads/~new-feature/test"
In order to fix the above problem, enhance "files_fsck_refs_name" to use
the full name for "check_refname_format". Then, replace the tests which
are related to "@" and add tests to exercise the above situations using
for loop to avoid repetition.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" is NULL. So, we need to always initialize these parameters to
NULL instead of letting them point to anywhere when creating a new
"fsck_ref_report" structure.
The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes other
members in the struct). It is more customary to use "{ 0 }" to express
that we are 0-initializing everything. In order to align with the
codebase, initialize "fsck_ref_report" with zero.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The block writer needs to compute the key for every record that one adds
to the writer. The buffer for this key is stored on the stack and thus
reallocated on every call to `block_writer_add()`, which is inefficient.
Refactor the code so that we store the buffer in the `block_writer`
struct itself so that we can reuse it. This reduces the number of
allocations when writing many refs, e.g. when migrating one million refs
from the "files" backend to the "reftable backend. Before this change:
HEAP SUMMARY:
in use at exit: 80,048 bytes in 49 blocks
total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 80,048 bytes in 49 blocks
total heap usage: 2,013,250 allocs, 2,013,201 frees, 347,543,583 bytes allocated
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt the name of the `block_writer::buf` variable to instead be called
`block`. This aligns it with the existing `block_len` variable, which
tracks the length of this buffer, and is generally a bit more tied to
the actual context where this variable gets used.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `writer_add_record()` and `reftable_writer_add_ref()` get executed
for every single ref record we're adding to the reftable writer. And as
both functions use a local buffer to write data, the allocations we have
to do here add up during larger transactions.
Refactor the code to use a scratch buffer part of the `reftable_writer`
itself such that we can reuse it. This signifcantly reduces the number
of allocations during large transactions, e.g. when migrating refs from
the "files" backend to the "reftable" backend. Before this change:
HEAP SUMMARY:
in use at exit: 80,048 bytes in 49 blocks
total heap usage: 5,032,171 allocs, 5,032,122 frees, 418,792,092 bytes allocated
After this change:
HEAP SUMMARY:
in use at exit: 80,048 bytes in 49 blocks
total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated
This also translate into a small speedup:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 827.2 ms ± 16.5 ms [User: 689.4 ms, System: 124.9 ms]
Range (min … max): 809.0 ms … 924.7 ms 50 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 813.6 ms ± 11.6 ms [User: 679.0 ms, System: 123.4 ms]
Range (min … max): 786.7 ms … 833.5 ms 50 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.02 ± 0.02 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the `REF_SKIP_CREATE_REFLOG` flag is set we skip the creation of
the reflog entry, but we still normalize the reflog message when we
queue the update. This is a waste of resources as the normalized message
will never get used in the first place.
Fix this issue by skipping the normalization in case the flag is set.
This leads to a surprisingly large speedup when migrating from the
"files" to the "reftable" backend:
Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 878.5 ms ± 14.9 ms [User: 726.5 ms, System: 139.2 ms]
Range (min … max): 858.4 ms … 941.3 ms 50 runs
Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 831.1 ms ± 10.5 ms [User: 694.1 ms, System: 126.3 ms]
Range (min … max): 812.4 ms … 851.4 ms 50 runs
Summary
migrate files:reftable (refcount = 1000000, revision = HEAD) ran
1.06 ± 0.02 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)
And an ever larger speedup when migrating the other way round:
Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
Time (mean ± σ): 923.6 ms ± 11.6 ms [User: 705.5 ms, System: 208.1 ms]
Range (min … max): 905.3 ms … 946.5 ms 50 runs
Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
Time (mean ± σ): 818.5 ms ± 9.0 ms [User: 627.6 ms, System: 180.6 ms]
Range (min … max): 802.2 ms … 842.9 ms 50 runs
Summary
migrate reftable:files (refcount = 1000000, revision = HEAD) ran
1.13 ± 0.02 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reference transactions use `refs_verify_refname_available()` to check
for colliding references. This check consists of two parts:
- Checks for whether multiple ref updates in the same transaction
conflict with each other.
- Checks for whether existing refs conflict with any refs part of the
transaction.
While we generally cannot avoid the first check, the second check is
superfluous in cases where the transaction is an initial one in an
otherwise empty ref store. The check results in multiple ref reads as
well as the creation of a ref iterator for every ref we're checking,
which adds up quite fast when performing the check for many refs.
Introduce a new flag that allows us to skip this check and wire it up in
such that the backends pass it when running an initial transaction. This
leads to significant speedups when migrating ref storage backends. From
"files" to "reftable":
Benchmark 1: migrate files:reftable (refcount = 100000, revision = HEAD~)
Time (mean ± σ): 472.4 ms ± 6.7 ms [User: 175.9 ms, System: 285.2 ms]
Range (min … max): 463.5 ms … 483.2 ms 10 runs
Benchmark 2: migrate files:reftable (refcount = 100000, revision = HEAD)
Time (mean ± σ): 86.1 ms ± 1.9 ms [User: 67.9 ms, System: 16.0 ms]
Range (min … max): 82.9 ms … 90.9 ms 29 runs
Summary
migrate files:reftable (refcount = 100000, revision = HEAD) ran
5.48 ± 0.15 times faster than migrate files:reftable (refcount = 100000, revision = HEAD~)
And from "reftable" to "files":
Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
Time (mean ± σ): 452.7 ms ± 3.4 ms [User: 209.9 ms, System: 235.4 ms]
Range (min … max): 445.9 ms … 457.5 ms 10 runs
Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
Time (mean ± σ): 95.2 ms ± 2.2 ms [User: 73.6 ms, System: 20.6 ms]
Range (min … max): 91.7 ms … 100.8 ms 28 runs
Summary
migrate reftable:files (refcount = 100000, revision = HEAD) ran
4.76 ± 0.11 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, we couldn't use "initial" transaction semantics to migrate
refs because the "files" backend only supported writing regular refs via
the initial transaction because it simply mapped the transaction to a
"packed-refs" transaction. But with the preceding commit, the "files"
backend has learned to also write symbolic and root refs in the initial
transaction by creating a second transaction for all refs that need to
be written as loose refs.
Adapt the code to migrate refs to commit the transaction as an initial
transaction. This results in a signiticant speedup when migrating many
refs:
Benchmark 1: migrate reftable:files (refcount = 100000, revision = HEAD~)
Time (mean ± σ): 3.247 s ± 0.034 s [User: 0.485 s, System: 2.722 s]
Range (min … max): 3.216 s … 3.309 s 10 runs
Benchmark 2: migrate reftable:files (refcount = 100000, revision = HEAD)
Time (mean ± σ): 453.6 ms ± 1.9 ms [User: 214.6 ms, System: 230.5 ms]
Range (min … max): 451.5 ms … 456.4 ms 10 runs
Summary
migrate reftable:files (refcount = 100000, revision = HEAD) ran
7.16 ± 0.08 times faster than migrate reftable:files (refcount = 100000, revision = HEAD~)
As the reftable backend doesn't (yet) special-case initial transactions
there is no comparable speedup for that backend.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "files" backend has implemented special logic when committing
the first transactions in an otherwise empty ref store: instead of
writing all refs as separate loose files, it instead knows to write them
all into a "packed-refs" file directly. This is significantly more
efficient than having to write each of the refs as separate "loose" ref.
The only user of this optimization is git-clone(1), which only uses this
mechanism to write regular refs. Consequently, the implementation does
not know how to handle both symbolic and root refs. While fine in the
context of git-clone(1), this keeps us from using the mechanism in more
cases.
Adapt the logic to also support symbolic and root refs by using a second
transaction that we use for all of the refs that need to be written as
loose refs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two different ways to commit a transaction:
- `ref_transaction_commit()` can be used to commit a regular
transaction and is what almost every caller wants.
- `initial_ref_transaction_commit()` can be used when it is known that
the ref store that the transaction is committed for is empty and
when there are no concurrent processes. This is used when cloning a
new repository.
Implementing this via two separate functions has a couple of downsides.
First, every reference backend needs to implement a separate callback
even in the case where they don't special-case the initial transaction.
Second, backends are basically forced to reimplement the whole logic for
how to commit the transaction like the "files" backend does, even though
backends may wish to only tweak certain behaviour of a "normal" commit.
Third, it is awkward that callers must never prepare the transaction as
this is somewhat different than how a transaction typically works.
Refactor the code such that we instead mark initial transactions via a
separate flag when starting the transaction. This addresses all of the
mentioned painpoints, where the most important part is that it will
allow backends to have way more leeway in how exactly they want to
handle the initial transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the logic to commit initial transactions such that we can start to
call it in `files_transaction_finish()` in a subsequent commit without
requiring a separate function declaration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow passing flags when setting up a transaction such that the
behaviour of the transaction itself can be altered. This functionality
will be used in a subsequent patch.
Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git gc" discards any objects that are outside promisor packs that
are referred to by an object in a promisor pack, and we do not
refetch them from the promisor at runtime, resulting an unusable
repository. Work it around by including these objects in the
referring promisor pack at the receiving end of the fetch.
* jt/repack-local-promisor:
index-pack: repack local links into promisor packs
t5300: move --window clamp test next to unclamped
t0410: use from-scratch server
t0410: make test description clearer
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.
* db/submodule-fetch-with-remote-name-fix:
submodule: correct remote name with fetch
Fail gracefully instead of crashing when attempting to write the
contents of a corrupt in-core index as a tree object.
* ps/cache-tree-w-broken-index-entry:
unpack-trees: detect mismatching number of cache-tree/index entries
cache-tree: detect mismatching number of index entries
cache-tree: refactor verification to return error codes
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.
* ps/maintenance-start-crash-fix:
builtin/gc: fix crash when running `git maintenance start`
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened. This problem has been corrected.
* jk/fsmonitor-event-listener-race-fix:
fsmonitor: initialize fs event listener before accepting clients
simple-ipc: split async server initialization and running
Use after free and double freeing at the end in "git log -L... -p"
had been identified and fixed.
* ds/line-log-asan-fix:
line-log: protect inner strbuf from free
Currently,
- Running "index-pack --promisor" outside a repo segfaults.
- It may be confusing to a user that running "index-pack --promisor"
within a repo may make changes to the repo's object DB, especially
since the packs indexed by the index-pack invocation may not even be
related to the repo.
As discussed in [1] and [2], teaching --promisor to forbid a packfile
name solves both these problems. This combination of arguments requires
a repo (since we are writing the resulting .pack and .idx to it) and it
is clear that the files are related to the repo.
Currently, Git uses "index-pack --promisor" only when fetching into
a repo, so it could be argued that we should teach "index-pack" a
new argument (say, "--fetching-mode") instead of tying --promisor to
a generic argument like the packfile name. However, this --promisor
feature could conceivably be used whenever we have a packfile that is
known to come from the promisor remote (whether obtained through Git's
fetch protocol or through other means) so not using a new argument seems
reasonable - one could envision a user-made script obtaining a packfile
and then running "index-pack --promisor --stdin", for example. In fact,
it might be possible to relax the restriction further (say, by also
allowing --promisor when indexing a packfile that is in the object DB),
but relaxing the restriction is backwards-compatible so we can revisit
that later.
One thing to watch out for is the possibility of a future Git feature
that indexes a pack in the context of a repo, but does not necessarily
write the resulting pack to it (and does not necessarily desire to
make any changes to the object DB). One such feature would be fetch
quarantine, which might need the repo context in order to detect
hash collisions, but would also need to ensure that the object DB
is undisturbed in case the fetch fails for whatever reason, even if
the reason occurs only after the indexing is complete. It may not be
obvious to the implementer of such a feature that "index-pack" could
sometimes write packs other than the indexed pack to the object DB,
but there are already other ways that "fetch" could write to the object
DB (in particular, packfile URIs and bundle URIs), so hopefully the
implementation of this future feature would already include a test that
the object DB be undisturbed.
This change requires the change to t5300 by 1f52cdfacb (index-pack:
document and test the --promisor option, 2022-03-09) to be undone.
(--promisor is already tested indirectly, so we don't need the explicit
test here any more.)
[1] https://lore.kernel.org/git/20241114005652.GC1140565@coredump.intra.peff.net/
[2] https://lore.kernel.org/git/20241119185345.GB15723@coredump.intra.peff.net/
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running scheduled maintenance via `git maintenance start`, we
acquire a lockfile to ensure that no other scheduled maintenance task is
running in the repository concurrently. If so, we do provide an error to
the user hinting that another process seems to be running in this repo.
There are two important cases why such a lockfile may exist:
- An actual git-maintenance(1) process is still running in this
repository.
- An earlier process may have crashed or was interrupted part way
through and has left a stale lockfile behind.
In c95547a394 (builtin/gc: fix crash when running `git maintenance
start`, 2024-10-10), we have fixed an issue where git-maintenance(1)
would crash with the "start" subcommand, and the underlying bug causes
the second scenario to trigger quite often now.
Most users don't know how to get out of that situation again though.
Ideally, we'd be removing the stale lock for our users automatically.
But in the context of repository maintenance this is rather risky, as it
can easily run for hours or even days. So finding a clear point where we
know that the old process has exited is basically impossible.
We have the same issue in other subsystems, e.g. when locking refs. Our
lockfile interfaces thus provide the `unable_to_lock_message()` function
for exactly this purpose: it provides a nice hint to the user that
explains what is going on and how to get out of that situation again by
manually removing the file.
Adapt git-maintenance(1) to print a similar hint. While we could use the
above function, we can provide a bit more context as we know exactly
what kind of process would create the lockfile.
Reported-by: Miguel Rincon Barahona <mrincon@gitlab.com>
Reported-by: Kev Kloss <kkloss@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By the way, we also change the sentences where git-diff would refer to
itself, so that no link is created in the HTML output.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format change is only applied to the sections of the file that are
filtered in git-diff.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for git-diff has been updated to follow the new
documentation guidelines. The following changes have been applied to
the series of patches:
- switching the synopsis to a synopsis block which will automatically
format placeholders in italics and keywords in monospace
- use _<placeholder>_ instead of <placeholder> in the description
- use `backticks for keywords and more complex option
descriptions`. The new rendering engine will apply synopsis rules to
these spans.
- prevent git-diff from self-referencing itself via gitlink macro when
the generated link would point to the same page.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/reftable-detach:
reftable/system: provide thin wrapper for lockfile subsystem
reftable/stack: drop only use of `get_locked_file_path()`
reftable/system: provide thin wrapper for tempfile subsystem
reftable/stack: stop using `fsync_component()` directly
reftable/system: stop depending on "hash.h"
reftable: explicitly handle hash format IDs
reftable/system: move "dir.h" to its only user
We use the lockfile subsystem to write lockfiles for "tables.list". As
with the tempfile subsystem, the lockfile subsystem also hooks into our
infrastructure to prune stale locks via atexit(3p) or signal handlers.
Furthermore, the lockfile subsystem also handles locking timeouts, which
do add quite a bit of logic. Having to reimplement that in the context
of Git wouldn't make a whole lot of sense, and it is quite likely that
downstream users of the reftable library may have a better idea for how
exactly to implement timeouts.
So again, provide a thin wrapper for the lockfile subsystem instead such
that the compatibility shim is fully self-contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've got a single callsite where we call `get_locked_file_path()`. As
we're about to convert our usage of the lockfile subsystem to instead be
used via a compatibility shim we'd have to implement more logic for this
single callsite. While that would be okay if Git was the only supposed
user of the reftable library, it's a bit more awkward when considering
that we have to reimplement this functionality for every user of the
library eventually.
Refactor the code such that we don't call `get_locked_file_path()`
anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the tempfile subsystem to write temporary tables, but given that
we're in the process of converting the reftable library to become
standalone we cannot use this subsystem directly anymore. While we could
in theory convert the code to use mkstemp(3p) instead, we'd lose access
to our infrastructure that automatically prunes tempfiles via atexit(3p)
or signal handlers.
Provide a thin wrapper for the tempfile subsystem instead. Like this,
the compatibility shim is fully self-contained in "reftable/system.c".
Downstream users of the reftable library would have to implement their
own tempfile shims by replacing "system.c" with a custom version.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're executing `fsync_component()` directly in the reftable library so
that we can fsync data to disk depending on "core.fsync". But as we're
in the process of converting the reftable library to become standalone
we cannot use that function in the library anymore.
Refactor the code such that users of the library can inject a custom
fsync function via the write options. This allows us to get rid of the
dependency on "write-or-die.h".
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We include "hash.h" in "reftable/system.h" such that we can use hash
format IDs as well as the raw size of SHA1 and SHA256. As we are in the
process of converting the reftable library to become standalone we of
course cannot rely on those constants anymore.
Introduce a new `enum reftable_hash` to replace internal uses of the
hash format IDs and new constants that replace internal uses of the hash
size. Adapt the reftable backend to set up the correct hash function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hash format IDs are used for two different things across the
reftable codebase:
- They are used as a 32 bit unsigned integer when reading and writing
the header in order to identify the hash function.
- They are used internally to identify which hash function is in use.
When one only considers the second usecase one might think that one can
easily change the representation of those hash IDs. But because those
IDs end up in the reftable header and footer on disk it is important
that those never change.
Create separate constants `REFTABLE_FORMAT_ID_*` and use them in
contexts where we read or write reftable headers. This serves multiple
purposes:
- It allows us to more easily discern cases where we actually use
those constants for the on-disk format.
- It detangles us from the same constants that are defined in
libgit.a, which is another required step to convert the reftable
library to become standalone.
- It makes the next step easier where we stop using `GIT_*_FORMAT_ID`
constants in favor of a custom enum.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We still include "dir.h" in "reftable/system.h" even though it is not
used by anything but by a single unit test. Move it over into that unit
test so that we don't accidentally use any functionality provided by it
in the reftable codebase.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If someone replaces a commit with a modified version, then builds on
that commit, and then later decides to rewrite history in a format like
git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import
and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the
replacement did, then at the end you'd get a replace ref that points to
itself. For example:
$ git show-ref | grep replace
fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264
Git commands which pay attention to replace refs will die with an error
when a self-referencing replace ref is present:
$ git log
fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264
Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing. Unless users specify
--quiet, warn them when we delete such a replace ref.
Two notes about this patch:
* We are not ignoring the problematic update of the replace ref
(turning it into a no-op), we are replacing the update with a delete.
The logic here is that if the repository had a value for the replace
ref before fast-import was run, and the replace ref was explicitly
named in the fast-import stream, we don't want the replace ref to be
left with a pre-fast-import value.
* While loops with more than one element (e.g. refs/replace/A points
to B, and refs/replace/B points to A) are possible, they seem much
less plausible. It is pretty easy to create a sequence of
git-filter-repo commands that will trigger a self-referencing replace
ref, but I do not know how to trigger a scenario with a cycle length
greater than 1.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We define macros with the bytes of the empty trees and blobs for sha1
and sha256. But since e1ccd7e2b1 (sha1_file: only expose empty object
constants through git_hash_algo, 2018-05-02), those are used only for
initializing the git_hash_algo entries. Any other code using the macros
directly would be suspicious, since a hash_algo pointer is the level of
indirection we use to make everything work with both sha1 and sha256.
So let's future proof against code doing the wrong thing by dropping the
macros entirely and just initializing the structs directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cached-object API maps oids to in-memory entries. Once inserted,
these entries should be immutable. Let's return them from the
find_cached_object() call with a const tag to make this clear.
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pretend_object_file() function adds to an array mapping oids to
object contents, which are later retrieved with find_cached_object().
We naturally need to store the oid for each entry, since it's the lookup
key.
But find_cached_object() also returns a hard-coded empty_tree object.
There we don't care about its oid field and instead compare against
the_hash_algo->empty_tree. The oid field is left as all-zeroes.
This all works, but it means that the cached_object struct we return
from find_cached_object() may or may not have a valid oid field, depend
whether it is the hard-coded tree or came from pretend_object_file().
Nobody looks at the field, so there's no bug. But let's future-proof it
by returning only the object contents themselves, not the oid. We'll
continue to call this "struct cached_object", and the array entry
mapping the key to those contents will be a "cached_object_entry".
This would also let us swap out the array for a better data structure
(like a hashmap) if we chose, but there's not much point. The only code
that adds an entry is git-blame, which adds at most a single entry per
process.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fake empty_tree struct is a static global, but the only code that
looks at it is find_cached_object(). The struct itself is a little odd,
with an invalid "oid" field that is handled specially by that function.
Since it's really just an implementation detail, let's move it to a
static within the function. That future-proofs against other code trying
to use it and seeing the weird oid value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We treat the empty tree specially, providing an in-memory "cached" copy,
which allows you to diff against it even if the object doesn't exist in
the repository. This is implemented as part of the larger cached_object
subsystem, but we use a stand-alone empty_tree struct.
We initialize the oid of that struct using EMPTY_TREE_SHA1_BIN_LITERAL.
At first glance, that seems like a bug; how could this ever work for
sha256 repositories?
The answer is that we never look at the oid field! The oid field is used
to look up entries added by pretend_object_file() to the cached_objects
array. But for our stand-alone entry, we look for it independently using
the_hash_algo->empty_tree, which will point to the correct algo struct
for the repository.
This happened in 62ba93eaa9 (sha1_file: convert cached object code to
struct object_id, 2018-05-02), which even mentions that this field is
never used. Let's reduce confusion for anybody reading this code by
replacing the sha1 initializer with a comment. The resulting field will
be all-zeroes, so any violation of our assumption that the oid field is
not used will break equally for sha1 and sha256.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We hard-code a few well-known hash values for empty trees and blobs in
both sha1 and sha256 formats. We do so with string literals like this:
#define EMPTY_TREE_SHA256_BIN_LITERAL \
"\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
"\x04\xd4\x5d\x8d\x85\xef\xa9\xb0\x57\xb5" \
"\x3b\x14\xb4\xb9\xb9\x39\xdd\x74\xde\xcc" \
"\x53\x21"
and then use it to initialize the hash field of an object_id struct.
That hash field is exactly 32 bytes long (the size we need for sha256).
But the string literal above is actually 33 bytes long due to the NUL
terminator. This is legal in C, and the NUL is ignored.
Side note on legality: in general excess initializer elements are
forbidden, and gcc will warn on both of these:
char foo[3] = { 'h', 'u', 'g', 'e' };
char bar[3] = "VeryLongString";
I couldn't find specific language in the standard allowing
initialization from a string literal where _just_ the NUL is ignored,
but C99 section 6.7.8 (Initialization), paragraph 32 shows this exact
case as "example 8".
However, the upcoming gcc 15 will start warning for this case (when
compiled with -Wextra via DEVELOPER=1):
CC object-file.o
object-file.c:52:9: warning: initializer-string for array of ‘unsigned char’ is too long [-Wunterminated-string-initialization]
52 | "\x6e\xf1\x9b\x41\x22\x5c\x53\x69\xf1\xc1" \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
object-file.c:79:17: note: in expansion of macro ‘EMPTY_TREE_SHA256_BIN_LITERAL’
which is understandable. Even though this is not a bug for us, since we
do not care about the NUL terminator (and are just using the literal as
a convenient format), it would be easy to accidentally create an array
that was mistakenly unterminated.
We can avoid this warning by switching the initializer to an actual
array of unsigned values. That arguably demonstrates our intent more
clearly anyway.
Reported-by: Sam James <sam@gentoo.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The targets that generate clar headers depend on their source files, but
not on the script that is actually generating the output. Fix the issue
by adding the missing dependencies.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the VERBATIM option to `add_custom_command()`. Like this, all
arguments to the commands will be escaped properly for the build tool so
that the invoked command receives each argument unchanged.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 30bf9f0aaa (cmake: set up proper dependencies for generated clar
headers, 2024-10-21), we have deduplicated the logic to generate our
clar headers by reusing the same scripts that our Makefile does. Despite
the deduplication, this refactoring also made us rebuild the headers in
case the source files change, which didn't happen previously.
The commit also introduced an issue though: we execute the scripts
directly, so when the host does not have "/bin/sh" available they will
fail. This is for example the case on Windows when importing the CMake
project into Microsoft Visual Studio.
Address the issue by invoking the scripts with `SH_EXE`, which contains
the discovered path of the shell interpreter.
While at it, wrap the overly long lines in the CMake build instructions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert "clar-generate.awk" into a shell script that invokes awk(1).
This allows us to avoid the shell redirect in the build system, which
may otherwise be a problem with build systems on platforms that use a
different shell.
While at it, wrap the overly long lines in the CMake build instructions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It might be naïve to think that those who need this education would end
up here in the first place. But I think it’s good to mention this
high-level concept here on a command which provides a backup strategy.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mention `--all` as an alternative in “Specifying References”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don’t need this part now that we have a fleshed-out `--all` example.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide an example about how to make a “full backup” with caveats about
what that means in this case.
This is a requested use-case.[1] But the doc is a bit unassuming
about it:
If you want to match `git clone --mirror`, which would include your
refs such as `refs/remotes/*`, use `--all`.
The user cannot be expected to formulate “I want a full backup” as “I
want to match `git clone --mirror`” for a bundle file or something.
Let’s drop this mention of `--all` later in the doc and frontload it.
† 1: E.g.:
• https://stackoverflow.com/questions/5578270/fully-backup-a-git-repo
• https://stackoverflow.com/questions/11792671/how-to-git-bundle-a-complete-repo
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In C23, "unreachable" is a macro that invokes undefined behavior if it
is invoked. To make sure that our code compiles on a variety of C
versions, rename unreachable to "is_unreachable".
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"thread_local" is a keyword in C23. To make sure that our code compiles
on a wide variety of C versions, rename struct thread_local to "struct
thread_local_data" to avoid a conflict.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There somehow ended up too many bogus "merge X later to maint"
comments for topics that cannot be merged ever down to 'maint'
because they were forked from more recent integration branches
in the draft release notes. Remove them, as they are inviting
for mistakes later.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Historically, Git has allowed users to clone from an untrusted
repository, and we have documented that this is safe to do so:
`upload-pack` tries to avoid any dangerous configuration options or
hooks from the repository it's serving, making it safe to clone an
untrusted directory and run commands on the resulting clone.
However, this was broken by f4aa8c8bb1 ("fetch/clone: detect dubious
ownership of local repositories", 2024-04-10) in an attempt to make
things more secure. That change resulted in a variety of problems when
cloning locally and over SSH, but it did not change the stated security
boundary. Because the security boundary has not changed, it is safe to
adjust part of the code that patch introduced.
To do that and restore the previous functionality, adjust enter_repo to
take two flags instead of one.
The two bits are
- ENTER_REPO_STRICT: callers that require exact paths (as opposed
to allowing known suffixes like ".git", ".git/.git" to be
omitted) can set this bit. Corresponds to the "strict" parameter
that the flags word replaces.
- ENTER_REPO_ANY_OWNER_OK: callers that are willing to run without
ownership check can set this bit.
The former is --strict-paths option of "git daemon". The latter is
set only by upload-pack, which honors the claimed security boundary.
Note that local clones across ownership boundaries require --no-local so
that upload-pack is used. Document this fact in the manual page and
provide an example.
This patch was based on one written by Junio C Hamano.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reusing objects from source pack(s), write_reused_pack_verbatim()
is responsible for reusing objects whole eword_t's at a time. It works
by taking the longest continuous run of objects from the beginning of
each source pack that the caller wants, and reuses the entirety of that
section from each pack.
This is based on the assumption that we don't have any gaps within the
region. This assumption relieves us from having to patch any
OFS_DELTAs, since we know that there aren't any gaps between any delta
and its base in that region.
To illustrate why this assumption is necessary, suppose we have some
pack P, which has objects X, Y, and Z. If the MIDX's copy of Y was
selected from a pack other than P, then the bit corresponding to object
Y will appear earlier in the bitmap than the bits corresponding to X and
Z.
If pack-objects already has or will use the copy of Y from the pack it
was selected from in the MIDX, then it is an error to reuse all objects
between X and Z in the source pack. Doing so will cause us to reuse Y
from a different pack than the one which represents Y in the MIDX,
causing us to either:
- include the object twice, assuming that the caller wants Y in the
pack, or
- include the object once, resulting in us packing more objects than
necessary.
This regression comes from ca0fd69e37 (pack-objects: prepare
`write_reused_pack_verbatim()` for multi-pack reuse, 2023-12-14), which
incorrectly assumed that there would be no gaps in reusable regions of
non-preferred packs.
Instead, we can only safely perform the whole-word reuse optimization on
the preferred pack, where we know with certainty that no gaps exist in
that region of the bitmap. We can still reuse objects from non-preferred
packs, but we have to inspect them individually in write_reused_pack()
to ensure that any gaps that may exist are accounted for.
This allows us to simplify the implementation of
write_reused_pack_verbatim() back to almost its pre-multi-pack reuse
form, since we can now assume that the beginning of the pack appears at
the beginning of the bitmap, meaning that we don't have to account for
any bits up to the first word boundary (like we had to special case in
ca0fd69e37).
The only significant changes from the pre-ca0fd69e37 implementation are:
- that we can no longer inspect words up to the end of
reuse_packfile_bitmap->word_alloc, since we only want to look at
words whose bits all correspond to objects in the given packfile, and
- that we return early when given a reuse_packfile which is not
preferred, making the call a noop.
In the future, it might be possible to restore this optimization if we
could guarantee that some reuse packs don't contain any gaps by
construction (similar to the "disjoint packs" idea in very early
versions of multi-pack reuse).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the multi-pack reuse code, there are two paths for reusing the
on-disk representation of an object, handled by:
- builtin/pack-objects.c::write_reused_pack_one()
- builtin/pack-objects.c::write_reused_pack_verbatim()
The former is responsible for copying the bytes for a single object out
of an existing source pack. The latter does the same but for a region of
objects aligned at eword_t boundaries.
Demonstrate a bug whereby write_reused_pack_verbatim() can be tricked
into writing out objects from some source pack, even when those objects
were selected from a different source pack in the MIDX bitmap.
When the caller wants at least one of the objects in that region,
pack-objects will write the same object twice as a result of this bug.
In the other case where the caller doesn't want any of the objects in
the region of interest, we will write out objects that weren't
requested.
Demonstrate this bug by creating two packs, where the preferred one of
those packs contains a single object which also appears in the main
(non-preferred) pack. A separate bug[^1] prevents us from triggering the
main bug when the duplicated object is the last one in the main pack,
but any earlier object will suffice.
We could fix that separate bug, but the following commit will simplify
write_reused_pack_verbatim() and only call it on the preferred pack, so
doing so would have little point.
[^1]: Because write_reused_pack_verbatim() only reuses bits in the range
off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0);
off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p,
pos - reuse_packfile->bitmap_pos);
written += pos - reuse_packfile->bitmap_pos;
/* We're recording one chunk, not one object. */
record_reused_object(pack_start_off,
pack_start_off - (hashfile_total(out) - pack_start));
, or in other words excluding the object beginning at position 'pos -
reuse_packfile->bitmap_pos' in the source pack. But since
reuse_packfile->bitmap_pos is '1' in the non-preferred pack
(accounting for the single-object pack which is preferred), we don't
actually copy the bytes from the last object.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reference-transaction hook is invoked whenever there is a reference
update being performed. For each state of the transaction, we iterate
over the updates present and pass this information to the hook.
The `ref_update` structure is used to hold these updates within a
`transaction`. We use the same structure for holding reflog updates too.
Which means that the reference transaction hook is also obtaining
information about a reflog update. This is a bug, since:
- The hook is designed to work with reference updates and reflogs
updates are different.
- The hook doesn't have the required information to distinguish
reference updates from reflog updates.
This is particularly evident when the default branch (pointed by HEAD)
is updated, we see that the hook also receives information about HEAD
being changed. In reality, we only add a reflog update for HEAD, while
HEAD's values remains the same.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally, the conditional definition of the setup/teardown functions
for malloc checking could be run at any time, because they depended only
on command-line options and the system getconf function.
But since 02d900361c (test-lib: check malloc debug LD_PRELOAD before
using, 2024-11-11), we probe the system by running "git version". Since
this code runs before we've set $PATH to point to the version of Git we
intend to test, we actually run the system version of git.
This mostly works, since what we really care about is whether the
LD_PRELOAD works, and it should work the same with any program. But
there are some corner cases:
1. You might not have a system git at all, in which case the preload
will appear to fail, even though it could work with the actual
built version of git.
2. Your system git could be linked in a different way. For example, if
it was built statically, then it will ignore LD_PRELOAD entirely,
and we might assume that the preload works, even though it might
not when used with a dynamic build.
We could give a more complete path to the version of Git we intend to
test, but features like GIT_TEST_INSTALLED make that not entirely
trivial. So instead, let's just bump the setup until after we've set up
the $PATH. There's no need for us to do it early, as long as it is done
before the first test runs.
Reported-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MinGW compatibility layer has been taught to support POSIX
semantics for atomic renames when other process(es) have a file
opened at the destination path.
* ps/mingw-rename:
compat/mingw: support POSIX semantics for atomic renames
compat/mingw: allow deletion of most opened files
compat/mingw: share file handles created via `CreateFileW()`
A regression where commit objects missing from a commit-graph can
cause an infinite loop when doing a fetch in a partial clone has
been fixed.
* jt/commit-graph-missing:
fetch-pack: die if in commit graph but not obj db
Revert "fetch-pack: add a deref_without_lazy_fetch_extended()"
The "--shallow-exclude=<ref>" option to various history transfer
commands takes a ref, not an arbitrary revision.
* en/shallow-exclude-takes-a-ref-fix:
doc: correct misleading descriptions for --shallow-exclude
upload-pack: fix ambiguous error message
When running a dir-diff command that produces no diff, variables
`wt_modified` and `tmp_modified` are used while uninitialized, causing:
$ /home/smarchi/src/git/git-difftool --dir-diff master
free(): invalid pointer
[1] 334004 IOT instruction (core dumped) /home/smarchi/src/git/git-difftool --dir-diff master
$ valgrind --track-origins=yes /home/smarchi/src/git/git-difftool --dir-diff master
...
Invalid free() / delete / delete[] / realloc()
at 0x48478EF: free (vg_replace_malloc.c:989)
by 0x422CAC: hashmap_clear_ (hashmap.c:208)
by 0x283830: run_dir_diff (difftool.c:667)
by 0x284103: cmd_difftool (difftool.c:801)
by 0x238E0F: run_builtin (git.c:484)
by 0x2392B9: handle_builtin (git.c:750)
by 0x2399BC: cmd_main (git.c:921)
by 0x356FEF: main (common-main.c:64)
Address 0x1ffefff180 is on thread 1's stack
in frame #2, created by run_dir_diff (difftool.c:358)
...
If taking any `goto finish` path before these variables are initialized,
`hashmap_clear_and_free()` operates on uninitialized data, sometimes
causing a crash.
This regression was introduced in 7f795a1715 (builtin/difftool: plug
several trivial memory leaks, 2024-09-26).
Fix it by initializing those variables with the `HASHMAP_INIT` macro.
Add a test comparing the main branch to itself, resulting in no diff.
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The refspec struct keeps two matched arrays: one for the refspec_item
structs and one for the original raw refspec strings. The main reason
for this is that there are other users of refspec_item that do not care
about the raw strings. But it does make managing the refspec struct
awkward, as we must keep the two arrays in sync. This has led to bugs in
the past (both leaks and double-frees).
Let's just store a copy of the raw refspec string directly in each
refspec_item struct. This simplifies the handling at a small cost:
1. Direct callers of refspec_item_init() will now get an extra copy of
the refspec string, even if they don't need it. This should be
negligible, as the struct is already allocating two strings for the
parsed src/dst values (and we tend to only do it sparingly anyway
for things like the TAG_REFSPEC literal).
2. Users of refspec_appendf() will now generate a temporary string,
copy it, and then free the result (versus handing off ownership of
the temporary string). We could get around this by having a "nodup"
variant of refspec_item_init(), but it doesn't seem worth the extra
complexity for something that is not remotely a hot code path.
Code which accesses refspec->raw now needs to look at refspec->item.raw.
Other callers which just use refspec_item directly can remain the same.
We'll free the allocated string in refspec_item_clear(), which they
should be calling anyway to free src/dst.
One subtle note: refspec_item_init() can return an error, in which case
we'll still have set its "raw" field. But that is also true of the "src"
and "dst" fields, so any caller which does not _clear() the failed item
is already potentially leaking. In practice most code just calls die()
on an error anyway, but you can see the exception in valid_fetch_refspec(),
which does correctly call _clear() even on error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A refspec struct contains zero or more refspec_item structs, along with
matching "raw" strings. The items and raw strings are kept in separate
arrays, but those arrays will always have the same length (because we
write them only via refspec_append_nodup(), which grows both). This can
lead to bugs when manipulating the array, since the arrays and lengths
must be modified in lockstep. For example, the bug fixed in the previous
commit, which forgot to decrement raw_nr.
So let's get rid of "raw_nr" and have only "nr", making this kind of bug
impossible (and also making it clear that the two are always matched,
something that existing code already assumed but was not guaranteed by
the interface).
Even though we'd expect "alloc" and "raw_alloc" to likewise move in
lockstep, we still need to keep separate counts there if we want to
continue to use ALLOC_GROW() for both.
Conceptually this would all be simpler if refspec_item just held onto
its own raw string, and we had a single array. But there are callers
which use refspec_item outside of "struct refspec" (and so don't hold on
to a matching "raw" string at all), which we'd possibly need to adjust.
So let's not worry about refactoring that for now, and just get rid of
the redundant count variable. That is the first step on the road to
combining them anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In filter_prefetch_refspecs(), we may remove one or more refspecs if
they point into refs/tags/. When we do, we remove the item from the
refspec->items array, shifting subsequent items down, and then decrement
the refspec->nr count.
We also remove the item from the refspec->raw array, but fail to
decrement refspec->raw_nr. This leaves us with a count that is too high,
and anybody looking at the "raw" array will erroneously see either:
1. The removed entry, if there were no subsequent items to shift down.
2. A duplicate of the final entry, as everything is shifted down but
there was nothing to overwrite the final item.
The obvious culprit to run into this is calling refspec_clear(), which
will try to free the removed entry (case 1) or double-free the final
entry (case 2). But even though the bug has existed since the function
was added in 2e03115d0c (fetch: add --prefetch option, 2021-04-16), we
did not trigger it in the test suite. The --prefetch option is normally
only used with configured refspecs, and we never bother to call
refspec_clear() on those (they are stored as part of a struct remote,
which is held in a global variable).
But you could trigger case 2 manually like:
git fetch --prefetch . refs/tags/foo refs/tags/bar
Ironically you couldn't trigger case 1, because the code accidentally
leaked the string in the raw array, and the two bugs (the leak and the
double-free) cancelled out. But when we fixed the leak in ea4780307c
(fetch: free "raw" string when shrinking refspec, 2024-09-24), it became
possible to trigger that, too, with a single item:
git fetch --prefetch . refs/tags/foo
We can fix both cases by just correctly decrementing "raw_nr" when we
shrink the array. Even though we don't expect people to use --prefetch
with command-line refspecs, we'll add a test to make sure it behaves
well (like the test just before it, we're just confirming that the
filtered prefetch succeeds at all).
Reported-by: Eric Mills <ermills@epic.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach index-pack to, when processing the objects in a pack with
--promisor specified on the CLI, repack local objects (and the local
objects that they refer to, recursively) referenced by these objects
into promisor packs.
This prevents the situation in which, when fetching from a promisor
remote, we end up with promisor objects (newly fetched) referring
to non-promisor objects (locally created prior to the fetch). This
situation may arise if the client had previously pushed objects to the
remote, for example. One issue that arises in this situation is that,
if the non-promisor objects become inaccessible except through promisor
objects (for example, if the branch pointing to them has moved to
point to the promisor object that refers to them), then GC will garbage
collect them. There are other ways to solve this, but the simplest
seems to be to enforce the invariant that we don't have promisor objects
referring to non-promisor objects.
This repacking is done from index-pack to minimize the performance
impact. During a fetch, the only time most objects are fully inflated
in memory is when their object ID is computed, so we also scan the
objects (to see which objects they refer to) during this time.
Also to minimize the performance impact, an object is calculated to be
local if it's a loose object or present in a non-promisor pack. (If it's
also in a promisor pack or referred to by an object in a promisor pack,
it is technically already a promisor object. But a misidentification
of a promisor object as a non-promisor object is relatively benign
here - we will thus repack that promisor object into a promisor pack,
duplicating it in the object store, but there is no correctness issue,
just an issue of inefficiency.)
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes test failures across the suite on glibc platforms that don't
have libc_malloc_debug.so.0.
We added support for glibc's malloc checking routines long ago in
a731fa916e (Add MALLOC_CHECK_ and MALLOC_PERTURB_ libc env to the test
suite for detecting heap corruption, 2012-09-14). Back then we didn't
need to do any checks to see if the platform supported it. We were just
setting some environment variables which would either enable it or not.
That changed in 131b94a10a (test-lib.sh: Use GLIBC_TUNABLES instead of
MALLOC_CHECK_ on glibc >= 2.34, 2022-03-04). Now that glibc split this
out into libc_malloc_debug.so, we have to add it to LD_PRELOAD. We only
do that when we detect glibc, but it's possible to have glibc but not
the malloc debug library. In that case LD_PRELOAD will complain to
stderr, and tests which check for an empty stderr will fail.
You can work around this by setting TEST_NO_MALLOC_CHECK, which disables
the feature entirely. But it's not obvious to know you need to do that.
Instead, since this malloc checking is best-effort anyway, let's just
automatically disable it when the LD_PRELOAD appears not to work. We can
check it by running something simple that should work (and produce
nothing on stderr) like "git version".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 88a09a557c (builtin/show-index: provide options to determine hash
algo), the flag --object-format was added to show-index builtin as a way
to provide a hash algorithm explicitly. However, we do not have tests in
place for that functionality. Add them.
Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c8aed5e8da (repository: stop setting SHA1 as the default object
hash), we got rid of the default hash algorithm for the_repository.
Due to this change, it is now the responsibility of the callers to set
their own default when this is not present.
As stated in the docs, show-index should use SHA1 as the default hash
algorithm when run outside a repository. Make sure this promise is
met by falling back to SHA1 when the_hash_algo is not present (i.e.
when the command is run outside a repository). Also add a test that
verifies this behavior.
Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When called with '--left-right' and '--use-bitmap-index', 'rev-list'
will produce output without any left/right markers, which has been
corrected.
* jk/left-right-bitmap:
rev-list: skip bitmap traversal for --left-right
Buildfix and upgrade of Clar to a newer version.
* ps/upgrade-clar:
cmake: set up proper dependencies for generated clar headers
cmake: fix compilation of clar-based unit tests
Makefile: extract script to generate clar declarations
Makefile: adjust sed command for generating "clar-decls.h"
t/unit-tests: update clar to 206accb
Centralize documentation for repository extensions into a single place.
* cw/config-extensions:
doc: consolidate extensions in git-config documentation
Updates the '.clang-format' to match project conventions.
* kn/ci-clang-format-tidy:
clang-format: align consecutive macro definitions
clang-format: re-adjust line break penalties
Update the project's CodingGuidelines to discourage naming functions
with a "_1()" suffix.
* kn/arbitrary-suffixes:
CodingGuidelines: discourage arbitrary suffixes in function names
When trying to describe a commit, we'll traverse from the commit,
collecting candidate tags that point to its ancestors. But once we've
seen all of the tags in the repo, there's no point in traversing
further. There's nothing left to find!
For a default "git describe", this isn't usually a big problem. In a
large repo you'll probably have multiple tags, so we'll eventually find
10 candidates (the default for max_candidates) and stop there. And in a
small repo, it's quick to traverse to the root.
But you can imagine a large repo with few tags. Or, as we saw in a real
world case, explicitly limiting the set of matches like this (on
linux.git):
git describe --match=v6.12-rc4 HEAD
which goes all the way to the root before realizing that no, there are
no other tags under consideration besides the one we fed via --match.
If we add in "--candidates=1" there, it's much faster (at least as of
the previous commit).
But we should be able to speed this up without the user asking for it.
After expanding all matching tags, we know the total number of names. We
could just stop the traversal there, but as hinted at above we already
have a mechanism for doing that: the max_candidate limit. So we can just
reduce that limit to match the number of possible candidates.
Our p6100 test shows this off:
Test HEAD^ HEAD
---------------------------------------------------------------------------------------
6100.2: describe HEAD 0.71(0.65+0.06) 0.72(0.68+0.04) +1.4%
6100.3: describe HEAD with one max candidate 0.01(0.00+0.00) 0.01(0.00+0.00) +0.0%
6100.4: describe HEAD with one tag 0.72(0.66+0.05) 0.01(0.00+0.00) -98.6%
Now we are fast automatically, just as if --candidates=1 were supplied
by the user.
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Helped-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, describe considers only 10 candidate matches, and stops
traversing when we have enough. This makes things much faster in a large
repository, where collecting all candidates requires walking all the way
down to the root (or at least to the oldest tag). This goes all the way
back to 8713ab3079 (Improve git-describe performance by reducing
revision listing., 2007-01-13).
However, we don't stop immediately when we have enough candidates. We
keep traversing and only bail when we find one more candidate that we're
ignoring. Usually this is not too expensive, if the tags are sprinkled
evenly throughout history. But if you are unlucky, you might hit the max
candidate quickly, and then have a huge swath of history before finding
the next one.
Our p6100 test has exactly this unlucky case: with a max of "1", we find
a recent tag quickly and then have to go all the way to the root to find
the old tag that will be discarded.
A more interesting real-world case is:
git describe --candidates=1 --match=v6.12-rc4 HEAD
in the linux.git repo. There we restrict the set of tags to a single
one, so there is no older candidate to find at all! But despite
--candidates=1, we keep traversing to the root only to find nothing.
So why do we keep traversing after hitting thet max? There are two
reasons I can see:
1. In theory the extra information that there was another candidate
could be useful, and we record it in the gave_up_on variable. But
we only show this information with --debug.
2. After finding the candidate, there's more processing we do in our
loop. The most important of this is propagating the "within" flags
to our parent commits, and putting them in the commit_list we'll
use for finish_depth_computation().
That function continues the traversal until we've counted all
commits reachable from the starting point but not reachable from
our best candidate tag (so essentially counting "$tag..$start", but
avoiding re-walking over the bits we've seen). If we break
immediately without putting those commits into the list, our depth
computation will be wrong (in the worst case we'll count all the
way down to the root, not realizing those commits are included in
our tag).
But we don't need to find a new candidate for (2). As soon as we finish
the loop iteration where we hit max_candidates, we can then quit on the
next iteration. This should produce the same output as the original code
(which could, after all, find a candidate on the very next commit
anyway) but ends the traversal with less pointless digging.
We still have to set "gave_up_on"; we've popped it off the list and it
has to go back. An alternative would be to re-order the loop so that it
never gets popped, but it's perhaps still useful to show in the --debug
output, so we need to know it anyway. We do have to adjust the --debug
output since it's now just a commit where we stopped traversing, and not
the max+1th candidate.
p6100 shows the speedup using linux.git:
Test HEAD^ HEAD
---------------------------------------------------------------------------------------
6100.2: describe HEAD 0.70(0.63+0.06) 0.71(0.66+0.04) +1.4%
6100.3: describe HEAD with one max candidate 0.70(0.64+0.05) 0.01(0.00+0.00) -98.6%
6100.4: describe HEAD with one tag 0.70(0.67+0.03) 0.70(0.63+0.06) +0.0%
Reported-by: Josh Poimboeuf <jpoimboe@kernel.org>
Helped-by: Rasmus Villemoes <ravi@prevas.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't have a perf script for git-describe, despite it often being
accused of slowness. Let's add a few simple tests to start with.
Rather than use the existing tags from our test repo, we'll make our own
so that we have a known quantity and position. We'll add a "new" tag
near the tip of HEAD, and an "old" one that is at the very bottom. And
then our tests are:
1. Describing HEAD naively requires walking all the way down to the
old tag as we collect candidates. This gives us a baseline for what
"slow" looks like.
2. Doing the same with --candidates=1 can potentially be fast, because
we can quie after finding "new". But we don't, and it's also slow.
3. Likewise we should be able to quit when there are no more tags to
find. This can happen naturally if a repo has few tags, but also if
you restrict the set of tags with --match.
Here are the results running against linux.git. Note that I have a
commit-graph built for the repo, so "slow" here is ~700ms. Without a
commit graph it's more like 9s!
Test HEAD
--------------------------------------------------------------
6100.2: describe HEAD 0.70(0.66+0.04)
6100.3: describe HEAD with one max candidate 0.70(0.66+0.04)
6100.4: describe HEAD with one tag 0.70(0.64+0.06)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 30b1c7ad9d (describe: don't abort too early when searching tags,
2020-02-26) tried to fix a problem that happens when there are disjoint
histories: to accurately compare the counts for different tags, we need
to keep walking the history longer in order to find a common base.
But its fix misses a case: we may still bail early if we hit the
max_candidates limit, producing suboptimal output. You can see this in
action by adding "--candidates=2" to the tests; we'll stop traversing as
soon as we see the second tag and will produce the wrong answer. I hit
this in practice while trying to teach git-describe not to keep looking
for candidates after we've seen all tags in the repo (effectively adding
--candidates=2, since these toy repos have only two tags each).
This is probably fixable by continuing to walk after hitting the
max-candidates limit, all the way down to a common ancestor of all
candidates. But it's not clear in practice what the preformance
implications would be (it would depend on how long the branches that
hold the candidates are).
So I'm punting on that for now, but I'd like to adjust the tests to be
more resilient, and to document the findings. So this patch:
1. Adds an extra tag at the bottom of history. This shouldn't change
the output, but does mean we are more resilient to low values of
--candidates (e.g., if we start reducing it to the total number of
tags). This is arguably closer to the real world anyway, where
you're not going to have just 2 tags, but an arbitrarily long
history going back in time, possibly with multiple irrelevant tags
in it (I called the new tag "H" here for "history").
2. Run the same tests with --candidates=2, which shows that even with
the current code they can fail if we end the traversal early. That
leaves a trail for anybody interested in trying to improve the
behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, Windows restricts access to files when those files have been
opened by another process. As explained in the preceding commits, these
restrictions can be loosened such that reads, writes and/or deletes of
files with open handles _are_ allowed.
While we set up those sharing flags in most relevant code paths now, we
still don't properly handle POSIX-style atomic renames in case the
target path is open. This is failure demonstrated by t0610, where one of
our tests spawns concurrent writes in a reftable-enabled repository and
expects all of them to succeed. This test fails most of the time because
the process that has acquired the "tables.list" lock is unable to rename
it into place while other processes are busy reading that file.
Windows 10 has introduced the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag
that allows us to fix this usecase [1]. When set, it is possible to
rename a file over a preexisting file even when the target file still
has handles open. Those handles must have been opened with the
`FILE_SHARE_DELETE` flag, which we have ensured in the preceding
commits.
Careful readers might have noticed that [1] does not mention the above
flag, but instead mentions `FILE_RENAME_POSIX_SEMANTICS`. This flag is
not for use with `SetFileInformationByHandle()` though, which is what we
use. And while the `FILE_RENAME_FLAG_POSIX_SEMANTICS` flag exists, it is
not documented on [2] or anywhere else as far as I can tell.
Unfortunately, we still support Windows systems older than Windows 10
that do not yet have this new flag. Our `_WIN32_WINNT` SDK version still
targets 0x0600, which is Windows Vista and later. And even though that
Windows version is out-of-support, bumping the SDK version all the way
to 0x0A00, which is Windows 10 and later, is not an option as it would
make it impossible to compile on Windows 8.1, which is still supported.
Instead, we have to manually declare the relevant infrastructure to make
this feature available and have fallback logic in place in case we run
on a Windows version that does not yet have this flag.
On another note: `mingw_rename()` has a retry loop that is used in case
deleting a file failed because it's still open in another process. One
might be pressed to not use this loop anymore when we can use POSIX
semantics. But unfortunately, we have to keep it around due to our
dependence on the `FILE_SHARE_DELETE` flag. While we know to set that
sharing flag now, other applications may not do so and may thus still
cause sharing violations when we try to rename a file.
This fixes concurrent writes in the reftable backend as demonstrated in
t0610, but may also end up fixing other usecases where Git wants to
perform renames.
[1]: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_rename_information
[2]: https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-file_rename_info
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.
However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.
Therefore, make it a fatal error when this occurs. (Note that we cannot
proceed to include this object in the list of objects to be fetched
without changing at least the fetch negotiation code: what would happen
is that the client will send "want X" and "have X" and when I tested
at $DAYJOB with a work server that uses JGit, the server reasonably
returned an empty packfile. And changing the fetch negotiation code to
only use the object DB when deciding what to report as "have" would be
an unnecessary slowdown, I think.)
This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, there is no infinite loop. Note that
although the repo corruption we discovered was caused by a bug in GC in
a partial clone, the behavior that this patch teaches Git to warn about
applies to any repo with commit graph enabled and with a missing commit,
whether it is a partial clone or not.
t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in
lookup_commit_in_graph(), 2022-07-01), tests that an interaction between
fetch and the commit graph does not cause an infinite loop. This patch
changes the exit code in that situation, so that test had to be changed.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit a6e65fb39caf18259c660c1c7910d5bf80bc15cb.
This revert simplifies the next patch in this patch set.
The commit message of that commit mentions that the new function "will
be used for the bundle-uri client in a subsequent commit", but it seems
that eventually it wasn't used.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation for the --shallow-exclude option to clone/fetch/etc.
claims that the option takes a revision, but it does not. As per
upload-pack.c's process_deepen_not(), it passes the option to
expand_ref() and dies if it does not find exactly one ref matching the
name passed. Further, this has always been the case ever since these
options were introduced by the commits merged in a460ea4a3cb1 (Merge
branch 'nd/shallow-deepen', 2016-10-10). Fix the documentation to
match the implementation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This one is a little bit more curious. In t6112, we have a test that
exercises the `git rev-list --filter` option with invalid filters. We
execute git-rev-list(1) via `test_must_fail`, which means that we check
for leaks even though Git exits with an error code. This causes the
following leak:
Direct leak of 27 byte(s) in 1 object(s) allocated from:
#0 0x5555555e6946 in realloc.part.0 lsan_interceptors.cpp.o
#1 0x5555558fb4b6 in xrealloc wrapper.c:137:8
#2 0x5555558b6e06 in strbuf_grow strbuf.c:112:2
#3 0x5555558b7550 in strbuf_add strbuf.c:311:2
#4 0x5555557c1a88 in strbuf_addstr strbuf.h:310:2
#5 0x5555557c1d4c in parse_list_objects_filter list-objects-filter-options.c:261:3
#6 0x555555885ead in handle_revision_pseudo_opt revision.c:2899:3
#7 0x555555884e20 in setup_revisions revision.c:3014:11
#8 0x5555556c4b42 in cmd_rev_list builtin/rev-list.c:588:9
#9 0x5555555ec5e3 in run_builtin git.c:483:11
#10 0x5555555eb1e4 in handle_builtin git.c:749:13
#11 0x5555555ec001 in run_argv git.c:819:4
#12 0x5555555eaf94 in cmd_main git.c:954:19
#13 0x5555556fd569 in main common-main.c:64:11
#14 0x7ffff7ca714d in __libc_start_call_main (.../lib/libc.so.6+0x2a14d)
#15 0x7ffff7ca7208 in __libc_start_main@GLIBC_2.2.5 (.../libc.so.6+0x2a208)
#16 0x5555555ad064 in _start (git+0x59064)
This leak is valid, as we call `die()` and do not clean up the memory at
all. But what's curious is that this is the only leak reported, because
we don't clean up any other allocated memory, either, and I have no idea
why the leak sanitizer treats this buffer specially.
In any case, we can work around the leak by shuffling things around a
bit. Instead of calling `gently_parse_list_objects_filter()` and dying
after we have modified the filter spec, we simply do so beforehand. Like
this we don't allocate the buffer in the error case, which makes the
reported leak go away.
It's not pretty, but it manages to make t6112 leak free.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `obuf` member of `struct merge_options` is used to buffer output in
some cases. In order to not discard its allocated memory we only release
its contents in `merge_finalize()` when we're not currently recursing
into a subtree.
This results in some situations where we seemingly do not release the
buffer reliably. We thus have calls to `strbuf_release()` for this
buffer scattered across the codebase. But we're missing one callsite in
git-merge(1), which causes a memory leak.
We should ideally refactor this interface so that callers don't have to
know about any such internals. But for now, paper over the issue by
adding one more `strbuf_release()` call.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use `repo_config_get_string()` to read "status.showUntrackedFiles"
from the config subsystem. This function allocates the result, but we
never free the result after parsing it.
The value never leaves the scope of the calling function, so refactor it
to instead use `repo_config_get_string_tmp()`, which does not hand over
ownership to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We never release the local `struct strbuf base` buffer, thus leaking
memory. Fix this leak.
This leak is exposed by t7063, but plugging it alone does not make the
whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While "common-main.c" already initializes `the_repository` for us, we do
so a second time in the "read-cache" test helper. This causes a memory
leak because the old repository's contents isn't released.
Stop calling `initialize_repository()` to plug this leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we free the `fsmonitor_dirty` member of `struct index_state`, we
do not free the contents of that EWAH. Do so by using `ewah_free()`
instead of `FREE_AND_NULL()`.
This leak is exposed by t7519, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several cases where we invalidate untracked cache directory
entries where we do not free the underlying data, but reset the number
of entries. This causes us to leak memory because `free_untracked()`
will not iterate over any potential entries which we still had in the
array.
Fix this issue by freeing old entries. The leak is exposed by t7519, but
plugging it alone does not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `cnt` variable tracks the number of lines in a patch diff. It can
happen though that there are no newlines, in which case we'd still end
up allocating our array of `sline`s. In fact, we always allocate it with
`cnt + 2` entries: one extra entry for the deletion hunk at the end, and
another entry that we don't seem to ever populate at all but acts as a
kind of sentinel value.
When we loop through the array to clear it at the end of this function
we only loop until `lno < cnt`, and thus we may not end up releasing
whatever the two extra `sline`s contain. While that shouldn't matter for
the sentinel value, it does matter for the extra deletion hunk sline.
Regardless of that, plug this memory leak by releasing both extra
entries, which makes the logic a bit easier to reason about.
While at it, fix the formatting of a local comment, which incidentally
also provides the necessary context for why we overallocate the `sline`
array.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not free the key ID when signing a tag fails. Do so by using
the common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaking import and export marks for transport helpers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cleanup string set by the config is leaking when it is being
overridden by an option. Fix this by tracking these via two separate
variables such that we can free the old value.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When formatting trailer lines we iterate through each of the trailers
and munge their respective token/value pairs according to the trailer
options. When formatting a trailer that has its `item->token` pointer
set we perform the munging in two local buffers. In the case where we
figure out that the value is empty and `trim_empty` is set we just skip
over the trailer item. But the buffers are local to the loop and we
don't release their contents, leading to a memory leak.
Plug this leak by lifting the buffers outside of the loop and releasing
them on function return. This fixes the memory leaks, but also optimizes
the loop as we don't have to reallocate the buffers on every single
iteration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaking trailer values when replacing the value with a command or
when the token value is empty.
This leak is exposed by t7513, but plugging it does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we free the worktree change data, we never free its contents. Fix
this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't clear `struct upload_pack::uri_protocols`, which causes a
memory leak. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The signature check in the formatting context is never getting released.
Fix this to plug the resulting memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `do_diff_cache()` we initialize a new `rev_info` and then overwrite
its `diffopt` with a user-provided set of options. This can leak memory
because `repo_init_revisions()` may end up allocating memory for the
`diffopt` itself depending on the configuration. And since that field is
overwritten we won't ever free it.
Plug the memory leak by releasing the diffopts before we overwrite them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The memory allocated by `prepare_to_use_bloom_filter()` is not released
by `release_revisions()`, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing with `--max-count=0` we'll return early from git-grep(1)
without performing any cleanup, which causes memory leaks. Plug these.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `grep_splice_or()` we search for the next `TRUE` node in our tree of
grep expressions and replace it with the given new expression. But we
don't free the old node, which causes a memory leak. Plug it.
This leak is exposed by t7810, but plugging it alone isn't sufficient to
make the test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "reach" test tool doesn't bother to clean up any of its allocated
resources, causing various leaks. Plug them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The list of server options populated via `OPT_STRING_LIST()` is never
cleared, causing a memory leak. Plug it.
This leak is exposed by t5702, but plugging it alone does not make the
whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack.c takes any --shallow-exclude argument(s) from
clone/fetch/etc. and passes them through expand_ref(). If it does not
get back exactly one ref from the call to expand_ref(), it will die with
the following error:
fatal: git upload-pack: ambiguous deepen-not: %s
Given that the documentation suggests to users that --shallow-exclude
accepts a revision rather than a ref (which will be corrected in a
subsequent commit), users may try to pass a revision. In such a case,
expand_ref() will return 0 matches, but the error message we print will
be misleading since "ambiguous" suggests there are multiple matches.
Provide a clearer error message for such a case.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adhere to Documentation/CodingGuidelines:
- Whitespace and redirect operator.
- Case arms indentation.
- Tabs for indentation.
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will change the behavior of "git index-pack
--promisor", which is exercised in "build pack index for an existing
pack", causing the unclamped and clamped versions of the --window
test to exhibit different behavior. Move the clamp test closer to the
unclamped test that it references.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A subsequent commit will add functionality: when fetching from a
promisor remote, existing non-promisor objects that are ancestors of any
fetched object will be repacked into promisor packs (since if a promisor
remote has an object, it also has all its ancestors).
This means that sometimes, a fetch from a promisor remote results in 2
new promisor packs (instead of the 1 that you would expect). There is a
test that fetches a descendant of a local object from a promisor remote,
but also specifically tests that there is exactly 1 promisor pack as
a result of the fetch. This means that this test will fail when the
subsequent commit is added.
Since the ancestry of the fetched object is not the concern of this
test, make the fetched objects have no ancestry in common with the
objets in the client repo. This is done by making the server from
scratch, instead of using an existing repo that has objects in common
with the client.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 9a4c507886 (t0410: test fetching from many promisor remotes,
2019-06-25) adds some tests that demonstrate not the automatic fetching
of missing objects, but the direct fetching from another promisor remote
(configured explicitly in one test and implicitly via --filter on the
"git fetch" CLI invocation in the other test) - thus demonstrating
support for multiple promisor remotes, as described in the commit
message.
Change the test descriptions accordingly to make this clearer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb-http code regressed when the result of re-indexing a pack
yielded an *.idx file that differs in content from the *.idx file it
downloaded from the remote. This has been corrected by no longer
relying on the *.idx file we got from the remote.
* jk/dumb-http-finalize:
packfile: use oidread() instead of hashcpy() to fill object_id
packfile: use object_id in find_pack_entry_one()
packfile: convert find_sha1_pack() to use object_id
http-walker: use object_id instead of bare hash
packfile: warn people away from parse_packed_git()
packfile: drop sha1_pack_index_name()
packfile: drop sha1_pack_name()
packfile: drop has_pack_index()
dumb-http: store downloaded pack idx as tempfile
t5550: count fetches in "previously-fetched .idx" test
midx: avoid duplicate packed_git entries
Replace various calls to atoi() with strtol_i() and strtoul_ui(), and
add improved error handling.
* ua/atoi:
imap: replace atoi() with strtol_i() for UIDVALIDITY and UIDNEXT parsing
merge: replace atoi() with strtol_i() for marker size validation
daemon: replace atoi() with strtoul_ui() and strtol_i()
Teach 'git notes add' and 'git notes append' a new '-e' flag,
instructing them to open the note in $GIT_EDITOR before saving.
* sa/notes-edit:
notes: teach the -e option to edit messages in editor
Documentation update to clarify that 'uploadpack.allowAnySHA1InWant'
implies both 'allowTipSHA1InWant' and 'allowReachableSHA1InWant'.
* ps/upload-pack-doc:
doc: document how uploadpack.allowAnySHA1InWant impact other allow options
Treat ECONNABORTED the same as ECONNRESET in 'git credential-cache' to
work around a possible Cygwin regression. This resolves a race condition
caused by changes in Cygwin's handling of socket closures, allowing the
client to exit cleanly when encountering ECONNABORTED.
* rj/cygwin-exit:
credential-cache: treat ECONNABORTED like ECONNRESET
Test update.
* ua/t3404-cleanup:
t3404: replace test with test_line_count()
t3404: avoid losing exit status with focus on `git show` and `git cat-file`
Various platform compatibility fixes split out of the larger effort
to use Meson as the primary build tool.
* ps/platform-compat-fixes:
t6006: fix prereq handling with `test_format ()`
http: fix build error on FreeBSD
builtin/credential-cache: fix missing parameter for stub function
t7300: work around platform-specific behaviour with long paths on MinGW
t5500, t5601: skip tests which exercise paths with '[::1]' on Cygwin
t3404: work around platform-specific behaviour on macOS 10.15
t1401: make invocation of tar(1) work with Win32-provided one
t/lib-gpg: fix setup of GNUPGHOME in MinGW
t/lib-gitweb: test against the build version of gitweb
t/test-lib: wire up NO_ICONV prerequisite
t/test-lib: fix quoting of TEST_RESULTS_SAN_FILE
Running:
git rev-list --left-right --use-bitmap-index one...two
will produce output without any left-right markers, since the bitmap
traversal returns only a single set of reachable commits. Instead we
should refuse to use bitmaps here and produce the correct output using a
traditional traversal.
This is probably not the only remaining option that misbehaves with
bitmaps, but it's particularly egregious in that it feels like it
_could_ work. Doing two separate traversals for the left/right sides and
then taking the symmetric set differences should yield the correct
answer, but our traversal code doesn't know how to do that.
It's not clear if naively doing two separate traversals would always be
a performance win. A traditional traversal only needs to walk down to
the merge base, but bitmaps always fill out the full reachability set.
So depending on your bitmap coverage, we could end up walking old bits
of history twice to fill out the same uninteresting bits on both sides.
We'd also of course end up with a very large --boundary set, if the user
asked for that.
So this might or might not be something worth implementing later. But
for now, let's make sure we don't produce the wrong answer if somebody
tries it.
The test covers this, but also the same thing with "--count" (which is
what I originally tried in a real-world case). Ironically the
try_bitmap_count() code already realizes that "--left-right" won't work
there. But that just causes us to fall back to the regular bitmap
traversal code, which itself doesn't handle counting (we produce a list
of objects rather than a count).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In general, we'd like to make sure Git works on the LTS versions of
major Linux distributions. To do that, let's add CI jobs for the oldest
regular (non-extended) LTS versions of the major distributions: Ubuntu
20.04, Debian 11, and RHEL 8. Because RHEL isn't available to the
public at no charge, use AlmaLinux, which is binary compatible with it.
Note that Debian does not offer the language-pack packages, but suitable
locale support can be installed with the locales-all package.
Otherwise, use the set of installation instructions which exist and are
most similar to the existing supported distros.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We're no longer testing this version and it's well beyond regular LTS
support now, so remove the stanza for it from the case statement in our
CI code.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Ubuntu 16.04 is past its normal LTS lifespan, so let's switch to Ubuntu
20.04 instead, which is the latest regular LTS version.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Avoid losing exit status by having Git command being tested on the
upstream side of a pipe.
* co/t6050-pipefix:
t6050: avoid pipes with upstream Git commands
Teaches the ref-filter machinery to recognize and avoid cases where
sorting would be redundant.
* ps/ref-filter-sort:
ref-filter: format iteratively with lexicographic refname sorting
Implements a new reftable-specific strbuf replacement to reduce
reftable's dependency on Git-specific data structures.
* ps/reftable-strbuf:
reftable: handle trivial `reftable_buf` errors
reftable/stack: adapt `stack_filename()` to handle allocation failures
reftable/record: adapt `reftable_record_key()` to handle allocation failures
reftable/stack: adapt `format_name()` to handle allocation failures
t/unit-tests: check for `reftable_buf` allocation errors
reftable/blocksource: adapt interface name
reftable: convert from `strbuf` to `reftable_buf`
reftable/basics: provide new `reftable_buf` interface
reftable: stop using `strbuf_addf()`
reftable: stop using `strbuf_addbuf()`
The planet keeps revolving, and CI definitions (even old ones) need to
be kept up to date, even if they worked unchanged before (because now
they don't).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows by default has a restriction in place to only allow paths up to
260 characters. This restriction can nowadays be lifted by setting a
registry key, but is still active by default.
In t7300 we have one test that exercises the behaviour of git-clean(1)
with such long paths. Interestingly enough, this test fails on my system
that uses Windows 10 with mingw-w64 installed via MSYS2: instead of
observing ENAMETOOLONG, we observe ENOENT. This behaviour is consistent
across multiple different environments I have tried.
I cannot say why exactly we observe a different error here, but I would
not be surprised if this was either dependent on the Windows version,
the version of MinGW, the current working directory of Git or any kind
of combination of these.
Work around the issue by handling both errors.
[Backported from 106834e34a2 (t7300: work around platform-specific
behaviour with long paths on MinGW, 2024-10-09).]
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Windows compiler suddenly started complaining that calloc(3) takes
its arguments in <nmemb, size> order. Indeed, there are many calls
that has their arguments in a _wrong_ order.
Fix them all.
A sample breakage can be seen at
https://github.com/git/git/actions/runs/9046793153/job/24857988702#step:4:272
[Backported from f01301aabe1 (compat/regex: fix argument order to
calloc(3), 2024-05-11).]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
In 08809c09aa13 (mingw: add a helper function to attach GDB to the
current process, 2020-02-13), I added a declaration that was not needed.
Back then, that did not matter, but now that the declaration of that
symbol was changed in mingw-w64's headers, it causes the following
compile error:
CC compat/mingw.o
compat/mingw.c: In function 'open_in_gdb':
compat/mingw.c:35:9: error: function declaration isn't a prototype [-Werror=strict-prototypes]
35 | extern char *_pgmptr;
| ^~~~~~
In file included from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/mm_malloc.h:27,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/xmmintrin.h:34,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/immintrin.h:31,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/lib/gcc/x86_64-w64-mingw32/14.1.0/include/x86intrin.h:32,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/winnt.h:1658,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/minwindef.h:163,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/windef.h:9,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/windows.h:69,
from C:/git-sdk-64/usr/src/git/build-installers/mingw64/include/winsock2.h:23,
from compat/../git-compat-util.h:215,
from compat/mingw.c:1:
compat/mingw.c:35:22: error: '__p__pgmptr' redeclared without dllimport attribute: previous dllimport ignored [-Werror=attributes]
35 | extern char *_pgmptr;
| ^~~~~~~
Let's just drop the declaration and get rid of this compile error.
[Backported from 3c295c87c25 (mingw: drop bogus (and unneeded)
declaration of `_pgmptr`, 2024-06-19).]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Linux32 jobs seem to be getting:
Error: This request has been automatically failed because it uses a
deprecated version of `actions/upload-artifact: v1`. Learn more:
https://github.blog/changelog/2024-02-13-deprecation-notice-v1-and-v2-of-the-artifact-actions/
before doing anything useful. For now, disable the step.
Ever since actions/upload-artifact@v1 got disabled, mentioning the
offending version of it seems to stop anything from happening. At
least this should run the same build and test.
See
https://github.com/git/git/actions/runs/10780030750/job/29894867249
for example.
[Backported from 90f2c7240cc (ci: remove 'Upload failed tests'
directories' step from linux32 jobs, 2024-09-09).]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
In df383b5842 (t/test-lib: wire up NO_ICONV prerequisite, 2024-10-16) we
have introduced a new NO_ICONV prerequisite that makes us skip tests in
case Git is not compiled with support for iconv. This change subtly
broke t6006: while the test suite still passes, some of its tests won't
execute because they run into an error.
./t6006-rev-list-format.sh: line 92: test_expect_%e: command not found
The broken tests use `test_format ()`, and the mentioned commit simply
prepended the new prerequisite to its arguments. But that does not work,
as the function is not aware of prereqs at all and will now treat all of
its arguments incorrectly.
Fix this by making the function aware of prereqs by accepting an
optional fourth argument. Adapt the callsites accordingly.
Reported-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
On Windows, we emulate open(3p) via `mingw_open()`. This function
implements handling of some platform-specific quirks that are required
to make it behave as closely as possible like open(3p) would, but for
most cases we just call the Windows-specific `_wopen()` function.
This function has a major downside though: it does not allow us to
specify the sharing mode. While there is `_wsopen()` that allows us to
pass sharing flags, those sharing flags are not the same `FILE_SHARE_*`
flags as `CreateFileW()` accepts. Instead, `_wsopen()` only allows
concurrent read- and write-access, but does not allow for concurrent
deletions. Unfortunately though, we have to allow concurrent deletions
if we want to have POSIX-style atomic renames on top of an existing file
that has open file handles.
Implement a new function that emulates open(3p) for existing files via
`CreateFileW()` such that we can set the required sharing flags.
While we have the same issue when calling open(3p) with `O_CREAT`,
implementing that mode would be more complex due to the required
permission handling. Furthermore, atomic updates via renames typically
write to exclusive lockfile and then perform the rename, and thus we
don't have to handle the case where the locked path has been created
with `O_CREATE`. So while it would be nice to have proper POSIX
semantics in all paths, we instead aim for a minimum viable fix here.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Unless told otherwise, Windows will keep other processes from reading,
writing and deleting files when one has an open handle that was created
via `CreateFileW()`. This behaviour can be altered via `FILE_SHARE_*`
flags:
- `FILE_SHARE_READ` allows a concurrent process to open the file for
reading.
- `FILE_SHARE_WRITE` allows a concurrent process to open the file for
writing.
- `FILE_SHARE_DELETE` allows a concurrent process to delete the file
or to replace it via an atomic rename.
This sharing mechanism is quite important in the context of Git, as we
assume POSIX semantics all over the place. But there are two callsites
where we don't pass all three of these flags:
- We don't set `FILE_SHARE_DELETE` when creating a file for appending
via `mingw_open_append()`. This makes it impossible to delete the
file from another process or to replace it via an atomic rename. The
function was introduced via d641097589 (mingw: enable atomic
O_APPEND, 2018-08-13) and has been using `FILE_SHARE_READ |
FILE_SHARE_WRITE` since the inception. There aren't any indicators
that the omission of `FILE_SHARE_DELETE` was intentional.
- We don't set any sharing flags in `mingw_utime()`, which changes the
access and modification of a file. This makes it impossible to
perform any kind of operation on this file at all from another
process. While we only open the file for a short amount of time to
update its timestamps, this still opens us up for a race condition
with another process.
`mingw_utime()` was originally implemented via `_wopen()`, which
doesn't give you full control over the sharing mode. Instead, it
calls `_wsopen()` with `_SH_DENYNO`, which ultimately translates to
`FILE_SHARE_READ | FILE_SHARE_WRITE`. It was then refactored via
090a3085bc (t/helper/test-chmtime: update mingw to support chmtime
on directories, 2022-03-02) to use `CreateFileW()`, but we stopped
setting any sharing flags at all, which seems like an unintentional
side effect. By restoring `FILE_SHARE_READ | FILE_SHARE_WRITE` we
thus fix this and get back the old behaviour of `_wopen()`.
The fact that we didn't set the equivalent of `FILE_SHARE_DELETE`
can be explained, as well: neither `_wopen()` nor `_wsopen()` allow
you to do so. So overall, it doesn't seem intentional that we didn't
allow deletions here, either.
Adapt both of these callsites to pass all three sharing flags.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When chasing a REF_DELTA, we need to pull the raw hash bytes out of the
mmap'd packfile into an object_id struct. We do that with a raw
hashcpy() of the appropriate length (that happens directly now, though
before the previous commit it happened inside find_pack_entry_one(),
also using a hashcpy).
But I think this creates a potentially dangerous situation due to
d4d364b2c7 (hash: convert `oidcmp()` and `oideq()` to compare whole
hash, 2024-06-14). When using sha1, we'll have uninitialized bytes in
the latter part of the object_id.hash buffer, which could fool oideq(),
etc.
We should use oidread() instead, which correctly zero-pads the extra
bytes, as of c98d762ed9 (global: ensure that object IDs are always
padded, 2024-06-14).
As far as I can see, this has not been a problem in practice because the
object_id we feed to find_pack_entry_one() is never used with oideq(),
etc. It is being compared to the bytes mmap'd from a pack idx file,
which of course do not have the extra padding bytes themselves. So
there's no bug here, but this just puzzled me while looking at the code.
We should do the more obviously safe thing, both for future-proofing and
to avoid confusing readers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The main function we use to search a pack index for an object is
find_pack_entry_one(). That function still takes a bare pointer to the
hash, despite the fact that its underlying bsearch_pack() function needs
an object_id struct. And so we end up making an extra copy of the hash
into the struct just to do a lookup.
As it turns out, all callers but one already have such an object_id. So
we can just take a pointer to that struct and use it directly. This
avoids the extra copy and provides a more type-safe interface.
The one exception is get_delta_base() in packfile.c, when we are chasing
a REF_DELTA from inside the pack (and thus we have a pointer directly to
the mmap'd pack memory, not a struct). We can just bump the hashcpy()
from inside find_pack_entry_one() to this one caller that needs it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The find_sha1_pack() function has a few problems:
- it's badly named, since it works with any object hash
- it takes the hash as a bare pointer rather than an object_id struct
We can fix both of these easily, as all callers actually have a real
object_id anyway.
I also found the existence of this function somewhat confusing, as it is
about looking in an arbitrary set of linked packed_git structs. It's
good for things like dumb-http which are looking in downloaded remote
packs, and not our local packs. But despite the name, it is not a good
way to find the pack which contains a local object (it skips the use of
the midx, the pack mru list, and so on).
So let's also add an explanatory comment above the declaration that may
point people in the right direction.
I suspect the calls in fast-import.c, which use the packed_git list from
the repository struct, could actually just be using find_pack_entry().
But since we'd need to keep it anyway for dumb-http, I didn't dig
further there. If we eventually drop dumb-http support, then it might be
worth examining them to see if we can get rid of the function entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We long ago switched most code to using object_id structs instead of
bare "unsigned char *" hashes. This gives us more type safety from the
compiler, and generally makes it easier to understand what we expect in
each parameter.
But the dumb-http code has lagged behind. And indeed, the whole "walker"
subsystem interface has the same problem, though http-walker is the only
user left.
So let's update the walker interface to pass object_id structs (which we
already have anyway at all call sites!), and likewise use those within
the http-walker methods that it calls.
This cleans up the dumb-http code a bit, but will also let us fix a few
more commonly used helper functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
With a name like parse_packed_git(), you might think it's the right way
to access a local pack index and its associated objects. But not so!
It's a one-off used by the dumb-http code to access pack idx files we've
downloaded from the remote, but whose packs we might not have.
There's only one caller left for this function, and ideally we'd drop it
completely and just inline it there. But that would require exposing
other internals from packfile.[ch], like alloc_packed_git() and
check_packed_git_idx().
So let's leave it be for now, and just warn people that it's probably
not what they're looking for. Perhaps in the long run if we eventually
drop dumb-http support, we can remove the function entirely then.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Like sha1_pack_name() that we dropped in the previous commit, this
function uses an error-prone static strbuf and the somewhat misleading
name "sha1".
The only caller left is in pack-redundant.c. While this command is
marked for potential removal in our BreakingChanges document, we still
have it for now. But it's simple enough to convert it to use its own
strbuf with the underlying odb_pack_name() function, letting us drop the
otherwise obsolete function.
Note that odb_pack_name() does its own strbuf_reset(), so it's safe to
use directly within a loop like this.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The sha1_pack_name() function has a few ugly bits:
- it writes into a static strbuf (and not even a ring buffer of them),
which can lead to subtle invalidation problems
- it uses the term "sha1", but it's really using the_hash_algo, which
could be sha256
There's only one caller of it left. And in fact that caller is better
off using the underlying odb_pack_name() function itself, since it's
just copying the result into its own strbuf anyway.
Converting that caller lets us get rid of this now-obselete function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The has_pack_index() function has several oddities that may make it
surprising if you are trying to find out if we have a pack with some
$hash:
- it is not looking for a valid pack that we found while searching
object directories. It just looks for any pack-$hash.idx file in the
pack directory.
- it only looks in the local directory, not any alternates
- it takes a bare "unsigned char" hash, which we try to avoid these
days
The only caller it has is in the dumb http code; it wants to know if we
already have the pack idx in question. This can happen if we downloaded
the pack (and generated its index) during a previous fetch.
Before the previous patch ("dumb-http: store downloaded pack idx as
tempfile"), it could also happen if we downloaded the .idx from the
remote but didn't get the matching .pack. But since that patch, we don't
hold on to those .idx files. So there's no need to look for the .idx
file in the filesystem; we can just scan through the packed_git list to
see if we have it.
That lets us simplify the dumb http code a bit, as we know that if we
have the .idx we have the matching .pack already. And it lets us get rid
of this odd function that is unlikely to be needed again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This patch fixes a regression in b1b8dfde69 (finalize_object_file():
implement collision check, 2024-09-26) where fetching a v1 pack idx file
over the dumb-http protocol would cause the fetch to fail.
The core of the issue is that dumb-http stores the idx we fetch from the
remote at the same path that will eventually hold the idx we generate
from "index-pack --stdin". The sequence is something like this:
0. We realize we need some object X, which we don't have locally, and
nor does the other side have it as a loose object.
1. We download the list of remote packs from objects/info/packs.
2. For each entry in that file, we download each pack index and store
it locally in .git/objects/pack/pack-$hash.idx (the $hash is not
something we can verify yet and is given to us by the remote).
3. We check each pack index we got to see if it has object X. When we
find a match, we download the matching .pack file from the remote
to a tempfile. We feed that to "index-pack --stdin", which
reindexes the pack, rather than trusting that it has what the other
side claims it does. In most cases, this will end up generating the
exact same (byte-for-byte) pack index which we'll store at the same
pack-$hash.idx path, because the index generation and $hash id are
computed based on what's in the packfile. But:
a. The other side might have used other options to generate the
index. For instance we use index v2 by default, but long ago
it was v1 (and you can still ask for v1 explicitly).
b. The other side might even use a different mechanism to
determine $hash. E.g., long ago it was based on the sorted
list of objects in the packfile, but we switched to using the
pack checksum in 1190a1acf8 (pack-objects: name pack files
after trailer hash, 2013-12-05).
The regression we saw in the real world was (3a). A recent client
fetching from a server with a v1 index downloaded that index, then
complained about trying to overwrite it with its own v2 index. This
collision is otherwise harmless; we know we want to replace the remote
version with our local one, but the collision check doesn't realize
that.
There are a few options to fix it:
- we could teach index-pack a command-line option to ignore only pack
idx collisions, and use it when the dumb-http code invokes
index-pack. This would be an awkward thing to expose users to and
would involve a lot of boilerplate to get the option down to the
collision code.
- we could delete the remote .idx file right before running
index-pack. It should be redundant at that point (since we've just
downloaded the matching pack). But it feels risky to delete
something from our own .git/objects based on what the other side has
said. I'm not entirely positive that a malicious server couldn't lie
about which pack-$hash.idx it has and get us to delete something
precious.
- we can stop co-mingling the downloaded idx files in our local
objects directory. This is a slightly bigger change but I think
fixes the root of the problem more directly.
This patch implements the third option. The big design questions are:
where do we store the downloaded files, and how do we manage their
lifetimes?
There are some additional quirks to the dumb-http system we should
consider. Remember that in step 2 we downloaded every pack index, but in
step 3 we may only download some of the matching packs. What happens to
those other idx files now? They sit in the .git/objects/pack directory,
possibly waiting to be used at a later date. That may save bandwidth for
a subsequent fetch, but it also creates a lot of weird corner cases:
- our local object directory now has semi-untrusted .idx files sitting
around, without their matching .pack
- in case 3b, we noted that we might not generate the same hash as the
other side. In that case even if we download the matching pack,
our index-pack invocation will store it in a different
pack-$hash.idx file. And the unmatched .idx will sit there forever.
- if the server repacks, it may delete the old packs. Now we have
these orphaned .idx files sitting around locally that will never be
used (nor deleted).
- if we repack locally we may delete our local version of the server's
pack index and not realize we have it. So we'll download it again,
even though we have all of the objects it mentions.
I think the right solution here is probably some more complex cache
management system: download the remote .idx files to their own storage
directory, mark them as "seen" when we get their matching pack (to avoid
re-downloading even if we repack), and then delete them when the
server's objects/info/refs no longer mentions them.
But since the dumb http protocol is so ancient and so inferior to the
smart http protocol, I don't think it's worth spending a lot of time
creating such a system. For this patch I'm just downloading the idx
files to .git/objects/tmp_pack_*, and marking them as tempfiles to be
deleted when we exit (and due to the name, any we miss due to a crash,
etc, should eventually be removed by "git gc" runs based on timestamps).
That is slightly worse for one case: if we download an idx but not the
matching pack, we won't retain that idx for subsequent runs. But the
flip side is that we're making other cases better (we never hold on to
useless idx files forever). I suspect that worse case does not even come
up often, since it implies that the packs are generated to match
distinct parts of history (i.e., in practice even in a repo with many
packs you're going to end up grabbing all of those packs to do a clone).
If somebody really cares about that, I think the right path forward is a
managed cache directory as above, and this patch is providing the first
step in that direction anyway (by moving things out of the objects/pack/
directory).
There are two test changes. One demonstrates the broken v1 index case
(it double-checks the resulting clone with fsck to be careful, but prior
to this patch it actually fails at the clone step). The other tweaks the
expectation for a test that covers the "slightly worse" case to
accommodate the extra index download.
The code changes are fairly simple. We stop using finalize_object_file()
to copy the remote's index file into place, and leave it as a tempfile.
We give the tempfile a real ".idx" name, since the packfile code expects
that, and thus we make sure it is out of the usual packs/ directory (so
we'd never mistake it for a real local .idx).
We also have to change parse_pack_index(), which creates a temporary
packed_git to access our index (we need this because all of the pack idx
code assumes we have that struct). It reads the index data from the
tempfile, but prior to this patch would speculatively write the
finalized name into the packed_git struct using the pack-$hash we expect
to use.
I was mildly surprised that this worked at all, since we call
verify_pack_index() on the packed_git which mentions the final name
before moving the file into place! But it works because
parse_pack_index() leaves the mmap-ed data in the struct, so the
lazy-open in verify_pack_index() never triggers, and we read from the
tempfile, ignoring the filename in the struct completely. Hacky, but it
works.
After this patch, parse_pack_index() now uses the index filename we pass
in to derive a matching .pack name. This is OK to change because there
are only two callers, both in the dumb http code (and the other passes
in an existing pack-$hash.idx name, so the derived name is going to be
pack-$hash.pack, which is what we were using anyway).
I'll follow up with some more cleanups in that area, but this patch is
sufficient to fix the regression.
Reported-by: fox <fox.gbr@townlong-yak.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We have a test in t5550 that looks at index fetching over dumb http. It
creates two branches, each of which is completely stored in its own
pack, then fetches the branches independently. What should (and does)
happen is that the first fetch grabs both .idx files and one .pack file,
and then the fetch of the second branch re-uses the previously
downloaded .idx files (fetching none) and grabs the now-required .pack
file.
Since the next few patches will be touching this area of the code, let's
beef up the test a little by checking that we're downloading the
expected items at each step.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When we scan a pack directory to load the idx entries we find into the
packed_git list, we skip any of them that are contained in a midx. We
then load them later lazily if we actually need to access the
corresponding pack, referencing them both from the midx struct and the
packed_git list.
The lazy-load in the midx code checks to see if the midx already
mentions the pack, but doesn't otherwise check the packed_git list. This
makes sense, since we should have added any pack to both lists.
But there's a loophole! If we call close_object_store(), that frees the
midx entirely, but _not_ the packed_git structs, which we must keep
around for Reasons[1]. If we then try to look up more objects, we'll
auto-load the midx again, which won't realize that we've already loaded
those packs, and will create duplicate entries in the packed_git list.
This is possibly inefficient, because it means we may open and map the
pack redundantly. But it can also lead to weird user-visible behavior.
The case I found is in "git repack", which closes and reopens the midx
after repacking and then calls update_server_info(). We end up writing
the duplicate entries into objects/info/packs.
We could obviously de-dup them while writing that file, but it seems
like a violation of more core assumptions that we end up with these
duplicate structs at all.
We can avoid the duplicates reasonably efficiently by checking their
names in the pack_map hash. This annoyingly does require a little more
than a straight hash lookup due to the naming conventions, but it should
only happen when we are about to actually open a pack. I don't think one
extra malloc will be noticeable there.
[1] I'm not entirely sure of all the details, except that we generally
assume the packed_git structs never go away. We noted this
restriction in the comment added by 6f1e9394e2 (object: fix leaking
packfiles when closing object store, 2024-08-08), but it's somewhat
vague. At any rate, if you try freeing the structs in
close_object_store(), you can observe segfaults all over the test
suite. So it might be fixable, but it's not trivial.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Teaches 'shortlog' to explicitly use SHA-1 when operating outside of
a repository.
* wm/shortlog-hash:
builtin/shortlog: explicitly set hash algo when there is no repo
Commands that can also work outside Git have learned to take the
repository instance "repo" when we know we are in a repository, and
NULL when we are not, in a parameter. The uses of the_repository
variable in a few of them have been removed using the new calling
convention.
* jc/a-commands-without-the-repo:
archive: remove the_repository global variable
annotate: remove usage of the_repository global
git: pass in repo to builtin based on setup_git_directory_gently
Enable Windows-based CI in GitLab.
* ps/ci-gitlab-windows:
gitlab-ci: exercise Git on Windows
gitlab-ci: introduce stages and dependencies
ci: handle Windows-based CI jobs in GitLab CI
ci: create script to set up Git for Windows SDK
t7300: work around platform-specific behaviour with long paths on MinGW
A "git fetch" from the superproject going down to a submodule used
a wrong remote when the default remote names are set differently
between them.
* db/submodule-fetch-with-remote-name-fix:
submodule: correct remote name with fetch
Replace unsafe uses of atoi() with strtol_i() to improve error handling
when parsing UIDVALIDITY, UIDNEXT, and APPENDUID in IMAP commands.
Invalid values, such as those with letters, now trigger error messages and
prevent malformed status responses.
I did not add any test for this commit as we do not have any test
for git-imap-send(1) at this point.
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Replace atoi() with strtol_i() for parsing conflict-marker-size to
improve error handling. Invalid values, such as those containing letters
now trigger a clear error message.
Update the test to verify invalid input handling.
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Replace atoi() with strtoul_ui() for --timeout and --init-timeout
(non-negative integers) and with strtol_i() for --max-connections
(signed integers). This improves error handling and input validation
by detecting invalid values and providing clear error messages.
Update tests to ensure these arguments are properly validated.
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We often name functions with arbitrary suffixes like `_1` as an
extension of another existing function. This creates confusion and
doesn't provide good clarity into the functions purpose. Let's document
good function naming etiquette in our CodingGuidelines.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Fix typos and grammar in documentation, comments, etc.
Via codespell.
Reported-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
`git mv a/a.txt a b/` is a nonsense instruction. Instead of failing
gracefully the command trips over itself,[1] leaving behind unfinished
work:
1. first it moves `a/a.txt` to `b/a.txt`; then
2. tries to move `a/`, including `a/a.txt`; then
3. figures out that it’s in a bad state (assertion); and finally
4. aborts.
Now you’re left with a partially-updated index.
The command should instead fail gracefully and make no changes to the
index until it knows that it can complete a sensible action.
For now just add a failing test since this has been known about for
a while.[2]
† 1: Caused by a `pos >= 0` assertion
[2]: https://lore.kernel.org/git/d1f739fe-b28e-451f-9e01-3d2e24a0fe0d@app.fastmail.com/
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In Perl 5.14, released in May 2011, the r modifier was added to the s///
operator to allow it to return the modified string instead of modifying
the string in place. This allows to write nicer, more succinct code in
several cases, so let's do that here.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Our platform support policy states that we require "versions of
dependencies which are generally accepted as stable and supportable,
e.g., in line with the version used by other long-term-support
distributions". Of Debian, Ubuntu, RHEL, and SLES, the four most common
distributions that provide LTS versions, the version with mainstream
long-term security support with the oldest Perl is 5.26.0 in SLES 15.6.
This is a major upgrade, since Perl 5.8.1, according to the Perl
documentation, was released in September of 2003. It brings a lot of
new features that we can choose to use, such as s///r to return the
modified string, the postderef functionality, and subroutine signatures,
although the latter was still considered experimental until 5.36.
This change was made with the following one-liner, which intentionally
excludes modifying the vendored modules we include to avoid conflicts:
git grep -l 'use 5.008001' | grep -v 'LoadCPAN/' | xargs perl -pi -e 's/use 5.008001/require v5.26/'
Use require instead of use to avoid changing the behavior as the latter
enables features and the former does not.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Our platform support policy states that we require "versions of
dependencies which are generally accepted as stable and supportable,
e.g., in line with the version used by other long-term-support
distributions". Of Debian, Ubuntu, and RHEL, the three most common
distributions that provide LTS versions, the version with mainstream
long-term security support with the oldest libcurl is 7.61.0 in RHEL 8.
Update the documentation to state that this is the new base version for
libcurl. Remove text that is no longer applicable to older versions.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.56.0 was released in September 2017, which is over seven years
ago, and no major operating system vendor is still providing security
support for it. Debian 10, which is out of mainstream security support,
has supported a newer version, and Ubuntu 20.04 and RHEL 8, which are
still in support, also have a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.53.0 was released in February 2017, which is over seven years
ago, and no major operating system vendor is still providing security
support for it. Debian 10 and Ubuntu 18.04, both of which are out of
mainstream security support, have supported a newer version, and RHEL 8,
which is still in support, also has a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.52.0 was released in August 2017, which is over seven years
ago, and no major operating system vendor is still providing security
support for it. Debian 9 and Ubuntu 18.04, both of which are out of
mainstream security support, have supported a newer version, and RHEL 8,
which is still in support, also has a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.44.0 was released in August 2015, which is over nine years
ago, and no major operating system vendor is still providing security
support for it. Debian 9 and Ubuntu 16.04, both of which are out of
mainstream security support, have supported a newer version, and RHEL 8,
which is still in support, also has a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.43.0 was released in June 2015, which is over nine years
ago, and no major operating system vendor is still providing security
support for it. Debian 9 and Ubuntu 16.04, both of which are out of
mainstream security support, have supported a newer version, and RHEL 8,
which is still in support, also has a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.39.0 was released in November 2014, which is almost ten years
ago, and no major operating system vendor is still providing security
support for it. Debian 9 and Ubuntu 16.04, both of which are out of
mainstream security support, have supported a newer version, and RHEL 8,
which is still in support, also has a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.34.0 was released in December 2013, which is well over ten
years ago, and no major operating system vendor is still providing
security support for it. Debian 8 and Ubuntu 14.04, both of which are
out of mainstream security support, have supported a newer version, and
RHEL 8, which is still in support, also has a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.25.0 was released in March 2012, which is well over ten years
ago, and no major operating system vendor is still providing security
support for it. Debian 8, RHEL 7, and Ubuntu 12.10, all of which are
out of mainstream security support, have all supported a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
libcurl 7.21.5 was released in April 2011, which is well over ten years
ago, and no major operating system vendor is still providing security
support for it. Debian 7, RHEL 7, and Ubuntu 12.04, all of which are
out of mainstream security support, have all supported a newer version.
Remove the check for this version and use this functionality
unconditionally.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This change updates the script to conform to the coding
standards outlined in the Git project's documentation. According to the
guidelines in Documentation/CodingGuidelines under "Redirection
operators", there should be no whitespace after redirection operators.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This change updates the script to conform to the coding standards
outlined in the Git project's documentation. According to the guidelines
in Documentation/CodingGuidelines under "Redirection operators", there
should be no whitespace after redirection operators.
Signed-off-by: Seyi Kuforiji <kuforiji98@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
An extra worktree attached to a repository points at each other to
allow finding the repository from the worktree and vice versa
possible. Turn this linkage to relative paths.
* cw/worktree-relative:
worktree: add test for path handling in linked worktrees
worktree: link worktrees with relative paths
worktree: refactor infer_backlink() to use *strbuf
worktree: repair copied repository and linked worktrees
Fail gracefully instead of crashing when attempting to write the
contents of a corrupt in-core index as a tree object.
* ps/cache-tree-w-broken-index-entry:
unpack-trees: detect mismatching number of cache-tree/index entries
cache-tree: detect mismatching number of index entries
cache-tree: refactor verification to return error codes
The `technical/repository-version.txt` document originally served as the
master list for extensions, requiring that any new extensions be defined
there. However, the `config/extensions.txt` file was introduced later
and has since become the de facto location for describing extensions,
with several extensions listed there but missing from
`repository-version.txt`.
This consolidates all extension definitions into `config/extensions.txt`,
making it the authoritative source for extensions. The references in
`repository-version.txt` are updated to point to `config/extensions.txt`,
and cross-references to related documentation such as
`gitrepository-layout[5]` and `git-config[1]` are added.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In pipes, the exit code of a chain of commands is determined by
the final command. In order not to miss the exit code of a failed
Git command, avoid pipes instead write output of Git commands
into a file.
For better debugging experience, instances of "grep" were changed
to "test_grep". "test_grep" provides more context in case of a
failed "grep".
Signed-off-by: Chizoba ODINAKA <chizobajames21@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
regexec(3) can fail. E.g. on macOS it fails if it is used with an UTF-8
locale to match a valid regex against a buffer containing invalid UTF-8
characters.
git grep has two ways to search for matches in a file: Either it splits
its contents into lines and matches them separately, or it matches the
whole content and figures out line boundaries later. The latter is done
by look_ahead() and it's quicker in the common case where most files
don't contain a match.
Fall back to line-by-line matching if look_ahead() encounters an
regexec(3) error by propagating errors out of patmatch() and bailing out
of look_ahead() if there is one. This way we at least can find matches
in lines that contain only valid characters. That matches the behavior
of grep(1) on macOS.
pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since
we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8
characters gracefully. So implement the fall-back only for regexec(3)
and leave the PCRE2 matching unchanged.
Reported-by: David Gstir <david@sigma-star.at>
Signed-off-by: René Scharfe <l.s.r@web.de>
Tested-by: David Gstir <david@sigma-star.at>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Use `test_config`.
Remove whitespace after redirect operator.
Reported-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Correct "expected" to rightly terminate with NUL ie '\0' instead of '0'
which may have been typoed.
We didn't notice this before because the test is run with
"test_expect_failure", meaning the test would have been marked broken
anyways.
Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
“Trailer” is the preferred nomenclature in this project. Also add a
definite article where I think it makes sense.
As we can see the rest of the document already prefers this term. This
just gets rid of the last stragglers.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The auto-generated headers used by clar are written at configure time
and thus do not get regenerated automatically. Refactor the build
recipes such that we use custom commands instead, which also has the
benefit that we can reuse the same infrastructure as our Makefile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The compilation of clar-based unit tests is broken because we do not
add the binary directory into which we generate the "clar-decls.h" and
"clar.suite" files as include directories. Instead, we accidentally set
up the source directory as include directory.
Fix this by including the binary directory instead of the source
directory. Furthermore, set up the include directories as PUBLIC instead
of PRIVATE such that they propagate from "unit-tests.lib" to the
"unit-tests" executable, which needs to include the same directory.
Reported-by: Ed Reel <edreel@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Extract the script to generate function declarations for the clar unit
testing framework into a standalone script. This is done such that we
can reuse it in other build systems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This moves the end-of-line marker out of the captured group, matching
the start-of-line marker and for some reason fixing generation of
"clar-decls.h" on some older, more esoteric platforms.
Signed-off-by: Alejandro R. Sedeño <asedeno@mit.edu>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Update clar from:
- 1516124 (Merge pull request #97 from pks-t/pks-whitespace-fixes, 2024-08-15).
To:
- 206accb (Merge pull request #108 from pks-t/pks-uclibc-without-wchar, 2024-10-21)
This update includes a bunch of fixes and improvements that we have
discussed in Git when initial support for clar was merged:
- There is a ".editorconfig" file now.
- Compatibility with Windows has been improved so that the clar
compiles on this platform without an issue. This has been tested
with Cygwin, MinGW and Microsoft Visual Studio.
- clar now uses CMake. This does not impact us at all as we wire up
the clar into our own build infrastructure anyway. This conversion
was done such that we can easily run CI jobs against Windows.
- Allocation failures are now checked for consistently.
- We now define feature test macros in "clar.c", which fixes
compilation on some platforms that didn't previously pull in
non-standard functions like lstat(3p) or strdup(3p). This was
reported by a user of OpenSUSE Leap.
- We stop using `struct timezone`, which is undefined behaviour
nowadays and results in a compilation error on some platforms.
- We now use the combination of mktemp(3) and mkdir(3) on SunOS, same
as we do on NonStop.
- We now support uClibc without support for <wchar.h>.
The most important bits here are the improved platform compatibility
with Windows, OpenSUSE, SunOS and uClibc.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
These two commands are similar enough to acknowledge each other on their
documentation pages.
See the previous commit where we discussed that option-less update-ref
does not support updating symbolic refs but symbolic-ref does.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Add a paragraph which just emphasizes that the command without any
options does not support refs in the final arguments. This is clear
already from the names `<new-oid>` and `<old-oid>` but the right balance
of redundancy makes documentation robust against stray interpretation.
This is also a good place to mention why `--stdin` has those `symref-*`
commands.
Suggested-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This paragraph interrupts the flow of the section by going into detail
about what a symbolic ref file is and how it is implemented. It is not
clear what the purpose is since symbolic refs were already mentioned
prior (“possibly dereferencing the symbolic refs”). Worse, it can
confuse the reader about what argument can be a symbolic ref since it
just says “it” and not which of the parameters; in turn the reader can
be lead to try `<new-oid>` and then get a confusing error since
update-ref will just say that it is not a valid SHA1.
gitglossary(7) already documents what a symref is, concretely, and quite
well at that.
Reported-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Move the discussion of file system symbolic links to a new “Notes”
section (inspired by the one in git-symbolic-ref(1)) since this is
mostly of historical note at this point, not something that is needed in
the main section of the documentation.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Remove paragraphs which explain that using this command is safer than
echoing the branch name into `HEAD`.
Evoking the echo strategy is wrong now under the reftable backend since
this file does not exist. And the ref file backend majority user base
use porcelain commands to manage `HEAD` unless they are intentionally
poking at the implementation.
Maybe this warning was relevant for the usage patterns when it was
added[1] but now it just takes up space.
† 1: 129056370ab (Add missing documentation., 2005-10-04)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The other paragraphs on options say “With <option>,”. Let’s be uniform.
Also add missing word “that”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In bd98f9774e (ref-filter.c: filter & format refs in the same callback,
2023-11-14), we have introduced logic into the ref-filter subsystem that
determines whether or not we can output references iteratively instead
of first collecting all references, post-processing them and printing
them once done. This has the advantage that we don't have to store all
refs in memory and, when used with e.g. `--count=1`, that we don't have
to read all refs in the first place.
One restriction we have in place for that is that caller must not ask
for sorted refs, because there is no way to sort the refs without first
reading them all into an array. So the benefits can only be reaped when
explicitly asking for output not to be sorted.
But there is one exception here where we _can_ get away with sorting
refs while streaming: ref backends sort references returned by their
iterators in lexicographic order. So if the following conditions are all
true we can do iterative streaming:
- There must be at most a single sorting specification, as otherwise
we're not using plain lexicographic ordering.
- The sorting specification must use the "refname".
- The sorting specification must not be using any flags, like
case-insensitive sorting.
Now the resulting logic does feel quite fragile overall, which makes me
a bit uneasy. But after thinking about this for a while I couldn't find
any obvious gaps in my reasoning. Furthermore, given that lexicographic
sorting order is the default in git-for-each-ref(1), this is likely to
benefit a whole lot of usecases out there.
The following benchmark executes git-for-each-ref(1) in a crafted repo
with 1 million references:
Benchmark 1: git for-each-ref (revision = HEAD~)
Time (mean ± σ): 6.756 s ± 0.014 s [User: 3.004 s, System: 3.541 s]
Range (min … max): 6.738 s … 6.784 s 10 runs
Benchmark 2: git for-each-ref (revision = HEAD)
Time (mean ± σ): 6.479 s ± 0.017 s [User: 2.858 s, System: 3.422 s]
Range (min … max): 6.450 s … 6.519 s 10 runs
Summary
git for-each-ref (revision = HEAD)
1.04 ± 0.00 times faster than git for-each-ref (revision = HEAD~)
The change results in a slight performance improvement, but nothing that
would really stand out. Something that cannot be seen in the benchmark
though is peak memory usage, which went from 404.5MB to 68.96kB.
A more interesting benchmark is printing a single referenence with
`--count=1`:
Benchmark 1: git for-each-ref --count=1 (revision = HEAD~)
Time (mean ± σ): 6.655 s ± 0.018 s [User: 2.865 s, System: 3.576 s]
Range (min … max): 6.630 s … 6.680 s 10 runs
Benchmark 2: git for-each-ref --count=1 (revision = HEAD)
Time (mean ± σ): 8.6 ms ± 1.3 ms [User: 2.3 ms, System: 6.1 ms]
Range (min … max): 6.7 ms … 14.4 ms 266 runs
Summary
git git for-each-ref --count=1 (revision = HEAD)
770.58 ± 116.19 times faster than git for-each-ref --count=1 (revision = HEAD~)
Whereas we scaled with the number of references before, we now print the
first reference and exit immediately, which provides a massive win.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Used regex to find these typos:
(?<!struct )(?<=\s)([a-z]{1,}) \1(?=\s)
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Notes can be added to a commit using:
- "-m" to provide a message on the command line.
- -C to copy a note from a blob object.
- -F to read the note from a file.
When these options are used, Git does not open an editor,
it simply takes the content provided via these options and
attaches it to the commit as a note.
Improve flexibility to fine-tune the note before finalizing it
by allowing the messages to be prefilled in the editor and edited
after the messages have been provided through -[mF].
Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Document how setting of `uploadpack.allowAnySHA1InWant`
influences other `uploadpack` options - `allowTipSHA1InWant`
and `allowReachableSHA1InWant`.
Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we
adjusted the line break penalties to really fine tune what we care about
while doing line breaks. Modify some of those to be more inline with
what we care about in the Git project now.
We need to understand that the values set to penalties in
'.clang-format' are relative to each other and do not hold any absolute
value. The penalty arguments take an 'Unsigned' value, so we have some
liberty over the values we can set.
First, in that commit, we decided, that under no circumstances do we
want to exceed 80 characters. This seems a bit too strict. We do
overshoot this limit from time to time to prioritize readability. So
let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we
that we add a penalty of 10 for each character that exceeds the column
limit. By itself this is enough to restrict to column limit. Tuning
other penalties in relation to this is what is important.
The penalty `PenaltyBreakAssignment` talks about the penalty for
breaking an assignment operator on to the next line. In our project, we
are okay with this, so giving a value of 5, which is below the value for
'PenaltyExcessCharacter' ensures that in the end, even 1 character over
the column limit is not worth keeping an assignment on the same line.
Similarly set the penalty for breaking before the first call parameter
'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking
comments 'PenaltyBreakComment' and the penalty for breaking string
literals 'PenaltyBreakString' also to 5.
Finally, we really care about not breaking the return type into its own
line and we really care about not breaking before an open parenthesis.
This avoids weird formatting like:
static const struct strbuf *
a_really_really_large_function_name(struct strbuf resolved,
const char *path, int flags)
or
static const struct strbuf *a_really_really_large_function_name(
struct strbuf resolved, const char *path, int flags)
to instead have something more readable like:
static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved,
const char *path,
int flags)
(note: the tabs here have been replaced by spaces for easier reading)
This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine'
and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a
few characters above the 80 column limit to make code more readable.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
On Cygwin, t0301 fails because "git credential-cache exit" returns a
non-zero exit code. What's supposed to happen here is:
1. The client (the "credential-cache" invocation above) connects to a
previously-spawned credential-cache--daemon.
2. The client sends an "exit" command to the daemon.
3. The daemon unlinks the socket and then exits, closing the
descriptor back to the client.
4. The client sees EOF on the descriptor and exits successfully.
That works on most platforms, and even _used_ to work on Cygwin. But
that changed in Cygwin's ef95c03522 (Cygwin: select: Fix FD_CLOSE
handling, 2021-04-06). After that commit, the client sees a read error
with errno set to ECONNABORTED, and it reports the error and dies.
It's not entirely clear if this is a Cygwin bug. It seems that calling
fclose() on the filehandles pointing to the sockets is sufficient to
avoid this error return, even though exiting should in general look the
same from the client's perspective.
However, we can't just call fclose() here. It's important in step 3
above to unlink the socket before closing the descriptor to avoid the
race mentioned by 7d5e9c9849 (credential-cache--daemon: clarify "exit"
action semantics, 2016-03-18). The client will exit as soon as it sees
the descriptor close, and the daemon may or may not have actually
unlinked the socket by then. That makes test code like this:
git credential exit &&
test_path_is_missing .git-credential-cache
racy.
So we probably _could_ fix this by calling:
delete_tempfile(&socket_file);
fclose(in);
fclose(out);
before we exit(). Or by replacing the exit() with a return up the stack,
in which case the fclose() happens as we unwind. But in that case we'd
still need to call delete_tempfile() here to avoid the race.
But simpler still is that we can notice that we already special-case
ECONNRESET on the client side, courtesy of 1f180e5eb9 (credential-cache:
interpret an ECONNRESET as an EOF, 2017-07-27). We can just do the same
thing here (I suspect that prior to the Cygwin commit that introduced
this problem, we were really just seeing ECONNRESET instead of
ECONNABORTED, so the "new" problem is just the switch of the errno
values).
There's loads more debugging in this thread:
https://lore.kernel.org/git/9dc3e85f-a532-6cff-de11-1dfb2e4bc6b6@ramsayjones.plus.com/
but I've tried to summarize the useful bits in this commit message.
[jk: commit message]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
"git maintenance start" crashed due to an uninitialized variable
reference, which has been corrected.
* ps/maintenance-start-crash-fix:
builtin/gc: fix crash when running `git maintenance start`
"git rebase --rebase-merges" now uses branch names as labels when
able.
* ng/rebase-merges-branch-name-as-label:
rebase-merges: try and use branch names as labels
rebase-update-refs: extract load_branch_decorations
load_branch_decorations: fix memory leak with non-static filters
Convert the reftable library such that we handle failures with the
new `reftable_buf` interfaces.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `stack_filename()` function cannot pass any errors to the caller as it
has a `void` return type. Adapt it and its callers such that we can
handle errors and start handling allocation failures.
There are two interesting edge cases in `reftable_stack_destroy()` and
`reftable_addition_close()`. Both of these are trying to tear down their
respective structures, and while doing so they try to unlink some of the
tables they have been keeping alive. Any earlier attempts to do that may
fail on Windows because it keeps us from deleting such tables while they
are still open, and thus we re-try on close. It's okay and even expected
that this can fail when the tables are still open by another process, so
we handle the allocation failures gracefully and just skip over any file
whose name we couldn't figure out.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `reftable_record_key()` function cannot pass any errors to the
caller as it has a `void` return type. Adapt it and its callers such
that we can handle errors and start handling allocation failures.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `format_name()` function cannot pass any errors to the caller as it
has a `void` return type. Adapt it and its callers such that we can
handle errors and start handling allocation failures.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Adapt the name of the `strbuf` block source to no longer relate to this
interface, but instead to the `reftable_buf` interface.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Convert the reftable library to use the `reftable_buf` interface instead
of the `strbuf` interface. This is mostly a mechanical change via sed(1)
with some manual fixes where functions for `strbuf` and `reftable_buf`
differ. The converted code does not yet handle allocation failures. This
will be handled in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Implement a new `reftable_buf` interface that will replace Git's own
`strbuf` interface. This is done due to three reasons:
- The `strbuf` interfaces do not handle memory allocation failures and
instead causes us to die. This is okay in the context of Git, but is
not in the context of the reftable library, which is supposed to be
usable by third-party applications.
- The `strbuf` interface is quite deeply tied into Git, which makes it
hard to use the reftable library as a standalone library. Any
dependent would have to carefully extract the relevant parts of it
to make things work, which is not all that sensible.
- The `strbuf` interface does not use the pluggable allocators that
can be set up via `reftable_set_alloc()`.
So we have good reasons to use our own type, and the implementation is
rather trivial. Implement our own type. Conversion of the reftable
library will be handled in subsequent commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We're about to introduce our own `reftable_buf` type to replace
`strbuf`. One function we'll have to convert is `strbuf_addf()`, which
is used in a handful of places. This function uses `snprintf()`
internally, which makes porting it a bit more involved:
- It is not available on all platforms.
- Some platforms like Windows have broken implementations.
So by using `snprintf()` we'd also push the burden on downstream users
of the reftable library to make available a properly working version of
it.
Most callsites of `strbuf_addf()` are trivial to convert to not using
it. We do end up using `snprintf()` in our unit tests, but that isn't
much of a problem for downstream users of the reftable library.
While at it, remove a useless call to `strbuf_reset()` in
`t_reftable_stack_auto_compaction_with_locked_tables()`. We don't write
to the buffer before this and initialize it with `STRBUF_INIT`, so there
is no need to reset anything.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We're about to introduce our own `reftable_buf` type to replace
`strbuf`. Get rid of the seldomly-used `strbuf_addbuf()` function such
that we have to reimplement one less function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Fix typos in documentation, comments, etc.
Via codespell.
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Whilst git-shortlog(1) does not explicitly need any repository
information when run without reference to one, it still parses some of
its arguments with parse_revision_opt() which assumes that the hash
algorithm is set. However, in c8aed5e8da (repository: stop setting SHA1
as the default object hash, 2024-05-07) we stopped setting up a default
hash algorithm and instead require commands to set it up explicitly.
This was done for most other commands like in ab274909d4 (builtin/diff:
explicitly set hash algo when there is no repo, 2024-05-07) but was
missed for builtin/shortlog, making git-shortlog(1) segfault outside of
a repository when given arguments like --author that trigger a call to
parse_revision_opt().
Fix this for now by explicitly setting the hash algorithm to SHA1. Also
add a regression test for the segfault.
Thanks-to: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Wolfgang Müller <wolf@oriole.systems>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Remove some complier warnings from msvc in compat/mingw.c for value
truncation from 64 bit to 32 bit integers.
Compiling compat/mingw.c under a 64 bit version of msvc produces
warnings. An "int" is 32 bit, and ssize_t or size_t should be 64 bit
long. Prepare compat/vcbuild/include/unistd.h to have a 64 bit type
_ssize_t, when _WIN64 is defined and 32 bit otherwise.
Further down in this include file, as before, ssize_t is defined as
_ssize_t, if needed.
Use size_t instead of int for all variables that hold the result of
strlen() or wcslen() (which cannot be negative).
Use ssize_t to hold the return value of read().
Signed-off-by: Sören Krecker <soekkle@freenet.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Git's fuzz tests are run continuously as part of OSS-Fuzz [1]. Several
additional fuzz tests have been contributed directly to OSS-Fuzz;
however, these tests are vulnerable to bitrot because they are not built
during Git's CI runs, and thus breaking changes are much less likely to
be noticed by Git contributors.
Port one of these tests back to the Git project:
fuzz-url-decode-mem
This test was originally written by Eric Sesterhenn as part of a
security audit of Git [2]. It was then contributed to the OSS-Fuzz repo
in commit c58ac4492 (Git fuzzing: uncomment the existing and add new
targets. (#11486), 2024-02-21) by Jaroslav Lobačevski. I (Josh Steadmon)
have verified with both Eric and Jaroslav that they're OK with moving
this test to the Git project.
[1] https://github.com/google/oss-fuzz
[2] https://ostif.org/wp-content/uploads/2023/01/X41-OSTIF-Gitlab-Git-Security-Audit-20230117-public.pdf
Co-authored-by: Jaroslav Lobačevski <jarlob@gmail.com>
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Git's fuzz tests are run continuously as part of OSS-Fuzz [1]. Several
additional fuzz tests have been contributed directly to OSS-Fuzz;
however, these tests are vulnerable to bitrot because they are not built
during Git's CI runs, and thus breaking changes are much less likely to
be noticed by Git contributors.
Port one of these tests back to the Git project:
fuzz-parse-attr-line
This test was originally written by Eric Sesterhenn as part of a
security audit of Git [2]. It was then contributed to the OSS-Fuzz repo
in commit c58ac4492 (Git fuzzing: uncomment the existing and add new
targets. (#11486), 2024-02-21) by Jaroslav Lobačevski. I (Josh Steadmon)
have verified with both Eric and Jaroslav that they're OK with moving
this test to the Git project.
[1] https://github.com/google/oss-fuzz
[2] https://ostif.org/wp-content/uploads/2023/01/X41-OSTIF-Gitlab-Git-Security-Audit-20230117-public.pdf
Co-authored-by: Jaroslav Lobačevski <jarlob@gmail.com>
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Git's fuzz tests are run continuously as part of OSS-Fuzz [1]. Several
additional fuzz tests have been contributed directly to OSS-Fuzz;
however, these tests are vulnerable to bitrot because they are not built
during Git's CI runs, and thus breaking changes are much less likely to
be noticed by Git contributors.
Port one of these tests back to the Git project:
fuzz-credential-from-url-gently
This test was originally written by Eric Sesterhenn as part of a
security audit of Git [2]. It was then contributed to the OSS-Fuzz repo
in commit c58ac4492 (Git fuzzing: uncomment the existing and add new
targets. (#11486), 2024-02-21) by Jaroslav Lobačevski. I (Josh Steadmon)
have verified with both Eric and Jaroslav that they're OK with moving
this test to the Git project.
[1] https://github.com/google/oss-fuzz
[2] https://ostif.org/wp-content/uploads/2023/01/X41-OSTIF-Gitlab-Git-Security-Audit-20230117-public.pdf
Co-authored-by: Jaroslav Lobačevski <jarlob@gmail.com>
Co-authored-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The `result` parameter passed to `http_request_reauth()` may either
point to a `struct strbuf` or a `FILE *`, where the `target` parameter
tells us which of either it actually is. To accommodate for both types
the pointer is a `void *`, which we then pass directly to functions
without doing a cast.
This is fine on most platforms, but it breaks on FreeBSD because
`fileno()` is implemented as a macro that tries to directly access the
`FILE *` structure.
Fix this issue by storing the `FILE *` in a local variable before we
pass it on to other functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When not compiling the credential cache we may use a stub function for
`cmd_credential_cache()`. With commit 9b1cb5070f (builtin: add a
repository parameter for builtin functions, 2024-09-13), we have added a
new parameter to all of those top-level `cmd_*()` functions, and did
indeed adapt the non-stubbed-out `cmd_credential_cache()`. But we didn't
adapt the stubbed-out variant, so the code does not compile.
Fix this by adding the missing parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Windows by default has a restriction in place to only allow paths up to
260 characters. This restriction can nowadays be lifted by setting a
registry key, but is still active by default.
In t7300 we have one test that exercises the behaviour of git-clean(1)
with such long paths. Interestingly enough, this test fails on my system
that uses Windows 10 with mingw-w64 installed via MSYS2: instead of
observing ENAMETOOLONG, we observe ENOENT. This behaviour is consistent
across multiple different environments I have tried.
I cannot say why exactly we observe a different error here, but I would
not be surprised if this was either dependent on the Windows version,
the version of MinGW, the current working directory of Git or any kind
of combination of these.
Work around the issue by handling both errors.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Parsing repositories which contain '[::1]' is broken on Cygwin. It seems
as if Cygwin is confusing those as drive letter prefixes or something
like this, but I couldn't deduce the actual root cause.
Mark those tests as broken for now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Two of our tests in t3404 use indented HERE docs where leading tabs on
some of the lines are actually relevant. The tabs do get removed though,
and we try to fix this up by using sed(1) to replace leading tabs in the
actual output, as well. But macOS 10.15 uses an oldish version of sed(1)
that has BSD lineage, which does not understand "\t", and thus we fail
to strip those leading tabs and fail the test.
Address this issue by using `q_to_tab` such that we do not have to strip
leading tabs from the actual output.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Windows nowadays provides a tar(1) binary in "C:\Windows\system32". This
version of tar(1) doesn't seem to handle the case where directory paths
end with a trailing forward slash. And as we do that in t1401 the result
is that the test fails.
Drop the trailing slash. Other tests that use tar(1) work alright, this
is the only instance where it has been failing.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
In "t/lib-gpg.sh" we set up the "GNUPGHOME" environment variable to
point to a test-specific directory. This is done by using "$PWD/gpghome"
as value, where "$PWD" is the current test's trash directory.
This is broken for MinGW though because "$PWD" will use Windows-style
paths that contain drive letters. What we really want in this context is
a Unix-style path, which we can get by using `$(pwd)` instead. It is
somewhat puzzling that nobody ever hit this issue, but it may easily be
that nobody ever tests on Windows with GnuPG installed, which would make
us skip those tests.
Adapt the code accordingly to fix tests using this library.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When testing gitweb we set up the CGI script as "gitweb.perl", which is
the source file of the build target "gitweb.cgi". This file doesn't have
a patched shebang and still contains `++REPLACEMENT++` markers, but
things generally work because we replace the configuration with our own
test configuration.
But this only works as long as "$GIT_BUILD_DIR" actually points to the
source tree, because "gitweb.cgi" and "gitweb.perl" happen to sit next
to each other. This is not the case though once you have out-of-tree
builds like with CMake, where the source and built versions live in
different directories. Consequently, "$GIT_BUILD_DIR/gitweb/gitweb.perl"
won't exist there.
While we could ask build systems with out-of-tree builds to instead set
up GITWEB_TEST_INSTALLED, which allows us to override the location of
the script, it goes against the spirit of this environment variable. We
_don't_ want to test against an installed version, we want to use the
version we have just built.
Fix this by using "gitweb.cgi" instead. This means that you cannot run
test scripts without building that file, but in general we do expect
developers to build stuff before they test it anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The iconv library is used by Git to reencode files, commit messages and
other things. As such it is a rather integral part, but given that many
platforms nowadays use UTF-8 everywhere you can live without support for
reencoding in many situations. It is thus optional to build Git with
iconv, and some of our platforms wired up in "config.mak.uname" disable
it. But while we support building without it, running our test suite
with "NO_ICONV=Yes" causes many test failures.
Wire up a new test prerequisite ICONV that gets populated via our
GIT-BUILD-OPTIONS. Annotate failing tests accordingly.
Note that this commit does not do a deep dive into every single test to
assess whether the failure is expected or not. Most of the tests do
smell like the expected kind of failure though.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When assembling our LSAN_OPTIONS that configure the leak sanitizer we
end up prepending the string with various different colon-separated
options via calls to `prepend_var`. One of the settings we add is the
path where the sanitizer should store logs, which can be an arbitrary
filesystem path.
Naturally, filesystem paths may contain whitespace characters. And while
it does seem as if we were quoting the value, we use escaped quotes and
consequently split up the value if it does contain spaces. This leads to
the following error in t0000 when having a value with whitespaces:
.../t/test-lib.sh: eval: line 64: unexpected EOF while looking for matching `"'
++ return 1
error: last command exited with $?=1
not ok 5 - subtest: 3 passing tests
The error itself is a bit puzzling at first. The basic problem is that
the code sees the leading escaped quote during eval, but because we
truncate everything after the space character it doesn't see the
trailing escaped quote and thus fails to parse the string.
Properly quote the value to fix the issue while using single-quotes to
quote the inner value passed to eval. The issue can be reproduced by
t0000 with such a path that contains spaces.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
On macOS, fsmonitor can fall into a race condition that results in
a client waiting forever to be notified for an event that have
already happened. This problem has been corrected.
* jk/fsmonitor-event-listener-race-fix:
fsmonitor: initialize fs event listener before accepting clients
simple-ipc: split async server initialization and running
A new configuration variable remote.<name>.serverOption makes the
transport layer act as if the --serverOption=<value> option is
given from the command line.
* xx/remote-server-option-config:
ls-remote: leakfix for not clearing server_options
fetch: respect --server-option when fetching multiple remotes
transport.c:🤝 make use of server options from remote
remote: introduce remote.<name>.serverOption configuration
transport: introduce parse_transport_option() method
Deprecate the "trailer_info" struct name and replace it with
"trailer_block". This is more readable, for two reasons:
1. "trailer_info" on the surface sounds like it's about a single
trailer when in reality it is a collection of one or more trailers,
and
2. the "*_block" suffix is more informative than "*_info", because it
describes a block (or region) of contiguous text which has trailers
in it, which has been parsed into the trailer_block structure.
Rename the
size_t trailer_block_start, trailer_block_end;
members of trailer_info to just "start" and "end". Rename the "info"
pointer to "trailer_block" because it is more descriptive. Update
comments accordingly.
Signed-off-by: Linus Arver <linus@ucla.edu>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Refactor t3404 to replace instances of `test` with `test_line_count()`
for checking line counts. This improves readability and aligns with Git's
current test practices.
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The exit code of the preceding command in a pipe is disregarded. So
if that preceding command is a Git command that fails, the test would
not fail. Instead, by saving the output of that Git command to a file,
and removing the pipe, we make sure the test will fail if that Git
command fails. This particular patch focuses on all `git show` and
some instances of `git cat-file`.
Signed-off-by: Usman Akinyemi <usmanakinyemi202@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Discussing the desire to make breaking changes, declaring that
breaking changes are made at a certain version boundary, and
recording these decisions in this document, are necessary but not
sufficient. We need to make sure that we can implement, test, and
deploy such impactful changes.
Earlier we considered to guard the breaking changes with a run-time
check of the `feature.git<version>` configuration to allow brave
users and developers to opt into them as early adoptors. But the
engineering cost to support such a run-time switch, covering new and
disappearing git subcommands and how "git help" would adjust the
documentation to the run-time switch, would be unrealistically high
to be worth it.
Formalize the mechanism based on a compile-time switch to allow
early adopters to opt into the breaking change in a version of Git
before the planned version for the breaking change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The documentation only lists "files" as a possible value, but
"reftable" is also valid.
Signed-off-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The clar source file '$(UNIT_TEST_DIR)/clar/clar.c' includes the
generated 'clar.suite', but this dependency is not taken into account by
our Makefile, so that it is possible for a parallel build to fail if
Make tries to build 'clar.o' before 'clar.suite' is generated.
Correctly specify the dependency.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the effort to get rid of global state due to the global
the_repository variable, replace the_repository with the repository
argument that gets passed down through the builtin function.
The repo might be NULL, but we should be safe in write_archive() because
it detects if we are outside of a repository and calls
setup_git_directory() which will error.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As part of the effort to get rid of global state due to the_repository
variable, remove the the_repository with the repository argument that
gets passed down through the builtin function.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current code in run_builtin() passes in a repository to the builtin
based on whether cmd_struct's option flag has RUN_SETUP.
This is incorrect, however, since some builtins that only have
RUN_SETUP_GENTLY can potentially take a repository.
setup_git_directory_gently() tells us whether or not a command is being
run inside of a repository.
Use the output of setup_git_directory_gently() to help determine whether
or not there is a repository to pass to the builtin. If not, then we
just pass NULL.
As part of this patch, we need to modify add to check for a NULL repo
before calling repo_git_config(), since add -h can be run outside of a
repository.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/output-prefix-cleanup:
diff: store graph prefix buf in git_graph struct
diff: return line_prefix directly when possible
diff: return const char from output_prefix callback
diff: drop line_prefix_length field
line-log: use diff_line_prefix() instead of custom helper
Use after free and double freeing at the end in "git log -L... -p"
had been identified and fixed.
* ds/line-log-asan-fix:
line-log: protect inner strbuf from free
Doc update to clarify how periodical maintenance are scheduled,
spread across time to avoid thundering hurds.
* sk/doc-maintenance-schedule:
doc: add a note about staggering of maintenance
The reftable library is now prepared to expect that the memory
allocation function given to it may fail to allocate and to deal
with such an error.
* ps/reftable-alloc-failures: (26 commits)
reftable/basics: fix segfault when growing `names` array fails
reftable/basics: ban standard allocator functions
reftable: introduce `REFTABLE_FREE_AND_NULL()`
reftable: fix calls to free(3P)
reftable: handle trivial allocation failures
reftable/tree: handle allocation failures
reftable/pq: handle allocation failures when adding entries
reftable/block: handle allocation failures
reftable/blocksource: handle allocation failures
reftable/iter: handle allocation failures when creating indexed table iter
reftable/stack: handle allocation failures in auto compaction
reftable/stack: handle allocation failures in `stack_compact_range()`
reftable/stack: handle allocation failures in `reftable_new_stack()`
reftable/stack: handle allocation failures on reload
reftable/reader: handle allocation failures in `reader_init_iter()`
reftable/reader: handle allocation failures for unindexed reader
reftable/merged: handle allocation failures in `merged_table_init_iter()`
reftable/writer: handle allocation failures in `reftable_new_writer()`
reftable/writer: handle allocation failures in `writer_index_hash()`
reftable/record: handle allocation failures when decoding records
...
The way AsciiDoc is used for SYNOPSIS part of the manual pages has
been revamped. The sources, at least for the simple cases, got
vastly pleasant to work with.
* ja/doc-synopsis-markup:
doc: apply synopsis simplification on git-clone and git-init
doc: update the guidelines to reflect the current formatting rules
doc: introduce a synopsis typesetting
Fix typos and grammar.
Reported-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can only check out commits or branches, not refs in general. And the
problem here is if another worktree is using the branch that we want to
check out.
Let’s be more direct and just talk about branches instead of refs.
Also replace “be held” with “in use”. Further, “in use” is not
restricted to a branch being checked out (e.g. the branch could be busy
on a rebase), hence generalize to “or otherwise in use” in the option
description.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `unbundle_from_file()` has two memory leaks:
- We do not release the `struct bundle_header header` when hitting
errors because we return early without any cleanup.
- We do not release the `struct strbuf bundle_ref` at all.
Plug these leaks by creating a common exit path where both of these
variables are released.
While at it, refactor the code such that the variable assignments do not
happen inside the conditional statement itself according to our coding
style.
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was reported on the mailing list that running `git maintenance start`
immediately segfaults starting with b6c3f8e12c (builtin/maintenance: fix
leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is
trivial to reproduce up to a point where one is scratching their head
why we didn't catch this regression in our test suite.
The root cause of this error is `get_schedule_cmd()`, which does not
populate the `out` parameter in all cases anymore starting with the
mentioned commit. Callers do assume it to always be populated though and
will e.g. call `strvec_split()` on the returned value, which will of
course segfault when the variable is uninitialized.
So why didn't we catch this trivial regression? The reason is that our
tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable
via "t/test-lib.sh", which allows us to override the scheduler command
with a custom one so that we don't accidentally modify the developer's
system. But the faulty code where we don't set the `out` parameter will
only get hit in case that environment variable is _not_ set, which is
never the case when executing our tests.
Fix the regression by again unconditionally allocating the value in the
`out` parameter, if provided. Add a test that unsets the environment
variable to catch future regressions in this area.
Reported-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We explicitly avoid saying "ref <src>" when introducing the source
side of a refspec, because it can be a fully-spelled hexadecimal
object name, and it also can be a pattern that is not quite a "ref".
But we are loose when we introduce <dst> and say "ref <dst>", even
though it can also be a pattern. Let's omit "ref" also from the
destination side.
Clarify that <src> can be a ref, a (limited glob) pattern, or an
object name.
Even though the very original design of refspec expected that '*'
was used only at the end (e.g., "refs/heads/*" was expected, but not
"refs/heads/*-wip"), the code and its use evolved to handle a single
'*' anywhere in the pattern. Update the text to remove the mention
of "the same prefix". Anything that matches the pattern are named
by such a (limited glob) pattern in <src>.
Also put a bit more stress on the fact that we accept only one '*'
in the pattern by saying "one and only one `*`".
Helped-by: Monika Kairaitytė <monika@kibit.lt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test script uses "test - [def]", but when a test fails because
the file passed to it does not exist,
it fails silently without an error message.
Use test_path_* helper functions, which are designed to give better
error messages when their expectations are not met.
I have added a mechanical validation that applies the same transformation
done in this patch, when the test script is passed to a sed script as shown
below.
sed -e 's/^\( *\)test -f /\1test_path_is_file /' \
-e 's/^\( *\)test -d /\1test_path_is_dir /' \
-e 's/^\( *\)test -e /\1test_path_exists /' \
-e 's/^\( *\)! test -[edf] /\1test_path_is_missing /' \
-e 's/^\( *\)test ! -[edf] /\1test_path_is_missing /' \
"$1" >foo.sh
Reviewers can use the sed script to tranform the original test script and
compare the result in foo.sh with the results of applying the patch.
You will see an instance of "!(test -e 3)" which was manually replaced with
""test_path_is_missing 3", and everything else should match.
Careful and deliberate observation was done to check instances where
"test ! - [df] foo" was used in the test script to make sure that the test
instances were expecting foo to EITHER be a file or a directory, and NOT a
possibility of being both as this would make replacing "test ! -f foo" with
"test_path_is_missing foo" unreasonable.
In the tests control flow, foo has been created as EITHER a
reguar file OR a directory and should NOT exist
after "git clean" or "git clean -d", as the case maybe, has been called.
This made it reasonable to replace
"test ! -[df] foo" with "test_path_is_missing foo".
Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `loose.c`, we rely on the global variable `the_hash_algo`. As such we
have guarded the file with the 'USE_THE_REPOSITORY_VARIABLE' definition.
Let's derive the hash algorithm from the available repository variable
and remove this guard. This brings us one step closer to removing the
global 'the_repository' variable.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add jobs that exercise Git on Windows. Unfortunately, building and
especially testing Git on Windows is inherently slower compared to other
Unix-like systems, mostly because spawning processes is way slower. We
thus use the same layout as we use in GitHub Actions, where we have one
build job, and then pass on the resulting build artifacts to ten test
jobs that split up the work across each other.
Unfortunately, the GitLab runners for Windows machines are embarassingly
slow by themselves. So while this strategy leads to around 20 minutes of
build time in GitHub Actions, the same pipeline takes around an hour in
GitLab CI. Still, having late coverage is certainly better than having
none at all.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're about to add a couple of jobs for Windows. As the Windows runners
are quite slow, we will split those up across two stages: one stage to
build the artifacts, and one stage that runs test slices in parallel.
Introduce stages and "needs" dependencies for the preexisting jobs as a
preparatory step. The stages will lead to a more natural representation
of jobs in the UI, whereas the "needs" dependency ensures that jobs do
not have to wait for all jobs in the preceding stage to finish.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We try to abstract away any differences between different CI platforms
in "ci/lib.sh", such that knowledge specific to e.g. GitHub Actions or
GitLab CI is neatly encapsulated in a single place. Next to some generic
variables, we also set up some variables that are specific to the actual
platform that the CI operates on, e.g. Linux or macOS.
We do not yet support Windows runners on GitLab CI. Unfortunately, those
systems do not use the same "CI_JOB_IMAGE" environment variable as both
Linux and macOS do. Instead, we can use the "OS" variable, which should
have a value of "Windows_NT" on Windows platforms.
Handle the combination of "$OS,$CI_JOB_IMAGE" and introduce support for
Windows.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to build and test Git, we have to first set up the Git for
Windows SDK, which contains various required tools and libraries. The
SDK is basically a clone of [1], but that repository is quite large due
to all the binaries it contains. We thus use both shallow clones and
sparse checkouts to speed up the setup. To handle this complexity we use
a GitHub action that is hosted externally at [2].
Unfortunately, this makes it rather hard to reuse the logic for CI
platforms other than GitHub Actions. After chatting with Johannes
Schindelin we came to the conclusion that it would be nice if the Git
for Windows SDK would regularly publish releases that one can easily
download and extract, thus moving all of the complexity into that single
step. Like this, all that a CI job needs to do is to fetch and extract
the resulting archive. This published release comes in the form of a new
"ci-artifacts" tag that gets updated regularly [3].
Implement a new script that knows how to fetch and extract that script
and convert GitHub Actions to use it.
[1]: https://github.com/git-for-windows/git-sdk-64/
[2]: https://github.com/git-for-windows/setup-git-for-windows-sdk/
[3]: https://github.com/git-for-windows/git-sdk-64/releases/tag/ci-artifacts/
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Windows by default has a restriction in place to only allow paths up to
260 characters. This restriction can nowadays be lifted by setting a
registry key, but is still active by default.
In t7300 we have one test that exercises the behaviour of git-clean(1)
with such long paths. Interestingly enough, this test fails on my system
that uses Windows 10 with mingw-w64 installed via MSYS2: instead of
observing ENAMETOOLONG, we observe ENOENT. This behaviour is consistent
across multiple different environments I have tried.
I cannot say why exactly we observe a different error here, but I would
not be surprised if this was either dependent on the Windows version,
the version of MinGW, the current working directory of Git or any kind
of combination of these.
Work around the issue by handling both errors.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When interactively rebasing merge commits, the commit message is parsed to
extract a probably meaningful label name. For instance if the merge commit
is “Merge branch 'feature0'”, then the rebase script will have thes lines:
```
label feature0
merge -C $sha feature0 # “Merge branch 'feature0'
```
This heuristic fails in the case of octopus merges or when the merge commit
message is actually unrelated to the parent commits.
An example that combines both is:
```
*---. 967bfa4 (HEAD -> integration) Integration
|\ \ \
| | | * 2135be1 (feature2, feat2) Feature 2
| |_|/
|/| |
| | * c88b01a Feature 1
| |/
|/|
| * 75f3139 (feat0) Feature 0
|/
* 25c86d0 (main) Initial commit
```
yields the labels Integration, Integration-2 and Integration-3.
Fix this by using a branch name for each merge commit's parent that is the
tip of at least one branch, and falling back to a label derived from the
merge commit message otherwise.
In the example above, the labels become feat0, Integration and feature2.
Signed-off-by: Nicolas Guichard <nicolas@guichard.eu>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract load_branch_decorations from todo_list_add_update_ref_commands so
it can be re-used in make_script_with_merges.
Since it can now be called multiple times, use non-static lists and place
it next to load_ref_decorations to re-use the decoration_loaded guard.
Signed-off-by: Nicolas Guichard <nicolas@guichard.eu>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
load_branch_decorations calls normalize_glob_ref on each string of filter's
string_lists. This effectively replaces the potentially non-owning char* of
those items with an owning char*.
Set the strdup_string flag on those string_lists.
This was not caught until now because:
- when passing string_lists already with the strdup_string already set, the
behaviour was correct
- when passing static string_lists, the new char* remain reachable until
program exit
Signed-off-by: Nicolas Guichard <nicolas@guichard.eu>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code fetches the submodules remote based on the superproject remote name
instead of the submodule remote name[1].
Instead of grabbing the default remote of the superproject repository, ask
the default remote of the submodule we are going to run 'git fetch' in.
1. https://lore.kernel.org/git/ZJR5SPDj4Wt_gmRO@pweza/
Signed-off-by: Daniel Black <daniel@mariadb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• Provide a commit message in the example command.
The command will hang since it is waiting for a commit message on
stdin. Which is usable but not straightforward enough since this is
example code.
• Use `||` directly since that is more straightforward than checking the
last exit status.
Also use `echo` and `exit` since `die` is not defined.
• Expose variable declarations.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis for `git config unset` mentions two positional arguments:
`<name>` and `<value>`. While the first argument is correct, the second
is not. Users are expected to provide the value via `--value=<value>`.
Remove the positional argument. The `--value=<value>` option is already
documented correctly, so this is all we need to do to fix the
documentation.
Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a racy hang in fsmonitor on macOS that we sometimes see in CI.
When we serve a client, what's supposed to happen is:
1. The client thread calls with_lock__wait_for_cookie() in which we
create a cookie file and then wait for a pthread_cond event
2. The filesystem event listener sees the cookie file creation, does
some internal book-keeping, and then triggers the pthread_cond.
But there's a problem: we start the listener that accepts client threads
before we start the fs event thread. So it's possible for us to accept a
client which creates the cookie file and starts waiting before the fs
event thread is initialized, and we miss those filesystem events
entirely. That leaves the client thread hanging forever.
In CI, the symptom is that t9210 (which is testing scalar, which always
enables fsmonitor under the hood) may hang forever in "scalar clone". It
is waiting on "git fetch" which is waiting on the fsmonitor daemon.
The race happens more frequently under load, but you can trigger it
predictably with a sleep like this, which delays the start of the fs
event thread:
--- a/compat/fsmonitor/fsm-listen-darwin.c
+++ b/compat/fsmonitor/fsm-listen-darwin.c
@@ -510,6 +510,7 @@ void fsm_listen__loop(struct fsmonitor_daemon_state *state)
FSEventStreamSetDispatchQueue(data->stream, data->dq);
data->stream_scheduled = 1;
+ sleep(1);
if (!FSEventStreamStart(data->stream)) {
error(_("Failed to start the FSEventStream"));
goto force_error_stop_without_loop;
One solution might be to reverse the order of initialization: start the
fs event thread before we start the thread listening for clients. But
the fsmonitor code explicitly does it in the opposite direction. The fs
event thread wants to refer to the ipc_server_data struct, so we need it
to be initialized first.
A further complication is that we need a signal from the fs event thread
that it is actually ready and listening. And those details happen within
backend-specific fsmonitor code, whereas the initialization is in the
shared code.
So instead, let's use the ipc_server init/start split added in the
previous commit. The generic fsmonitor code will init the ipc_server but
_not_ start it, leaving that to the backend specific code, which now
needs to call ipc_server_start_async() at the right time.
For macOS, that is right after we start the FSEventStream that you can
see in the diff above.
It's not clear to me if Windows suffers from the same problem (and we
simply don't trigger it in CI), or if it is immune. Regardless, the
obvious place to start accepting clients there is right after we've
established the ReadDirectoryChanges watch.
This makes the hangs go away in our macOS CI environment, even when
compiled with the sleep() above.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To start an async ipc server, you call ipc_server_run_async(). That
initializes the ipc_server_data object, and starts all of the threads
running, which may immediately start serving clients.
This can create some awkward timing problems, though. In the fsmonitor
daemon (the sole user of the simple-ipc system), we want to create the
ipc server early in the process, which means we may start serving
clients before the rest of the daemon is fully initialized.
To solve this, let's break run_async() into two parts: an initialization
which allocates all data and spawns the threads (without letting them
run), and a start function which actually lets them begin work. Since we
have two simple-ipc implementations, we have to handle this twice:
- in ipc-unix-socket.c, we have a central listener thread which hands
connections off to worker threads using a work_available mutex. We
can hold that mutex after init, and release it when we're ready to
start.
We do need an extra "started" flag so that we know whether the main
thread is holding the mutex or not (e.g., if we prematurely stop the
server, we want to make sure all of the worker threads are released
to hear about the shutdown).
- in ipc-win32.c, we don't have a central mutex. So we'll introduce a
new startup_barrier mutex, which we'll similarly hold until we're
ready to let the threads proceed.
We again need a "started" flag here to make sure that we release the
barrier mutex when shutting down, so that the sub-threads can
proceed to the finish.
I've renamed the run_async() function to init_async() to make sure we
catch all callers, since they'll now need to call the matching
start_async().
We could leave run_async() as a wrapper that does both, but there's not
much point. There are only two callers, one of which is fsmonitor, which
will want to actually do work between the two calls. And the other is
just a test-tool wrapper.
For now I've added the start_async() calls in fsmonitor where they would
otherwise have happened, so there should be no behavior change with this
patch.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A failure scenario reported in an earlier patch series[1] that several
`git worktree` subcommands failed or misbehaved when invoked from within
linked worktrees that used relative paths.
This adds a test that executes a `worktree prune` command inside both an
internally and an externally linked worktree and asserts that the other
worktree was not pruned.
[1]: https://lore.kernel.org/git/CAPig+cQXFy=xPVpoSq6Wq0pxMRCjS=WbkgdO+3LySPX=q0nPCw@mail.gmail.com/
Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git currently stores absolute paths to both the main repository and
linked worktrees. However, this causes problems when moving repositories
or working in containerized environments where absolute paths differ
between systems. The worktree links break, and users are required to
manually execute `worktree repair` to repair them, leading to workflow
disruptions. Additionally, mapping repositories inside of containerized
environments renders the repository unusable inside the containers, and
this is not repairable as repairing the worktrees inside the containers
will result in them being broken outside the containers.
To address this, this patch makes Git always write relative paths when
linking worktrees. Relative paths increase the resilience of the
worktree links across various systems and environments, particularly
when the worktrees are self-contained inside the main repository (such
as when using a bare repository with worktrees). This improves
portability, workflow efficiency, and reduces overall breakages.
Although Git now writes relative paths, existing repositories with
absolute paths are still supported. There are no breaking changes
to workflows based on absolute paths, ensuring backward compatibility.
At a low level, the changes involve modifying functions in `worktree.c`
and `builtin/worktree.c` to use `relative_path()` when writing the
worktree’s `.git` file and the main repository’s `gitdir` reference.
Instead of hardcoding absolute paths, Git now computes the relative path
between the worktree and the repository, ensuring that these links are
portable. Locations where these respective file are read have also been
updated to properly handle both absolute and relative paths. Generally,
relative paths are always resolved into absolute paths before any
operations or comparisons are performed.
Additionally, `repair_worktrees_after_gitdir_move()` has been introduced
to address the case where both the `<worktree>/.git` and
`<repo>/worktrees/<id>/gitdir` links are broken after the gitdir is
moved (such as during a re-initialization). This function repairs both
sides of the worktree link using the old gitdir path to reestablish the
correct paths after a move.
The `worktree.path` struct member has also been updated to always store
the absolute path of a worktree. This ensures that worktree consumers
never have to worry about trying to resolve the absolute path themselves.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This lays the groundwork for the next patch, which needs the backlink
returned from infer_backlink() as a `strbuf`. It seemed inefficient to
convert from `strbuf` to `char*` and back to `strbuf` again.
This refactors infer_backlink() to return an integer result and use a
pre-allocated `strbuf` for the inferred backlink path, replacing the
previous `char*` return type and improving efficiency.
Signed-off-by: Caleb White <cdwhite3@pm.me>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure `server_options` is properly cleared using `string_list_clear()`
in `builtin/ls-remote.c:cmd_ls_remote`.
Although we cannot yet enable `TEST_PASSES_SANITIZE_LEAK=true` for
`t/t5702-protocol-v2.sh` due to other existing leaks, this fix ensures
that "git-ls-remote" related server options tests pass the sanitize leak
check:
...
ok 12 - server-options are sent when using ls-remote
ok 13 - server-options from configuration are used by ls-remote
...
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix an issue where server options specified via the command line
(`--server-option` or `-o`) were not sent when fetching from multiple
remotes using Git protocol v2.
To reproduce the issue with a repository containing multiple remotes:
GIT_TRACE_PACKET=1 git -c protocol.version=2 fetch --server-option=demo --all
Observe that no server options are sent to any remote.
The root cause was identified in `builtin/fetch.c:fetch_multiple`, which
is invoked when fetching from more than one remote. This function forks
a `git-fetch` subprocess for each remote but did not include the
specified server options in the subprocess arguments.
This commit ensures that command-line specified server options are
properly passed to each subprocess. Relevant tests have been added.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Utilize the `server_options` from the corresponding remote during the
handshake in `transport.c` when Git protocol v2 is detected. This helps
initialize the `server_options` in `transport.h:transport` if no server
options are set for the transport (typically via `--server-option` or
`-o`).
While another potential place to incorporate server options from the
remote is in `transport.c:transport_get`, setting server options for a
transport using a protocol other than v2 could lead to unexpected errors
(see `transport.c:die_if_server_options`).
Relevant tests and documentation have been updated accordingly.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, server options for Git protocol v2 can only be specified via
the command line option "--server-option" or "-o", which is inconvenient
when users want to specify a list of default options to send. Therefore,
we are introducing a new configuration to hold a list of default server
options, akin to the `push.pushOption` configuration for push options.
Initially, I named the new configuration `fetch.serverOption` to align
with `push.pushOption`. However, after discussing with Patrick, it was
renamed to `remote.<name>.serverOption` as suggested, because:
1. Server options are designed to be server-specific, making it more
logical to use a per-remote configuration.
2. Using "fetch." prefixed configurations in git-clone or git-ls-remote
seems out of place and inconsistent in design.
The parsing logic for `remote.<name>.serverOption` also relies on
`transport.c:parse_transport_option`, similar to `push.pushOption`, and
they follow the same priority design:
1. Server options set in lower-priority configuration files (e.g.,
/etc/gitconfig or $HOME/.gitconfig) can be overridden or unset in
more specific repository configurations using an empty string.
2. Command-line specified server options take precedence over those from
the configuration.
Server options from configuration are stored to the corresponding
`remote.h:remote` as a new field `server_options`. The field will be
utilized in the subsequent commit to help initialize the
`server_options` of `transport.h:transport`.
And documentation have been updated accordingly.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Junio C Hamano <gitster@pobox.com>
Reported-by: Liu Zhongbo <liuzhongbo.6666@bytedance.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the `parse_transport_option()` method to parse the `push.pushOption`
configuration. This method will also be used in the next commit to
handle the new `remote.<name>.serverOption` configuration for setting
server options in Git protocol v2.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These links should point to `.html` files, not to `.txt` ones.
Compare also to 4945f046c7f (api docs: link to html version of
api-trace2, 2022-09-16).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as the preceding commit, we unconditionally dereference the index's
cache entries depending on the number of cache-tree entries, which can
lead to a segfault when the cache-tree is corrupted. Fix this bug.
This also makes t4058 pass with the leak sanitizer enabled.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In t4058 we have some tests that exercise git-read-tree(1) when used
with a tree that contains duplicate entries. While the expectation is
that we fail, we ideally should fail gracefully without a segfault.
But that is not the case: we never check that the number of entries in
the cache-tree is less than or equal to the number of entries in the
index. This can lead to an out-of-bounds read as we unconditionally
access `istate->cache[idx]`, where `idx` is controlled by the number of
cache-tree entries and the current position therein. The result is a
segfault.
Fix this segfault by adding a sanity check for the number of index
entries before dereferencing them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `cache_tree_verify()` will `BUG()` whenever it finds that
the cache-tree extension of the index is corrupt. The function is thus
inherently untestable because the resulting call to `abort()` will be
detected by our testing framework and labelled an error. And rightfully
so: it shouldn't ever be possible to hit bugs, as they should indicate a
programming error rather than corruption of on-disk state.
Refactor the function to instead return error codes. This also ensures
that the function can be used e.g. by git-fsck(1) without the whole
process dying. Furthermore, this refactoring plugs some memory leaks
when returning early by creating a common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
macOS with fsmonitor daemon can hang forever when a submodule is
involved, which has been corrected.
* kn/osx-fsmonitor-with-submodules-fix:
fsmonitor OSX: fix hangs for submodules
Usability improvements for running tests in leak-checking mode.
* jk/test-lsan-improvements:
test-lib: check for leak logs after every test
test-lib: show leak-sanitizer logs on --immediate failure
test-lib: stop showing old leak logs
In 6241ce2170 (refs/reftable: reload locked stack when preparing
transaction, 2024-09-24) we have introduced a new test that exercises
how the reftable backend behaves with many concurrent writers all racing
with each other. This test was introduced after a couple of fixes in
this context that should make concurrent writes behave gracefully. As it
turns out though, Windows systems do not yet handle concurrent writes
properly, as we've got two reports for Cygwin and MinGW failing in this
newly added test.
The root cause of this is how we update the "tables.list" file: when
writing a new stack of tables we first write the data into a lockfile
and then rename that file into place. But Windows forbids us from doing
that rename when the target path is open for reading by another process.
And as the test races both readers and writers with each other we are
quite likely to hit this edge case.
This is not a regression: the logic didn't work before the mentioned
commit, and after the commit it performs well on Linux and macOS, and
the situation on Windows should have at least improved a bit. But the
test shows that we need to put more thought into how to make this work
properly there.
Work around the issue by disabling the test on Windows for now. While at
it, increase the locking timeout to address reported timeouts when using
either the address or memory sanitizer, which also tend to significantly
extend the runtime of this test.
This should be revisited after Git v2.47 is out.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fsmonitor_classify_path_absolute() expects state->path_gitdir_watch.buf
has no trailing '/' or '.' For a submodule, fsmonitor_run_daemon() sets
the value with trailing "/." (as repo_get_git_dir(the_repository) on
Darwin returns ".") so that fsmonitor_classify_path_absolute() returns
IS_OUTSIDE_CONE.
In this case, fsevent_callback() doesn't update cookie_list so that
fsmonitor_publish() does nothing and with_lock__mark_cookies_seen() is
not invoked.
As with_lock__wait_for_cookie() infinitely waits for state->cookies_cond
that with_lock__mark_cookies_seen() should unlock, the whole daemon
hangs.
Remove trailing "/." from state->path_gitdir_watch.buf for submodules
and add a corresponding test in t7527-builtin-fsmonitor.sh. The test is
disabled for MINGW because hangs treated with this patch occur only for
Darwin and there is no simple way to terminate the win32 fsmonitor
daemon that hangs.
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When growing the `names` array fails we would end up with a `NULL`
pointer. This causes two problems:
- We would run into a segfault because we try to free names that we
have assigned to the array already.
- We lose track of the old array and cannot free its contents.
Fix this issue by using a temporary variable. Like this we do not
clobber the old array that we tried to reallocate, which will remain
valid when a call to realloc(3P) fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffopt output_prefix interface makes it the callback's job to
handle ownership of the memory it returns, keeping it valid while
callers are using it and then eventually freeing it when we are done
diffing.
In diff_output_prefix_callback() we handle this with a static strbuf,
effectively "leaking" it when the diff is done (but not triggering any
leak detectors because it's technically still reachable). This has not
been a big problem in practice, but it is a problem for libification:
two diffs running in the same process could stomp on each other's prefix
buffers.
Since we only need the strbuf when we are formatting graph padding, we
can give ownership of the strbuf to the git_graph struct, letting us
free it when that struct is no longer in use.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We may point our output_prefix callback to diff_output_prefix_callback()
in any of these cases:
1. we have a user-provided line_prefix
2. we have a graph prefix to show
3. both (1) and (2)
The function combines the available elements into a strbuf and returns
its pointer.
In the case that we just have the line_prefix, though, there is no need
for the strbuf. We can return the string directly.
This is a minor optimization by itself, but also will allow us to clean
up some memory ownership issues on top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff_options structure has an output_prefix callback for returning a
prefix string, but it does so by returning a pointer to a strbuf.
This makes the interface awkward. There's no reason the callback should
need to use a strbuf, and it creates questions about whether the
ownership of the resulting buffer should be transferred to the caller
(it should not be, but a recent attempt to clean up this code led to a
double-free in some cases).
The one advantage we get is that the strbuf contains a ptr/len pair, so
we could in theory have a prefix with embedded NULs. But we can observe
that none of the existing callbacks would ever produce such a NUL (they
are usually just indentation or graph symbols, and even the
"--line-prefix" option takes a NUL-terminated string).
And anyway, only one caller (the one in log_tree_diff_flush) actually
looks at the strbuf length. In every other case we use a helper function
which discards the length and just returns the NUL-terminated string.
So let's just have the callback return a "const char *" pointer. It's up
to the callbacks themselves if they want to use a strbuf under the hood.
And now the caller in log_tree_diff_flush() can just use the helper
function along with everybody else. That lets us even simplify out the
function pointer check, since the helper returns an empty string
(technically this does mean we'll sometimes issue an empty fputs() call,
but I don't think this code path is hot enough to care about that).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff_options structure holds a line_prefix string and an associated
length. But the length is always just the strlen() of the NUL-terminated
string. Let's simplify the code by just storing the string pointer and
assuming it is NUL-terminated when we use it.
This will cause us to compute the string length in a few extra spots,
but I don't think any of these are particularly hot code paths.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our local output_prefix() is exactly the same as the public
diff_line_prefix() function. Let's just use that one, saving us a little
bit of code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Part of the maintainer's job is to keep up-to-date and publish the
'amlog' which stores a mapping between a patch's 'Message-Id' e-mail
header and the commit generated by applying said patch.
But our Documentation/howto/maintain-git.txt does not mention the amlog,
or the scripts which exist to help the maintainer keep the amlog
up-to-date.
(This bit me during the first integration round I did as interim
maintainer[1] involved a lot of manual clean-up. More recently it has
come up as part of a research effort to better understand a patch's
lifecycle on the list[2].)
Address this gap by briefly documenting the existence and purpose of the
'post-applypatch' hook in maintaining the amlog entries.
[1]: https://lore.kernel.org/git/Y19dnb2M+yObnftj@nand.local/
[2]: https://lore.kernel.org/git/CAJoAoZ=4ARuH3aHGe5yC_Xcnou_c396q_ZienYPY7YnEzZcyEg@mail.gmail.com/
Suggested-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git maintenance tasks are staggered to a random minute of the hour per
client to avoid thundering herd issues. Updates the doc to add a note
about the same.
Signed-off-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 253ed9ecff (hash.h: scaffolding for _unsafe hashing variants,
2024-09-26) introduced the concept of having two hash algorithms: a safe
and an unsafe one. When the Makefile knobs do not explicitly request an
unsafe one, we fall back to using the safe algorithm.
However, the fallback to do so forgot one case: we should inherit the
NEEDS_CLONE_HELPER flag from the safe variant. Failing to do so means
that we'll end up defining two clone functions (the algorithm specific
one, and the generic one that just calls memcpy). You'll see an error
like this:
$ make OPENSSL_SHA1=1
[...]
sha1/openssl.h:46:29: error: redefinition of ‘openssl_SHA1_Clone’
46 | #define platform_SHA1_Clone openssl_SHA1_Clone
| ^~~~~~~~~~~~~~~~~~
hash.h:83:40: note: in expansion of macro ‘platform_SHA1_Clone’
83 | # define platform_SHA1_Clone_unsafe platform_SHA1_Clone
| ^~~~~~~~~~~~~~~~~~~
hash.h:101:33: note: in expansion of macro ‘platform_SHA1_Clone_unsafe’
101 | # define git_SHA1_Clone_unsafe platform_SHA1_Clone_unsafe
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
hash.h:133:20: note: in expansion of macro ‘git_SHA1_Clone_unsafe’
133 | static inline void git_SHA1_Clone_unsafe(git_SHA_CTX_unsafe *dst,
| ^~~~~~~~~~~~~~~~~~~~~
sha1/openssl.h:37:20: note: previous definition of ‘openssl_SHA1_Clone’ with type ‘void(struct openssl_SHA1_CTX *, const struct openssl_SHA1_CTX *)’
37 | static inline void openssl_SHA1_Clone(struct openssl_SHA1_CTX *dst,
| ^~~~~~~~~~~~~~~~~~
This only matters when compiling with openssl as the "safe" variant,
since it's the only algorithm that requires a clone helper (and even
then, only if you are using openssl 3.0+). And you should never do that,
because it's not safe. But still, the invocation above used to work and
should continue to do so until we decide to require a
collision-detecting variant for the safe algorithm entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message added by 296743a7ca (archive: load index before
pathspec checks, 2024-09-21) is misleading: unpack_trees() is not
touching the working tree at all here, but just loading a tree into
the index. Correct it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The output_prefix() method in line-log.c may call a function pointer via
the diff_options struct. This function pointer returns a strbuf struct
and then its buffer is passed back. However, that implies that the
consumer is responsible to free the string. This is especially true
because the default behavior is to duplicate the empty string.
The existing functions used in the output_prefix pointer include:
1. idiff_prefix_cb() in diff-lib.c. This returns the data pointer, so
the value exists across multiple calls.
2. diff_output_prefix_callback() in graph.c. This uses a static strbuf
struct, so it reuses buffers across calls. These should not be
freed.
3. output_prefix_cb() in range-diff.c. This is similar to the
diff-lib.c case.
In each case, we should not be freeing this buffer. We can convert the
output_prefix() function to return a const char pointer and stop freeing
the result.
This choice is essentially the opposite of what was done in 394affd46d
(line-log: always allocate the output prefix, 2024-06-07).
This was discovered via 'valgrind' while investigating a public report
of a bug in 'git log --graph -L' [1].
[1] https://github.com/git-for-windows/git/issues/5185
This issue would have been caught by the new test, when Git is compiled
with ASan to catch these double frees.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since DEVELOPER=YesPlease build enables -Wunused-parameter warnings
these days, the fallback definition for reencode_string_len() that
did not touch any of its parameters but one needs to be annotated
properly.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library uses pluggable allocators, which means that we
shouldn't ever use the standard allocator functions. But it is an easy
mistake to make to accidentally use e.g. free(3P) instead of the
reftable-specific `reftable_free()` function, and we do not have any
mechanism to detect this misuse right now.
Introduce a couple of macros that ban the standard allocators, similar
to how we do it in "banned.h".
Note that we do not ban the following two classes of functions:
- Macros like `FREE_AND_NULL()` or `REALLOC_ARRAY()`. As those expand
to code that contains already-banned functions we'd get a compiler
error even without banning those macros explicitly.
- Git-specific allocators like `xmalloc()` and friends. The primary
reason is that there are simply too many of them, so we're rather
aiming for best effort here. Furthermore, the eventual goal is to
make them unavailable in the reftable library place by not pulling
them in via "git-compat-utils.h" anymore.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several calls to `FREE_AND_NULL()` in the reftable library,
which of course uses free(3P). As the reftable allocators are pluggable
we should rather call the reftable specific function, which is
`reftable_free()`.
Introduce a new macro `REFTABLE_FREE_AND_NULL()` and adapt the callsites
accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a small set of calls to free(3P) in the reftable library. As
the reftable allocators are pluggable we should rather call the reftable
specific function, which is `reftable_free()`.
Convert the code accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle trivial allocation failures in the reftable library and its unit
tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree interfaces of the reftable library handle both insertion and
searching of tree nodes with a single function, where the behaviour is
altered between the two via an `insert` bit. This makes it quit awkward
to handle allocation failures because on inserting we'd have to check
for `NULL` pointers and return an error, whereas on searching entries we
don't have to handle it as an allocation error.
Split up concerns of this function into two separate functions, one for
inserting entries and one for searching entries. This makes it easy for
us to check for allocation errors as `tree_insert()` should never return
a `NULL` pointer now. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when adding entries to the pqueue. Adapt its
only caller accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `block_writer_init()` and
`block_reader_init()`. This requires us to bubble up error codes into
`writer_reinit_block_writer()`. Adapt call sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `new_indexed_table_ref_iter()`. While at
it, rename the function to match our coding style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_stack_auto_compact()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_stack_reload_once()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reader_init_iter()`. This requires us to
also adapt `reftable_reader_init_*_iterator()` to bubble up the new
error codes. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when creating unindexed readers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `merged_table_init_iter()`. While at it,
merge `merged_iter_init()` into the function. It only has a single
caller and merging them makes it easier to handle allocation failures
consistently.
This change also requires us to adapt `reftable_stack_init_*_iterator()`
to bubble up the new error codes of `merged_table_iter_init()`. Adapt
callsites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_new_writer()`. Adapt the
function to return an error code to return such failures. While at it,
rename it to match our code style as we have to touch up every callsite
anyway.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation errors in `writer_index_hash()`. Adjust its only
caller in `reftable_writer_add_ref()` accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when decoding records. While at it, fix some
error codes to be `REFTABLE_FORMAT_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures when copying records. While at it, convert
from `xstrdup()` to `reftable_strdup()`. Adapt callsites to check for
error codes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `parse_names()` by returning `NULL` in
case any allocation fails. While at it, refactor the function to return
the array directly instead of assigning it to an out-pointer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle allocation failures in `reftable_calloc()`.
While at it, remove our use of `st_mult()` that would cause us to die on
an overflow. From the caller's point of view there is not much of a
difference between arguments that are too large to be multiplied and a
request that is too big to handle by the allocator: in both cases the
allocation cannot be fulfilled. And in neither of these cases do we want
the reftable library to die.
While we could use `unsigned_mult_overflows()` to handle the overflow
gracefully, we instead open-code it to further our goal of converting
the reftable codebase to become a standalone library that can be reused
by external projects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library provides the ability to swap out allocators. There
is a gap here though, because we continue to use `xstrdup()` even in the
case where all the other allocators have been swapped out.
Introduce `reftable_strdup()` that uses `reftable_malloc()` to do the
allocation.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split between "basics" and "publicbasics" is somewhat arbitrary and
not in line with how we typically structure code in the reftable
library. While we do indeed split up headers into a public and internal
part, we don't do that for the compilation unit itself. Furthermore, the
declarations for "publicbasics.c" are in "reftable-malloc.h", which
isn't in line with our naming schema, either.
Fix these inconsistencies by:
- Merging "publicbasics.c" into "basics.c".
- Renaming "reftable-malloc.h" to "reftable-basics.h" as the public
header.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable library does not use the same memory allocation functions
as the rest of the Git codebase. Instead, as the reftable library is
supposed to be usable as a standalone library without Git, it provides a
set of pluggable memory allocators.
Compared to `xmalloc()` and friends these allocators are _not_ expected
to die when an allocation fails. This design choice is concious, as a
library should leave it to its caller to handle any kind of error. While
it is very likely that the caller cannot really do much in the case of
an out-of-memory situation anyway, we are not the ones to make that
decision.
Curiously though, we never handle allocation errors even though memory
allocation functions are allowed to fail. And as we do not plug in Git's
memory allocator via `reftable_set_alloc()` either the consequence is
that we'd instead segfault as soon as we run out of memory.
While the easy fix would be to wire up `xmalloc()` and friends, it
would only fix the usage of the reftable library in Git itself. Other
users like libgit2, which is about to revive its efforts to land a
backend for reftables, wouldn't be able to benefit from this solution.
Instead, we are about to do it the hard way: adapt all allocation sites
to perform error checking. Introduce a new error code for out-of-memory
errors that we will wire up in subsequent steps.
This commit also serves as the motivator for all the remaining steps in
this series such that we do not have to repeat the same arguments in
every single subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The checksum at the tail of files are now computed without
collision detection protection. This is safe as the consumer of
the information to protect itself from replay attacks checks for
hash collisions independently.
* tb/weak-sha1-for-tail-sum:
csum-file.c: use unsafe SHA-1 implementation when available
Makefile: allow specifying a SHA-1 for non-cryptographic uses
hash.h: scaffolding for _unsafe hashing variants
sha1: do not redefine `platform_SHA_CTX` and friends
pack-objects: use finalize_object_file() to rename pack/idx/etc
finalize_object_file(): implement collision check
finalize_object_file(): refactor unlink_or_warn() placement
finalize_object_file(): check for name collision before renaming
Leakfixes.
* jk/http-leakfixes: (28 commits)
http-push: clean up local_refs at exit
http-push: clean up loose request when falling back to packed
http-push: clean up objects list
http-push: free xml_ctx.cdata after use
http-push: free remote_ls_ctx.dentry_name
http-push: free transfer_request strbuf
http-push: free transfer_request dest field
http-push: free curl header lists
http-push: free repo->url string
http-push: clear refspecs before exiting
http-walker: free fake packed_git list
remote-curl: free HEAD ref with free_one_ref()
http: stop leaking buffer in http_get_info_packs()
http: call git_inflate_end() when releasing http_object_request
http: fix leak of http_object_request struct
http: fix leak when redacting cookies from curl trace
transport-helper: fix leak of dummy refs_list
fetch-pack: clear pack lockfiles list
fetch: free "raw" string when shrinking refspec
transport-helper: fix strbuf leak in push_refs_with_push()
...
When "git sparse-checkout disable" turns a sparse checkout into a
regular checkout, the index is fully expanded. This totally
expected behaviour however had an "oops, we are expanding the
index" advice message, which has been corrected.
* ds/sparse-checkout-expansion-advice:
sparse-checkout: disable advice in 'disable'
In load_cache_entries_threaded(), each thread allocates its own memory
pool. This pool needs to be cleaned up while closing the threads down,
or it will be leaked.
This ce_mem_pool pointer could theoretically be converted to an inline
copy of the struct, but the use of a pointer helps with existing lazy-
initialization logic. Adjusting that behavior only to avoid this pointer
would be a much bigger change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git --git-dir=nowhere cmd" failed to properly notice that it
wasn't in any repository while processing includeIf.onbranch
configuration and instead crashed.
* ps/includeif-onbranch-cornercase-fix:
config: fix evaluating "onbranch" with nonexistent git dir
t1305: exercise edge cases of "onbranch" includes
Background tasks "git maintenance" runs may need to use credential
information when going over the network, but a credential helper
may work only in an interactive environment, and end up blocking a
scheduled task waiting for UI. Credential helpers can now behave
differently when they are not running interactively.
* ds/background-maintenance-with-credential:
scalar: configure maintenance during 'reconfigure'
maintenance: add custom config to background jobs
credential: add new interactive config option
"git archive" with pathspec magic that uses the attribute
information did not work well, which has been corrected.
* rs/archive-with-attr-pathspec-fix:
archive: load index before pathspec checks
When a subprocess to work in a submodule spawned by "git submodule"
fails with SIGPIPE, the parent Git process caught the death of it,
but gave a generic "failed to work in that submodule", which was
misleading. We now behave as if the parent got SIGPIPE and die.
* pw/submodule-process-sigpipe:
submodule status: propagate SIGPIPE
Give timeout to the locking code to write to reftable.
* ps/reftable-concurrent-writes:
refs/reftable: reload locked stack when preparing transaction
reftable/stack: allow locking of outdated stacks
refs/reftable: introduce "reftable.lockTimeout"
The push reports that report failures to the user when pushing a
reference leak in several places. Plug these leaks by introducing a new
function `ref_push_report_free()` that frees the list of reports and
call it as required. While at it, fix a trivially leaking error string
in the vicinity.
These leaks get hit in t5411, but plugging them does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the return parameter of `write_rev_file_order()` is a string
constant, the function may indeed return an allocated string when its
first parameter is a `NULL` pointer. This makes for a confusing calling
convention, where callers need to be aware of these intricate ownership
rules and cast away the constness to free the string in some cases.
Adapt the function and its caller `write_rev_file()` to always return an
allocated string and adapt callers to always free the return value.
Note that this requires us to also adapt `rename_tmp_packfile()`, which
compares the pointers to packfile data with each other. Now that the
path of the reverse index file gets allocated unconditionally the check
will always fail. This is fixed by using strcmp(3P) instead, which also
feels way less fragile.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `saved_parents` slab is used by `--full-diff` to save parents of a
commit which we are about to rewrite. We do not release its contents
once it's not used anymore, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `rewrite_parents()` and `remove_duplicate_parents()` may end up
dropping some parents from a commit without freeing the respective
`struct commit_list` items. This causes a bunch of memory leaks. Plug
these.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The buffer used to compute the final MIDX name is never released. Plug
this memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a new pseudo-merge group we collect a set of matchnig
commits and put them into a string map. This strmap is initialized such
that it does not allocate its keys, and instead we try to pass ownership
of the keys to it via `strmap_put()`. This isn't how it works though:
the strmap will never try to release these keys, and consequently they
end up leaking.
Fix this leak by initializing the strmap as duplicating its keys and not
trying to hand over ownership.
The leak is exposed by t5333, but plugging it does not yet make the full
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various memory leaks hit by the pseudo-merge machinery. These leaks
are exposed by t5333, but plugging them does not yet make the whole test
suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As described in "line-log.c" itself, the code is "leaking like a sieve".
These leaks are all of rather trivial nature, so this commit plugs them
without going too much into details for each of those leaks.
The leaks are hit by t4211, but plugging them alone does not make the
full test suite pass. The remaining leaks are unrelated to the line-log
subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lifecycle management of diff queues is somewhat confusing:
- For most of the part this can be attributed to `DIFF_QUEUE_CLEAR()`,
which does not release any memory but rather initializes the queue,
only. This is in contrast to our common naming schema, where
"clearing" means that we release underlying memory and then
re-initialize the data structure such that it is ready to use.
- A second offender is `diff_free_queue()`, which does not free the
queue structure itself. It is rather a release-style function.
Refactor the code to make things less confusing. `DIFF_QUEUE_CLEAR()` is
replaced by `DIFF_QUEUE_INIT` and `diff_queue_init()`, while
`diff_free_queue()` is replaced by `diff_queue_release()`. While on it,
adapt callsites where we call `DIFF_QUEUE_CLEAR()` with the intent to
release underlying memory to instead call `diff_queue_clear()` to fix
memory leaks.
This memory leak is exposed by t4211, but plugging it alone does not
make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We leak the config values when `gpg_sign` or `strategy` options are
being overridden via the command line. To fix this we need to free the
old value, which requires us to figure out whether the value was changed
via an option in the first place. The easy way to do this, which is to
initialize local variables with `NULL`, doesn't work because we cannot
tell the case where the user has passed e.g. `--no-gpg-sign`. Instead,
we use a sentinel value for both values that we can compare against to
check whether the user has passed the option.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We initialize but never clear a repository in the partial-clone test
helper. Plug this leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning with bundle URIs we re-initialize `the_repository` after
having fetched the bundle. This causes a bunch of memory leaks though
because we do not release its previous state.
These leaks can be plugged by calling `repo_clear()` before we call
`repo_init()`. But this causes another issue because the remote that we
used is tied to the lifetime of the repository's remote state, which
would also get released. We thus have to make sure that it does not get
free'd under our feet.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are various different memory leaks in git-pack-redundant(1),
mostly caused by not even trying to free allocated memory. Fix them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPT_PATHSPEC_FROM_FILE()` option maps to `OPT_FILENAME()`, which we
know will always allocate memory when passed. We never free the memory
though, causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The submodule entry list returned by `submodules_of_tree()` is never
completely free'd by its only caller. Introduce a new function that
free's the list for us and call it.
While at it, also fix the leaking `branch_point` string.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When hitting a sparse directory in `wt_status_collect_changes_initial()`
we use a `struct strbuf` to assemble the directory's name. We never free
that buffer though, causing a memory leak.
Fix the leak by releasing the buffer. While at it, move the buffer
outside of the loop and reset it to save on some wasteful allocations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two memory leaks in "shell.c". The first one in `run_shell()`
is trivial and fixed without further explanation. The second one in
`cmd_main()` happens because we overwrite the `prog` variable, which
contains an allocated string. In fact though, the memory pointed to by
that variable is still in use because we use `split_cmdline()`, which
may create pointers into the middle of that string. But as we do not
have a direct pointer to the head of the allocated string anymore, we
get a complaint by the leak checker.
Address this by not overwriting the `prog` pointer.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the scalar code we iterate through multiple repositories,
initializing each of them. We never clear them though, causing memory
leaks. Plug them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing an index with the EOIE extension we allocate a separate
hash context. We never free that context though, causing a memory leak.
Plug it.
This leak is exposed by t9210, but plugging it alone does not make the
whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're leaking the args vector in git-annotate(1) because we never clear
it. Fixing it isn't as easy as calling `strvec_clear()` though because
calling `cmd_blame()` will cause the underlying array to be modified.
Instead, we also need to pass a shallow copy of the argv array to the
function.
Do so to plug the memory leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/http-leakfixes: (28 commits)
http-push: clean up local_refs at exit
http-push: clean up loose request when falling back to packed
http-push: clean up objects list
http-push: free xml_ctx.cdata after use
http-push: free remote_ls_ctx.dentry_name
http-push: free transfer_request strbuf
http-push: free transfer_request dest field
http-push: free curl header lists
http-push: free repo->url string
http-push: clear refspecs before exiting
http-walker: free fake packed_git list
remote-curl: free HEAD ref with free_one_ref()
http: stop leaking buffer in http_get_info_packs()
http: call git_inflate_end() when releasing http_object_request
http: fix leak of http_object_request struct
http: fix leak when redacting cookies from curl trace
transport-helper: fix leak of dummy refs_list
fetch-pack: clear pack lockfiles list
fetch: free "raw" string when shrinking refspec
transport-helper: fix strbuf leak in push_refs_with_push()
...
Update hashwrite() and friends to use the unsafe_-variants of hashing
functions, calling for e.g., "the_hash_algo->unsafe_update_fn()" instead
of "the_hash_algo->update_fn()".
These callers only use the_hash_algo to produce a checksum, which we
depend on for data integrity, but not for cryptographic purposes, so
these callers are safe to use the unsafe (non-collision detecting) SHA-1
implementation.
To time this, I took a freshly packed copy of linux.git, and ran the
following with and without the OPENSSL_SHA1_UNSAFE=1 build-knob. Both
versions were compiled with -O3:
$ git for-each-ref --format='%(objectname)' refs/heads refs/tags >in
$ valgrind --tool=callgrind ~/src/git/git-pack-objects \
--revs --stdout --all-progress --use-bitmap-index <in >/dev/null
Without OPENSSL_SHA1_UNSAFE=1 (that is, using the collision-detecting
SHA-1 implementation for both cryptographic and non-cryptographic
purposes), we spend a significant amount of our instruction count in
hashwrite():
$ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
159,998,868,413 (79.42%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
, and the resulting "clone" takes 19.219 seconds of wall clock time,
18.94 seconds of user time and 0.28 seconds of system time.
Compiling with OPENSSL_SHA1_UNSAFE=1, we spend ~60% fewer instructions
in hashwrite():
$ callgrind_annotate --inclusive=yes | grep hashwrite | head -n1
59,164,001,176 (58.79%) /home/ttaylorr/src/git/csum-file.c:hashwrite [/home/ttaylorr/src/git/git-pack-objects]
, and generate the resulting "clone" much faster, in only 11.597 seconds
of wall time, 11.37 seconds of user time, and 0.23 seconds of system
time, for a ~40% speed-up.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce _UNSAFE variants of the OPENSSL_SHA1, BLK_SHA1, and
APPLE_COMMON_CRYPTO_SHA1 compile-time knobs which indicate which SHA-1
implementation is to be used for non-cryptographic uses.
There are a couple of small implementation notes worth mentioning:
- There is no way to select the collision detecting SHA-1 as the
"fast" fallback, since the fast fallback is only for
non-cryptographic uses, and is meant to be faster than our
collision-detecting implementation.
- There are no similar knobs for SHA-256, since no collision attacks
are presently known and thus no collision-detecting implementations
actually exist.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's default SHA-1 implementation is collision-detecting, which hardens
us against known SHA-1 attacks against Git objects. This makes Git
object writes safer at the expense of some speed when hashing through
the collision-detecting implementation, which is slower than
non-collision detecting alternatives.
Prepare for loading a separate "unsafe" SHA-1 implementation that can be
used for non-cryptographic purposes, like computing the checksum of
files that use the hashwrite() API.
This commit does not actually introduce any new compile-time knobs to
control which implementation is used as the unsafe SHA-1 variant, but
does add scaffolding so that the "git_hash_algo" structure has five new
function pointers which are "unsafe" variants of the five existing
hashing-related function pointers:
- git_hash_init_fn unsafe_init_fn
- git_hash_clone_fn unsafe_clone_fn
- git_hash_update_fn unsafe_update_fn
- git_hash_final_fn unsafe_final_fn
- git_hash_final_oid_fn unsafe_final_oid_fn
The following commit will introduce compile-time knobs to specify which
SHA-1 implementation is used for non-cryptographic uses.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our in-tree SHA-1 wrappers all define platform_SHA_CTX and related
macros to point at the opaque "context" type, init, update, and similar
functions for each specific implementation.
In hash.h, we use these platform_ variables to set up the function
pointers for, e.g., the_hash_algo->init_fn(), etc.
But while these header files have a header-specific macro that prevents
them declaring their structs / functions multiple times, they
unconditionally define the platform variables, making it impossible to
load multiple SHA-1 implementations at once.
As a prerequisite for loading a separate SHA-1 implementation for
non-cryptographic uses, only define the platform_ variables if they have
not already been defined.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In most places that write files to the object database (even packfiles
via index-pack or fast-import), we use finalize_object_file(). This
prefers link()/unlink() over rename(), because it means we will prefer
data that is already in the repository to data that we are newly
writing.
We should do the same thing in pack-objects. Even though we don't think
of it as accepting outside data (and thus not being susceptible to
collision attacks), in theory a determined attacker could present just
the right set of objects to cause an incremental repack to generate
a pack with their desired hash.
This has some test and real-world fallout, as seen in the adjustment to
t5303 below. That test script assumes that we can "fix" corruption by
repacking into a good state, including when the pack generated by that
repack operation collides with a (corrupted) pack with the same hash.
This violates our assumption from the previous adjustments to
finalize_object_file() that if we're moving a new file over an existing
one, that since their checksums match, so too must their contents.
This makes "fixing" corruption like this a more explicit operation,
since the test (and users, who may fix real-life corruption using a
similar technique) must first move the broken contents out of the way.
Note also that we now call adjust_shared_perm() twice. We already call
adjust_shared_perm() in stage_tmp_packfiles(), and now call it again in
finalize_object_file(). This is somewhat wasteful, but cleaning up the
existing calls to adjust_shared_perm() is tricky (because sometimes
we're writing to a tmpfile, and sometimes we're writing directly into
the final destination), so let's tolerate some minor waste until we can
more carefully clean up the now-redundant calls.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've had "FIXME!!! Collision check here ?" in finalize_object_file()
since aac1794132 (Improve sha1 object file writing., 2005-05-03). That
is, when we try to write a file with the same name, we assume the
on-disk contents are the same and blindly throw away the new copy.
One of the reasons we never implemented this is because the files it
moves are all named after the cryptographic hash of their contents
(either loose objects, or packs which have their hash in the name these
days). So we are unlikely to see such a collision by accident. And even
though there are weaknesses in sha1, we assume they are mitigated by our
use of sha1dc.
So while it's a theoretical concern now, it hasn't been a priority.
However, if we start using weaker hashes for pack checksums and names,
this will become a practical concern. So in preparation, let's actually
implement a byte-for-byte collision check.
The new check will cause the write of new differing content to be a
failure, rather than a silent noop, and we'll retain the temporary file
on disk. If there's no collision present, we'll clean up the temporary
file as usual after either rename()-ing or link()-ing it into place.
Note that this may cause some extra computation when the files are in
fact identical, but this should happen rarely.
Loose objects are exempt from this check, and the collision check may be
skipped by calling the _flags variant of this function with the
FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:
- We don't treat the hash of the loose object file's contents as a
checksum, since the same loose object can be stored using different
bytes on disk (e.g., when adjusting core.compression, using a
different version of zlib, etc.).
This is fundamentally different from cases where
finalize_object_file() is operating over a file which uses the hash
value as a checksum of the contents. In other words, a pair of
identical loose objects can be stored using different bytes on disk,
and that should not be treated as a collision.
- We already use the path of the loose object as its hash value /
object name, so checking for collisions at the content level doesn't
add anything.
Adding a content-level collision check would have to happen at a
higher level than in finalize_object_file(), since (avoiding race
conditions) writing an object loose which already exists in the
repository will prevent us from even reaching finalize_object_file()
via the object freshening code.
There is a collision check in index-pack via its `check_collision()`
function, but there isn't an analogous function in unpack-objects,
which just feeds the result to write_object_file().
So skipping the collision check here does not change for better or
worse the hardness of loose object writes.
As a small note related to the latter bullet point above, we must teach
the tmp-objdir routines to similarly skip the content-level collision
checks when calling migrate_one() on a loose object file, which we do by
setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose
object shard.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As soon as we've tried to link() a temporary object into place, we then
unlink() the tempfile immediately, whether we were successful or not.
For the success case, this is because we no longer need the old file
(it's now linked into place).
For the error case, there are two outcomes. Either we got EEXIST, in
which case we consider the collision to be a noop. Or we got a system
error, in which we case we are just cleaning up after ourselves.
Using a single line for all of these cases has some problems:
- in the error case, our unlink() may clobber errno, which we use in
the error message
- for the collision case, there's a FIXME that indicates we should do
a collision check. In preparation for implementing that, we'll need
to actually hold on to the file.
Split these three cases into their own calls to unlink_or_warn(). This
is more verbose, but lets us do the right thing in each case.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We prefer link()/unlink() to rename() for object files, with the idea
that we should prefer the data that is already on disk to what is
incoming. But we may fall back to rename() if the user has configured us
to do so, or if the filesystem seems not to support cross-directory
links. This loses the "prefer what is on disk" property.
We can mitigate this somewhat by trying to stat() the destination
filename before doing the rename. This is racy, since the object could
be created between the stat() and rename() calls. But in practice it is
expanding the definition of "what is already on disk" to be the point
that the function is called. That is enough to deal with any potential
attacks where an attacker is trying to collide hashes with what's
already in the repository.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merging file pairs after they have been broken up we queue a new
file pair and discard the broken-up ones. The newly-queued file pair
reuses one filespec of the broken up pairs each, where the respective
other filespec gets discarded. But we only end up freeing the filespec's
data, not the filespec itself, and thus leak memory.
Fix these leaks by using `free_filespec()` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When simplifying commits, e.g. because they are treesame with their
parents, we unset the commit's parent pointers but never free them. Plug
the resulting memory leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `get_schedule_cmd()` function allows us to override the schedule
command with a specific test command such that we can verify the
underlying logic in a platform-independent way. Its memory management is
somewhat wild though, because it basically gives up and assigns an
allocated string to the string constant output pointer. While this part
is marked with `UNLEAK()` to mask this, we also leak the local string
lists.
Rework the function such that it has a separate out parameter. If set,
we will assign it the final allocated command. Plug the other memory
leaks and create a common exit path.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing the maintenance strategy from config we allocate a config
string, but do not free it after parsing it. Plug this leak by instead
using `git_config_get_string_tmp()`, which does not allocate any memory.
This leak is exposed by t7900, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The partial clone filter of a promisor remote is never free'd, causing
memory leaks. Furthermore, in case multiple partial clone filters are
defined for the same remote, we'd overwrite previous values without
freeing them.
Fix these leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating a pattern via `create_grep_pat()` we allocate the pattern
member of the structure regardless of the token type. But later, when we
try to free the structure, we free the pattern member conditionally on
the token type and thus leak memory.
Plug this leak. The leak is exposed by t7814, but plugging it alone does
not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `add_submodule_odb_by_path()` we add a path into a global string
list. The list is initialized with `NODUP`, which means that we do not
pass ownership of strings to the list. But we use `xstrdup()` when we
insert a path, with the consequence that the string will never get
free'd.
Plug the leak by marking the list as `DUP`. There is only a single
callsite where we insert paths anyway, and as explained above that
callsite was mishandling the allocation.
This leak is exposed by t7814, but plugging it does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each thread may have a specific context in the trace2 subsystem that we
set up via thread-local storage. We do not set up a destructor for this
data though, which means that the context data will leak.
Plug this leak by installing a destructor. This leak is exposed by
t7814, but plugging it alone does not make the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several leaking data structures in git-difftool(1). Plug them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking, we assemble git-pack-objects(1) arguments both for the
"normal" pack and for the cruft pack. This configuration gets populated
with a bunch of `OPT_PASSTHRU` options that we end up passing to the
child process. These options are allocated, but never free'd.
Create a new `pack_objects_args_release()` function that releases the
memory for us and call it for both sets of options.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `prepare_order()` we parse an orderfile and assign it to a global
array. In order to save on some allocations, we replace newlines with
NUL characters and then assign pointers into the allocated buffer to
that array. This can cause the buffer to be completely unreferenced
though in some cases, e.g. because the order file is empty or because we
had to use `xmemdupz()` to copy the lines instead of NUL-terminating
them.
Refactor the code to always `xmemdupz()` the strings. This is a bit
simpler, and it is rather unlikely that saving a handful of allocations
really matters. This allows us to release the string buffer and thus
plug the memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `OPTION_FILENAME` option always assigns either an allocated string
or `NULL` to the value. In case it is passed multiple times it does not
know to free the previous value though, which causes a memory leak.
Refactor the function to always free the previous value. None of the
sites where this option is used pass a string constant, so this change
is safe.
While at it, fix the argument of `fix_filename()` to be a string
constant. The only reason why it's not is because we use it as an
in-out-parameter, where the input is a constant and the output is not.
This is weird and unnecessary, as we can just return the result instead
of using the parameter for this.
This leak is being hit in t7621, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `orderfile` diff option is being assigned via `OPT_FILENAME()`,
which assigns an allocated string to the variable. We never free it
though, causing a memory leak.
Change the type of the string to `char *` and free it to plug the leak.
This also requires us to use `xstrdup()` to assign the global config to
it in case it is set.
This leak is being hit in t7621, but plugging it alone does not make the
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `opt_ff` field gets populated either via `OPT_PASSTHRU` via
`config_get_ff()` or when `--rebase` is passed. So we sometimes end up
overriding the value in `opt_ff` with another value, but we do not free
the old value, causing a memory leak.
Adapt the type of the variable to be `char *` and consistently assign
allocated strings to it such that we can easily free it when it is being
overridden.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `treat_directory()` we perform some logic to handle ignored and
untracked entries. When populating a directory with entries we first
save the current number of ignored/untracked entries and then populate
new entries at the end of our arrays that keep track of those entries.
When we figure out that all entries have been ignored/are untracked we
then remove this tail of entries from those vectors again. But there is
an off by one error in both paths that causes us to not free the first
ignored and untracked entries, respectively.
Fix these off-by-one errors to plug the resulting leak. While at it,
massage the code a bit to match our modern code style.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `update_submodule()` fails we return with `die_message()`, which
only causes us to print the same message as `die()` would without
actually causing the process to die. We don't free memory in that case
and thus leak memory.
Fix the leak by freeing the remote ref.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the "submodule-nested-repo-config" helper we create a submodule
repository and print its configuration. We do not clear the repo,
causing a memory leak. Plug it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaking error buffer when `compute_alternate_path()` fails.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `runcommand_in_submodule_cb()` we may end up not executing the child
command when `argv` is empty. But we still populate the command with
environment variables and other things, which needs cleanup. This leads
to a memory leak because we do not call `finish_command()`.
Fix this by clearing the child process when we don't execute it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're not freeing the submodule update strategy command. Provide a
helper function that does this for us and call it in
`update_data_release()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `handle_builtin()` we may end up creating an ad-hoc argv array in
case we see that the command line contains the "--help" parameter. In
this case we observe two memory leaks though:
- We leak the `struct strvec` itself because we directly exit after
calling `run_builtin()`, without bothering about any cleanups.
- Even if we free'd that vector we'd end up leaking some of its
strings because `run_builtin()` will modify the array.
Plug both of these leaks.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `html_path` variable gets populated via `git_help_config()`, which
puts an allocated string into it if its value has been configured. We do
not clear the old value though, which causes a memory leak in case the
config exists multiple times.
Plug this leak. The leak is exposed by t0012, but plugging it alone is
not sufficient to make the test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `get_html_page_path()` we may end up assigning the return value of
`system_path()` to the global `html_path` variable. But as we also
assign the returned value to `to_free`, we will deallocate its memory
upon returning from the function. Consequently, `html_path` will now
point to deallocated memory.
Fix this issue by instead assigning the value to a separate local
variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a patch series happened to look interesting to the maintainer
but is not ready for 'next', it is applied on a topic branch and
merged to the 'seen' branch to keep an eye on it. In an ideal
world, the participants give reviews and the original author
responds to the reviews, and such iterations may produce newer
versions of the patch series, and at some point, a concensus is
formed that the latest round is good enough for 'next'. Then the
topic is merged to 'next' for inclusion in a future release.
In a much less ideal world we live in, however, a topic sometimes
get stalled. The original author may not respond to hanging review
comments, may promise an update will be sent but does not manage to
do so, nobody talks about the topic on the list and nobody builds
upon it, etc.
Following the recent trend to document and give more transparency to
the decision making process, let's set a deadline to keep a topic
still alive, and actively discard those that are inactive for a long
period of time.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate a list of ref structs from get_local_heads() but never clean
it up. We should do so before exiting to avoid complaints from the
leak-checker. Note that we have to initialize it to NULL, because
there's one code path that can jump to the cleanup label before we
assign to it.
Fixing this lets us mark t5540 as leak-free.
Curiously building with SANITIZE=leak and gcc does not seem to find this
problem, but switching to clang does. It seems like a fairly obvious
leak, though.
I was curious that the matching remote_refs did not have the same leak.
But that is because we store the list in a global variable, so it's
still reachable after we exit. Arguably we could treat it the same as
future-proofing, but I didn't bother (now that the script is marked
leak-free, anybody moving it to a stack variable will notice).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In http-push's finish_request(), if we fail a loose object request we
may fall back to trying a packed request. But if we do so, we leave the
http_loose_object_request in place, leaking it.
We can fix this by always cleaning it up. Note that the obj_req pointer
here (which we'll set to NULL) is a copy of the request->userData
pointer, which will now point to freed memory. But that's OK. We'll
either release the parent request struct entirely, or we'll convert it
into a packed request, which will overwrite userData itself.
This leak is found by t5540, but it's not quite leak-free yet.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In http-push's get_delta(), we generate a list of pending objects by
recursively processing trees and blobs, adding them to a linked list.
And then we iterate over the list, adding a new request for each
element.
But since we iterate using the list head pointer, at the end it is NULL
and all of the actual list structs have been leaked.
We can fix this either by using a separate iterator and then calling
object_list_free(), or by just freeing as we go. I picked the latter,
just because it means we continue to shrink the list as we go, though
I'm not sure it matters in practice (we call add_send_request() in the
loop, but I don't think it ever looks at the global objects list
itself).
This fixes several leaks noticed in t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we ask libexpat to parse XML data, we sometimes set xml_cdata as a
CharacterDataHandler callback. This fills in an allocated string in the
xml_ctx struct which we never free, causing a leak.
I won't pretend to understand the purpose of the field, but it looks
like it is used by other callbacks during the parse. At any rate, we
never look at it again after XML_Parse() returns, so we should be OK to
free() it then.
This fixes several leaks triggered by t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The remote_ls_ctx struct has dentry_name string, which is filled in with
a heap allocation in the handle_remote_ls_ctx() XML callback. After the
XML parse is done in remote_ls(), we should free the string to avoid a
leak.
This fixes several leaks found by running t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we issue a PUT, we initialize and fill a strbuf embedded in the
transfer_request struct. But we never release this buffer, causing a
leak.
We can fix this by adding a strbuf_release() call to release_request().
If we stopped there, then non-PUT requests would try to release a
zero-initialized strbuf. This works OK in practice, but we should try to
follow the strbuf API more closely. So instead, we'll always initialize
the strbuf when we create the transfer_request struct.
That in turn means switching the strbuf_init() call in start_put() to a
simple strbuf_grow().
This leak is triggered in t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we issue a PUT request, we store the destination in the "dest"
field by detaching from a strbuf. But we never free the result, causing
a leak.
We can address this in the release_request() function. But note that we
also need to initialize it to NULL, as most other request types do not
set it at all.
Curiously there are _two_ functions to initialize a transfer_request
struct. Adding the initialization only to add_fetch_request() seems to
be enough for t5540, but I won't pretend to understand why. Rather than
just adding "request->dest = NULL" in both spots, let's zero the whole
struct. That addresses this problem, as well as any future ones (and it
can't possibly hurt, as by definition we'd be hitting uninitialized
memory previously).
This fixes several leaks noticed by t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To pass headers to curl, we have to allocate a curl_slist linked list
and then feed it to curl_easy_setopt(). But the header list is not
copied by curl, and must remain valid until we are finished with the
request.
A few spots in http-push get this right, freeing the list after
finishing the request, but many do not. In most cases the fix is simple:
we set up the curl slot, start it, and then use run_active_slot() to
take it to completion. After that, we don't need the headers anymore and
can call curl_slist_free_all().
But one case is trickier: when we do a MOVE request, we start the
request but don't immediately finish it. It's possible we could change
this to be more like the other requests, but I didn't want to get into
risky refactoring of this code. So we need to stick the header list into
the request struct and remember to free it later.
Curiously, the struct already has a headers field for this purpose! It
goes all the way back to 58e60dd203 (Add support for pushing to a remote
repository using HTTP/DAV, 2005-11-02), but it doesn't look like it was
ever used. We can make use of it just by assigning our headers to it,
and there is already code in finish_request() to clean it up.
This fixes several leaks triggered by t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our repo->url string comes from str_end_url_with_slash(), which always
allocates its output buffer. We should free it before exiting to avoid
triggering the leak-checker.
This can be seen by leak-checking t5540.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We parse the command-line arguments into a refspec struct, but we never
free them. We should do so before exiting to avoid triggering the
leak-checker.
This triggers in t5540 many times (basically every invocation of
http-push).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dumb-http walker code creates a "fake" packed_git list representing
packs we've downloaded from the remote (I call it "fake" because
generally that struct is only used and managed by the local repository
struct). But during our cleanup phase we don't touch those at all,
causing a leak.
There's no support here from the rest of the object-database API, as
these structs are not meant to be freed, except when closing the object
store completely. But we can see that raw_object_store_clear() just
calls free() on them, and that's enough here to fix the leak.
I also added a call to close_pack() before each. In the regular code
this happens via close_object_store(), which we do as part of
raw_object_store_clear(). This is necessary to prevent leaking mmap'd
data (like the pack idx) or descriptors. The leak-checker won't catch
either of these itself, but I did confirm with some hacky warning()
calls and running t5550 that it's easy to leak at least index data.
This is all much more intimate with the packed_git struct than I'd like,
but I think fixing it would be a pretty big refactor. And it's just not
worth it for dumb-http code which is rarely used these days. If we can
silence the leak-checker without creating too much hassle, we should
just do that.
This lets us mark t5550 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After dumb-http downloads the remote info/refs file, it adds an extra
HEAD ref struct to our list by downloading the remote symref and finding
the matching ref within our list. If either of those fails, we throw
away the ref struct. But we do so with free(), when we should use
free_one_ref() to catch any embedded allocations (in particular, if
fetching the remote HEAD succeeded but the branch is unborn, its
ref->symref field will be populated but we'll still throw it all away).
This leak is triggered by t5550 (but we still have a little more work to
mark it leak-free).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use http_get_strbuf() to fetch the remote info/packs content into a
strbuf, but never free it, causing a leak. There's no need to hold onto
it, as we've already parsed it completely.
This lets us mark t5619 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In new_http_object_request(), we initialize the zlib stream with
git_inflate_init(). We must have a matching git_inflate_end() to avoid
leaking any memory allocated by zlib.
In most cases this happens in finish_http_object_request(), but we don't
always get there. If we abort a request mid-stream, then we may clean it
up without hitting that function.
We can't just add a git_inflate_end() call to the release function,
though. That would double-free the cases that did actually finish.
Instead, we'll move the call from the finish function to the release
function. This does delay it for the cases that do finish, but I don't
think it matters. We should have already reached Z_STREAM_END (and
complain if we didn't), and we do not record any status code from
git_inflate_end().
This leak is triggered by t5550 at least (and probably other dumb-http
tests).
I did find one other related spot of interest. If we try to read a
previously downloaded file and fail, we reset the stream by calling
memset() followed by a fresh git_inflate_init(). I don't think this case
is triggered in the test suite, but it seemed like an obvious leak, so I
added the appropriate git_inflate_end() before the memset() there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new_http_object_request() function allocates a struct on the heap,
along with some fields inside the struct. But the matching function to
clean it up, release_http_object_request(), only frees the interior
fields without freeing the struct itself, causing a leak.
The related http_pack_request new/release pair gets this right, and at
first glance we should be able to do the same thing and just add a
single free() call. But there's a catch.
These http_object_request structs are typically embedded in the
object_request struct of http-walker.c. And when we clean up that parent
struct, it sanity-checks the embedded struct to make sure we are not
leaking descriptors. Which means a use-after-free if we simply free()
the embedded struct.
I have no idea how valuable that sanity-check is, or whether it can
simply be deleted. This all goes back to 5424bc557f (http*: add helper
methods for fetching objects (loose), 2009-06-06). But the obvious way
to make it all work is to be sure we set the pointer to NULL after
freeing it (and our freeing process closes the descriptor, so we know
there is no leak).
To make sure we do that consistently, we'll switch the pointer we take
in release_http_object_request() to a pointer-to-pointer, and we'll set
it to NULL ourselves. And then the compiler can help us find each caller
which needs to be updated.
Most cases will just pass "&obj_req->req", which will obviously do the
right thing. In a few cases, like http-push's finish_request(), we are
working with a copy of the pointer, so we don't NULL the original. But
it's OK because the next step is to free the struct containing the
original pointer anyway.
This lets us mark t5551 as leak-free. Ironically this is the "smart"
http test, and the leak here only affects dumb http. But there's a
single dumb-http invocation in there. The full dumb tests are in t5550,
which still has some more leaks.
This also makes t5559 leak-free, as it's just an HTTP/2 variant of
t5551. But we don't need to mark it as such, since it inherits the flag
from t5551.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When redacting headers for GIT_TRACE_CURL, we build up a redacted cookie
header in a local strbuf, and then copy it into the output. But we
forget to release the temporary strbuf, leaking it for every cookie
header we show.
The other redacted headers don't run into this problem, since they're
able to work in-place in the output buffer. But the cookie parsing is
too complicated for that, since we redact the cookies individually.
This leak is triggered by the cookie tests in t5551.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using a remote-helper, the fetch_refs() function will issue a
"list" command if we haven't already done so. We don't care about the
result, but this is just to maintain compatibility as explained in
ac3fda82bf (transport-helper: skip ls-refs if unnecessary, 2019-08-21).
But get_refs_list_using_list(), the function we call to issue the
command, does parse and return the resulting ref list, which we simply
leak. We should record the return value and free it immediately (another
approach would be to teach it to avoid allocating at all, but it does
not seem worth the trouble to micro-optimize this mostly historical
case).
Triggering this requires the v0 protocol (since in v2 we use stateless
connect to take over the connection). You can see it in t5551.37, "fetch
by SHA-1 without tag following", as it explicitly enables v0.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the --lock-pack option is passed (which it typically is when
fetch-pack is used under the hood by smart-http), then we may end up
with entries in our pack_lockfiles string_list. We need to clear them
before returning to avoid a leak.
In git-fetch this isn't a problem, since the same cleanup happens via
transport_unlock_pack(). But the leak is detectable in t5551, which does
http fetches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--prefetch" option to git-fetch modifies the default refspec,
including eliminating some entries entirely. When we drop an entry we
free the strings in the refspec_item, but we forgot to free the matching
string in the "raw" array of the refspec struct. There's no behavioral
bug here (since we correctly shrink the raw array, too), but we're
leaking the allocated string.
Let's add in the leak-fix, and while we're at it drop "const" from
the type of the raw string array. These strings are always allocated by
refspec_append(), etc, and this makes the memory ownership more clear.
This is all a bit more intimate with the refspec code than I'd like, and
I suspect it would be better if each refspec_item held on to its own raw
string, we had a single array, and we could use refspec_item_clear() to
clean up everything. But that's a non-trivial refactoring, since
refspec_item structs can be held outside of a "struct refspec", without
having a matching raw string at all. So let's leave that for now and
just fix the leak in the most immediate way.
This lets us mark t5582 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We loop over the refs to push, building up a strbuf with the set of
"push" directives to send to the remote helper. But if the atomic-push
flag is set and we hit a rejected ref, we'll bail from the function
early. We clean up most things, but forgot to release the strbuf.
Fixing this lets us mark t5541 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The send-pack --force-with-lease option populates a push_cas_option
struct with allocated strings. Exiting without cleaning this up will
cause leak-checkers to complain.
We can fix this by calling clear_cas_option(), after making it publicly
available. Previously it was used only for resetting the list when we
saw --no-force-with-lease.
The git-push command has the same "leak", though in this case it won't
trigger a leak-checker since it stores the push_cas_option struct as a
global rather than on the stack (and is thus reachable even after main()
exits). I've added cleanup for it here anyway, though, as
future-proofing.
The leak is triggered by t5541 (it tests --force-with-lease over http,
which requires a separate send-pack process under the hood), but we
can't mark it as leak-free yet.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we parse a commit via repo_parse_commit_internal(), if
save_commit_buffer is set we'll stuff the buffer of the object contents
into a cache, overwriting any previous value.
This can result in a leak of that previously cached value, though it's
rare in practice. If we have a value in the cache it would have come
from a previous parse, and during that parse we'd set the object.parsed
flag, causing any subsequent parse attempts to exit without doing any
work.
But it's possible to "unparse" a commit, which we do when registering a
commit graft. And since shallow fetches are implemented using grafts,
the leak is triggered in practice by t5539.
There are a number of possible ways to address this:
1. the unparsing function could clear the cached commit buffer, too. I
think this would work for the case I found, but I'm not sure if
there are other ways to end up in the same state (an unparsed
commit with an entry in the commit buffer cache).
2. when we parse, we could check the buffer cache and prefer it to
reading the contents from the object database. In theory the
contents of a particular sha1 are immutable, but the code in
question is violating the immutability with grafts. So this
approach makes me a bit nervous, although I think it would work in
practice (the grafts are applied to what we parse, but we still
retain the original contents).
3. We could realize the cache is already populated and discard its
contents before overwriting. It's possible some other code could be
holding on to a pointer to the old cache entry (and we'd introduce
a use-after-free), but I think the risk of that is relatively low.
4. The reverse of (3): when the cache is populated, don't bother
saving our new copy. This is perhaps a little weird, since we'll
have just populated the commit struct based on a different buffer.
But the two buffers should be the same, even in the presence of
grafts (as in (2) above).
I went with option 4. It addresses the leak directly and doesn't carry
any risk of breaking other assumptions. And it's the same technique used
by parse_object_buffer() for this situation, though I'm not sure when it
would even come up there. The extra safety has been there since
bd1e17e245 (Make "parse_object()" also fill in commit message buffer
data., 2005-05-25).
This lets us mark t5539 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call get_remote_heads() for protocol v0, that may populate the
"shallow" oid_array, which must be cleaned up to avoid a leak at the
program exit. The same problem exists for both fetch-pack and send-pack,
but not for the usual transport.c code paths, since we already do this
cleanup in disconnect_git().
Fixing this lets us mark t5542 as leak-free for the send-pack side, but
fetch-pack will need some more fixes before we can do the same for
t5539.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our fetch_pack_args holds a filter_options struct that may be populated
with allocated strings by the by the "--filter" command-line option. We
must free it before exiting to avoid a leak when the program exits.
The usual fetch code paths that use transport.c don't have the same
leak, because we do the cleanup in disconnect_git().
Fixing this leak lets us mark t5500 as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_connect() function has a special CONNECT_DIAG_URL mode, where we
stop short of actually connecting to the other side and just print some
parsing details. For URLs that require a child process (like ssh), we
free() the child_process struct but forget to clear it, leaking the
strings we stuffed into its "env" list.
This leak is triggered many times in t5500, which uses "fetch-pack
--diag-url", but we're not yet ready to mark it as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `fetch_pack()` the caller is expected to pass in a set of
sought-after refs that they want to fetch. This array gets massaged to
not contain duplicate entries, which is done by replacing duplicate refs
with `NULL` pointers. This modifies the caller-provided array, and in
case we do unset any pointers the caller now loses track of that ref and
cannot free it anymore.
Now the obvious fix would be to not only unset these pointers, but to
also free their contents. But this doesn't work because callers continue
to use those refs. Another potential solution would be to copy the array
in `fetch_pack()` so that we dont modify the caller-provided one. But
that doesn't work either because the NULL-ness of those entries is used
by callers to skip over ref entries that we didn't even try to fetch in
`report_unmatched_refs()`.
Instead, we make it the responsibility of our callers to duplicate these
arrays as needed. It ain't pretty, but it works to plug the memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unregistering a shallow root we shrink the array of grafts by one
and move remaining grafts one to the left. This can of course only
happen when there are any grafts left, because otherwise there is
nothing to move. As such, this code is guarded by a condition that only
performs the move in case there are grafts after the position of the
graft to be unregistered.
By mistake we also put the call to free the unregistered graft into that
condition. But that doesn't make any sense, as we want to always free
the graft when it exists. Fix the resulting memory leak by doing so.
This leak is exposed by t5500, but plugging it does not make the whole
test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We never clear the arguments that we pass to git-index-pack(1). Create a
common exit path and release them there to plug this leak.
This is leak is exposed by t5702, but plugging the leak does not make
the whole test suite pass.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you are trying to find and fix leaks in a large test script, it can
be overwhelming to see the leak logs for every test at once. The
previous commit let you use "--immediate" to see the logs after the
first failing test, but this isn't always the first leak. As discussed
there, we may see leaks from previous tests that didn't happen to fail.
To catch those, let's check for any logs that appeared after each test
snippet is run, meaning that in a SANITIZE=leak build, any leak is an
immediate failure of the test snippet.
This check is mostly free in non-leak builds (just a "test -z"), and
only a few extra processes in a leak build, so I don't think the
overhead should matter (if it does, we could probably optimize for the
common "no logs" case without even spending a process).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we've compiled with SANITIZE=leak, at the end of the test script
we'll dump any collected logs to stdout. These logs have two uses:
1. Leaks don't always cause a test snippet to fail (e.g., if they
happen in a sub-process that we expect to return non-zero).
Checking the logs catches these cases that we'd otherwise miss
entirely.
2. LSan will dump the leak info to stderr, but that is sometimes
hidden (e.g., because it's redirected by the test, or because it's
in a sub-process whose stderr goes elsewhere). Dumping the logs is
the easiest way for the developer to see them.
One downside is that the set of logs for an entire script may be very
long, especially when you're trying to fix existing test scripts. You
can run with --immediate to stop at the first failing test, which means
we'll have accrued fewer logs. But we don't show the logs in that case!
Let's start doing so. This can only help case (2), of course (since it
depends on test failure). And it's somewhat weakened by the fact that
any cases of (1) will pollute the logs. But we can improve things
further in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We ask LSan to record the logs of all leaks in test-results/, which is
useful for finding leaks that didn't trigger a test failure.
We don't clean out the leak/ directory for each test before running it,
though. Instead, we count the number of files it has, and complain only
if we ended up with more when the script finishes. So we shouldn't
trigger any output if you've made a script leak free. But if you simply
_reduced_ the number of leaks, then there is an annoying outcome: we do
not record which logs were from this run and which were from previous
ones. So when we dump them to stdout, you get a mess of
possibly-outdated leaks. This is very confusing when you are in an
edit-compile-test cycle trying to fix leaks.
The instructions do note that you should "rm -rf test-results/" if you
want to avoid this. But I'm having trouble seeing how this cumulative
count could ever be useful. It is not even counting the number of leaks,
but rather the number of processes with at least one leak!
So let's just blow away the per-test leak/ directory before running. We
already overwrite the ".out" file in test-results/ in the same way, so
this is following that pattern.
Running "make test" isn't affected by this, since it blows away all of
test-results/ already. This only comes up when you are iterating on a
single script that you're running manually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the new synopsis formatting backend, no special asciidoc markup
is needed.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to follow the common manpage usage, the synopsis of the
commands needs to be heavily typeset. A first try was performed with
using native markup, but it turned out to make the document source
almost unreadable, difficult to write and prone to mistakes with
unwanted Asciidoc's role attributes.
In order to both simplify the writer's task and obtain a consistant
typesetting in the synopsis, a custom 'synopsis' paragraph type is
created and the processor for backticked text are modified. The
backends of asciidoc and asciidoctor take in charge to correctly add
the required typesetting.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When starting a reftable transaction we lock all stacks we are about to
modify. While it may happen that the stack is out-of-date at this point
in time we don't really care: transactional updates encode the expected
state of a certain reference, so all that we really want to verify is
that the _current_ value matches that expected state.
Pass `REFTABLE_STACK_NEW_ADDITION_RELOAD` when locking the stack such
that an out-of-date stack will be reloaded after having been locked.
This change is safe because all verifications of the expected state
happen after this step anyway.
Add a testcase that verifies that many writers are now able to write to
the stack concurrently without failures and with a deterministic end
result.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In `reftable_stack_new_addition()` we first lock the stack and then
check whether it is still up-to-date. If it is not we return an error to
the caller indicating that the stack is outdated.
This is overly restrictive in our ref transaction interface though: we
lock the stack right before we start to verify the transaction, so we do
not really care whether it is outdated or not. What we really want is
that the stack is up-to-date after it has been locked so that we can
verify queued updates against its current state while we know that it is
locked for concurrent modification.
Introduce a new flag `REFTABLE_STACK_NEW_ADDITION_RELOAD` that alters
the behaviour of `reftable_stack_init_addition()` in this case: when we
notice that it is out-of-date we reload it instead of returning an error
to the caller.
This logic will be wired up in the reftable backend in the next commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When multiple concurrent processes try to update references in a
repository they may try to lock the same lockfiles. This can happen even
when the updates are non-conflicting and can both be applied, so it
doesn't always make sense to abort the transaction immediately. Both the
"loose" and "packed" backends thus have a grace period that they wait
for the lock to be released that can be controlled via the config values
"core.filesRefLockTimeout" and "core.packedRefsTimeout", respectively.
The reftable backend doesn't have such a setting yet and instead fails
immediately when it sees such a lock. But the exact same concepts apply
here as they do apply to the other backends.
Introduce a new "reftable.lockTimeout" config that controls how long we
may wait for a "tables.list" lock to be released. The default value of
this config is 100ms, which is the same default as we have it for the
"loose" backend.
Note that even though we also lock individual tables, this config really
only applies to the "tables.list" file. This is because individual
tables are only ever locked when we already hold the "tables.list" lock
during compaction. When we observe such a lock we in fact do not want to
compact the table at all because it is already in the process of being
compacted by a concurrent process. So applying the same timeout here
would not make any sense and only delay progress.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `include_by_branch()` function is responsible for evaluating whether
or not a specific include should be pulled in based on the currently
checked out branch. Naturally, his condition can only be evaluated when
we have a properly initialized repository with a ref store in the first
place. This is why the function guards against the case when either
`data->repo` or `data->repo->gitdir` are `NULL` pointers.
But the second check is insufficient: the `gitdir` may be set even
though the repository has not been initialized. Quoting "setup.c":
NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
code paths so we also need to explicitly setup the environment if the
user has set GIT_DIR. It may be beneficial to disallow bogus GIT_DIR
values at some point in the future.
So when either the GIT_DIR environment variable or the `--git-dir`
global option are set by the user then `the_repository` may end up with
an initialized `gitdir` variable. And this happens even when the dir is
invalid, like for example when it doesn't exist. It follows that only
checking for whether or not `gitdir` is `NULL` is not sufficient for us
to determine whether the repository has been properly initialized.
This issue can lead to us triggering a BUG: when using a config with an
"includeIf.onbranch:" condition outside of a repository while using the
`--git-dir` option pointing to an invalid Git directory we may end up
trying to evaluate the condition even though the ref storage format has
not been set up.
This bisects to 173761e21b (setup: start tracking ref storage format,
2023-12-29), but that commit really only starts to surface the issue
that has already existed beforehand. The code to check for `gitdir` was
introduced via 85fe0e800c (config: work around bug with
includeif:onbranch and early config, 2019-07-31), which tried to fix
similar issues when we didn't yet have a repository set up. But the fix
was incomplete as it missed the described scenario.
As the quoted comment mentions, we'd ideally refactor the code to not
set up `gitdir` with an invalid value in the first place, but that may
be a bigger undertaking. Instead, refactor the code to use the ref
storage format as an indicator of whether or not the ref store has been
set up to fix the bug.
Reported-by: Ronan Pigott <ronan@rjp.ie>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a couple more tests for "onbranch" includes for several edge cases.
All tests except for the last one pass, so for the most part this change
really only aims to nail down behaviour of include conditionals further.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running 'git sparse-checkout disable' with the sparse index
enabled, Git is expected to expand the index into a full index. However,
it currently outputs the advice message saying that that is unexpected
and likely due to an issue with the working directory.
Disable this advice message when in this code path. Establish a pattern
for doing a similar removal in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each linked worktree, Git maintains two pointers: (1)
<repo>/worktrees/<id>/gitdir which points at the linked worktree, and
(2) <worktree>/.git which points back at <repo>/worktrees/<id>. Both
pointers are absolute pathnames.
Aside from manually manipulating those raw files, it is possible to
easily "break" one or both pointers by ignoring the "git worktree move"
command and instead manually moving a linked worktree, moving the
repository, or moving both. The "git worktree repair" command was
invented to handle this case by restoring these pointers to sane values.
For the "repair" command, the "git worktree" manual page states:
Repair worktree administrative files, if possible, if they have
become corrupted or outdated due to external factors.
The "if possible" clause was chosen deliberately to convey that the
existing implementation may not be able to fix every possible breakage,
and to imply that improvements may be made to handle other types of
breakage.
A recent problem report[*] illustrates a case in which "git worktree
repair" not only fails to fix breakage, but actually causes breakage.
Specifically, if a repository / main-worktree and linked worktrees are
*copied* as a unit (rather than *moved*), then "git worktree repair" run
in the copy leaves the copy untouched but botches the pointers in the
original repository and the original worktrees.
For instance, given this directory structure:
orig/
main/ (main-worktree)
linked/ (linked worktree)
if "orig" is copied (not moved) to "dup", then immediately after the
manual copy operation:
* orig/main/.git/worktrees/linked/gitdir points at orig/linked/.git
* orig/linked/.git points at orig/main/.git/worktrees/linked
* dup/main/.git/worktrees/linked/gitdir points at orig/linked/.git
* dup/linked/.git points at orig/main/.git/worktrees/linked
So, dup/main thinks its linked worktree is orig/linked, and worktree
dup/linked thinks its repository / main-worktree is orig/main.
"git worktree repair" is reasonably simple-minded; it wants to trust
valid-looking pointers, hence doesn't try to second-guess them. In this
case, when validating dup/linked/.git, it finds a legitimate repository
pointer, orig/main/.git/worktrees/linked, thus trusts that is correct,
but does notice that gitdir in that directory doesn't point at
dup/linked/.git, so it (incorrectly) _fixes_
orig/main/.git/worktrees/linked/gitdir to point at dup/linked/.git.
Similarly, when validating dup/main/.git/worktrees/linked/gitdir, it
finds a legitimate worktree pointer, orig/linked/.git, but notices that
its .git file doesn't point back at dup/main, thus (incorrectly) _fixes_
orig/linked/.git to point at dup/main/.git/worktrees/linked. Hence, it
has modified and broken the linkage between orig/main and orig/linked
rather than fixing dup/main and dup/linked as expected.
Fix this problem by also checking if a plausible .git/worktrees/<id>
exists in the *current* repository -- not just in the repository pointed
at by the worktree's .git file -- and comparing whether they are the
same. If not, then it is likely because the repository / main-worktree
and linked worktrees were copied, so prefer the discovered plausible
pointer rather than the one from the existing .git file.
[*]: https://lore.kernel.org/git/E1sr5iF-0007zV-2k@binarylane-bailey.stuart.id.au/
Reported-by: Russell Stuart <russell+git.vger.kernel.org@stuart.id.au>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When f4dbdfc4d5 (commit-graph: clean up leaked memory during write,
2018-10-03) added the UNLEAK, it was right before a call to die_errno().
e103f7276f (commit-graph: return with errors during write, 2019-06-12)
made it unnecessary, as it was then followed by a free() call for the
allocated string.
The code moved to write_commit_graph_file() in the meantime and the
string pointer is now part of a struct, but the function's only caller
still cleans up the allocation. Drop the superfluous UNLEAK.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git archive checks whether pathspec arguments match anything to avoid
surprises due to typos and later loads the index to get attributes.
This order was OK when these features were introduced by ba053ea96c
(archive: do not read .gitattributes in working directory, 2009-04-18)
and d5f53d6d6f (archive: complain about path specs that don't match
anything, 2009-12-12).
But when attribute matching was added to pathspec in b0db704652
(pathspec: allow querying for attributes, 2017-03-13), the pathspec
checker in git archive did not support it fully, because it lacks the
attributes from the index.
Load the index earlier, before the pathspec check, to support attr
pathspecs.
Reported-by: Ronan Pigott <ronan@rjp.ie>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'scalar reconfigure' command is intended to update registered repos
with the latest settings available. However, up to now we were not
reregistering the repos with background maintenance.
In particular, this meant that the background maintenance schedule would
not be updated if there are improvements between versions.
Be sure to register repos for maintenance during the reconfigure step.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the moment, some background jobs are getting blocked on credentials
during the 'prefetch' task. This leads to other tasks, such as
incremental repacks, getting blocked. Further, if a user manages to fix
their credentials, then they still need to cancel the background process
before their background maintenance can continue working.
Update the background schedules for our four scheduler integrations to
include these config options via '-c' options:
* 'credential.interactive=false' will stop Git and some credential
helpers from prompting in the UI (assuming the '-c' parameters are
carried through and respected by GCM).
* 'core.askPass=true' will replace the text fallback for a username
and password into the 'true' command, which will return a success in
its exit code, but Git will treat the empty string returned as an
invalid password and move on.
We can do some testing that the credentials are passed, at least in the
systemd case due to writing the service files.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When scripts or background maintenance wish to perform HTTP(S) requests,
there is a risk that our stored credentials might be invalid. At the
moment, this causes the credential helper to ping the user and block the
process. Even if the credential helper does not ping the user, Git falls
back to the 'askpass' method, which includes a direct ping to the user
via the terminal.
Even setting the 'core.askPass' config as something like 'echo' will
causes Git to fallback to a terminal prompt. It uses
git_terminal_prompt(), which finds the terminal from the environment and
ignores whether stdin has been redirected. This can also block the
process awaiting input.
Create a new config option to prevent user interaction, favoring a
failure to a blocked process.
The chosen name, 'credential.interactive', is taken from the config
option used by Git Credential Manager to already avoid user
interactivity, so there is already one credential helper that integrates
with this option. However, older versions of Git Credential Manager also
accepted other string values, including 'auto', 'never', and 'always'.
The modern use is to use a boolean value, but we should still be
careful that some users could have these non-booleans. Further, we
should respect 'never' the same as 'false'. This is respected by the
implementation and test, but not mentioned in the documentation.
The implementation for the Git interactions takes place within
credential_getpass(). The method prototype is modified to return an
'int' instead of 'void'. This allows us to detect that no attempt was
made to fill the given credential, changing the single caller slightly.
Also, a new trace2 region is added around the interactive portion of the
credential request. This provides a way to measure the amount of time
spent in that region for commands that _are_ interactive. It also makes
a conventient way to test that the config option works with
'test_region'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been reported than running
git submodule status --recurse | grep -q ^+
results in an unexpected error message
fatal: failed to recurse into submodule $submodule
When "git submodule--helper" recurses into a submodule it creates a
child process. If that process fails then the error message above is
displayed by the parent. In the case above the child is killed by
SIGPIPE as "grep -q" exits as soon as it sees the first match. Fix this
by propagating SIGPIPE so that it is visible to the process running
git. We could propagate other signals but I'm not sure there is much
value in doing that. In the common case of the user pressing Ctrl-C or
Ctrl-\ then SIGINT or SIGQUIT will be sent to the foreground process
group and so the parent process will receive the same signal as the
child.
Reported-by: Matt Liberty <mliberty@precisioninno.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cp/unit-test-reftable-stack:
t-reftable-stack: add test for stack iterators
t-reftable-stack: add test for non-default compaction factor
t-reftable-stack: use reftable_ref_record_equal() to compare ref records
t-reftable-stack: use Git's tempfile API instead of mkstemp()
t: harmonize t-reftable-stack.c with coding guidelines
t: move reftable/stack_test.c to the unit testing framework
git gui can open a merge tool when conflicts are detected (Right click
in the diff of the file with conflicts).
The merge tools that are allowed to use are hard coded into git gui.
If one wants to add a new merge tool it has to be added to git gui
through a source code change.
This is not convenient in comparison to how it works in git (without gui).
git itself has configuration options for a merge tools path and command
in the git configuration.
New merge tools can be set up there without a source code change.
Those options are used only by pure git in contrast to git gui. git calls
the configured merge tools directly from the configuration while git Gui
doesn't.
With this change git gui can call merge tools configured in the
configuration directly without a change in git gui source code.
It needs a configured "merge.tool" and a configured
"mergetool.<mergetool name>.cmd" configuration entry as shown in the
git-config manual page.
Configuration example:
[merge]
tool = vscode
[mergetool "vscode"]
cmd = \"the/path/to/Code.exe\" --wait --merge \"$LOCAL\" \"$REMOTE\" \"$BASE\" \"$MERGED\"
Without the "mergetool.<mergetool name>.cmd" entry and an unsupported
"merge.tool" entry, git gui behaves mainly as before this change and
informs the user about an unsupported merge tool. In addtition, it also
shows a hint to add a configuration entry to use the tool as an
unsupported tool with degraded support.
If a wrong "mergetool.<mergetool name>.cmd" is configured by accident,
it gets handled by git gui already. In this case git gui informs the
user that the merge tool couldn't be opened. This behavior is preserved
by this change and should not change.
"Beyond Compare 3" and "Visual Studio Code" were tested as manually
configured merge tools.
Signed-off-by: Tobias Boesch <tobias.boesch@miele.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
We would strip all leading and trailing whitespace, which git commit
does not. Let's be consistent here.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
This is also known as "washing". This is consistent with the behavior of
interactive git commit, which we should emulate as closely as possible
to avoid usability problems. This way commit message templates and
prepare hooks can be used properly, and comments from conflicted rebases
and merges are cleaned up without having to introduce special handling
for them.
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
It is possible that stat information of tracked files is modified without
actually modifying the content. Plumbing commands would detect such files
as modified, so that Git GUI runs `git update-info --refresh` in order to
synchronize the cached stat info with the reality. However, this can be
an expensive operation in large repositories. As remediation,
e534f3a88676 (git-gui: Allow the user to disable update-index --refresh
during rescan, 2006-11-07) introduced an option to skip the expensive
part.
The option was named "trust file modification timestamp". But the catch
is that sometimes file timestamps can't be trusted. In this case, a file
would remain listed in Unstaged Changes although there are no changes.
So 16403d0b1f9d (git-gui: Refresh a file if it has an empty diff,
2006-11-11) introduced a popup message informing the user about the
situation and then removed the file from the Unstaged Changes list.
Now users had to click away the message box for every file that was
stat-dirty. Under the assumption that a file in such a state is not
the only one, 124355d32c06 (git-gui: Always start a rescan on an empty
diff, 2007-01-22) introduced a forced (potentially expensive) refresh
that would de-list all stat-dirty files after the first notification was
dismissed.
Along came 6c510bee2013 (Lazy man's auto-CRLF, 2007-02-13) in Git. It
introduced a new case where a file in the worktree can have no essential
differences to the staged version, but still be detected as modified by
plumbing commands. This time, however, the index cannot be synchronized
fully by `git update-index --refresh`, so that the file remains listed
in Unstaged Changes until it is staged manually.
Needless to say that the message box now becomes an annoyance, because
it must be dismissed every time an affected file is selected, and the
file remains listed nevertheless.
Remove the message box. Write the notice that no differences were found
in the diff panel instead. Also include a link that, when clicked,
initiates the rescan. With this scheme, the rescan does not happen
automatically anymore, but requires an additional click. (This is now
two clicks in total for users who encounter stat-dirty files after
enabling the "trust file modification timestamps" option.) However,
users whom the rescan does not help (autocrlf-related dirty files) save
half the clicks because there is no message box to dismiss.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.