24172 Commits

Author SHA1 Message Date
Junio C Hamano
99bd5a5c9f Merge branch 'tc/last-modified-active-paths-optimization'
"git last-modified" was optimized by narrowing the set of paths to
follow as it dug deeper in the history.

* tc/last-modified-active-paths-optimization:
  last-modified: implement faster algorithm
2025-11-12 11:45:24 -08:00
Junio C Hamano
e569dced68 Merge branch 'cc/fast-import-export-i18n-cleanup'
Messages from fast-import/export are now marked for i18n.

* cc/fast-import-export-i18n-cleanup:
  gpg-interface: mark a string for translation
  fast-import: mark strings for translation
  fast-export: mark strings for translation
  gpg-interface: use left shift to define GPG_VERIFY_*
  gpg-interface: simplify ssh fingerprint parsing
2025-11-06 15:17:01 -08:00
Junio C Hamano
c8a641c590 Merge branch 'rz/t0450-bisect-doc-update'
The help text and manual page of "git bisect" command have been
made consistent with each other.

* rz/t0450-bisect-doc-update:
  bisect: update usage and docs to match each other
2025-11-05 13:41:51 -08:00
Junio C Hamano
5931b6b2fb Merge branch 'jk/doc-backslash-in-exclude'
The patterns used in the .gitignore files use backslash in the way
documented for fnmatch(3); document as such to reduce confusion.

* jk/doc-backslash-in-exclude:
  doc: document backslash in gitignore patterns
2025-11-04 07:48:10 -08:00
Junio C Hamano
377e8e2848 Merge branch 'jk/test-delete-gpgsig-leakfix'
Leakfix.

* jk/test-delete-gpgsig-leakfix:
  test-tool: fix leak in delete-gpgsig command
2025-11-04 07:48:09 -08:00
Junio C Hamano
55e8615d18 Merge branch 'eb/t1016-hash-transition-fix'
Test fix.

* eb/t1016-hash-transition-fix:
  t1016-compatObjectFormat: really freeze time for reproduciblity
2025-11-04 07:48:09 -08:00
Junio C Hamano
8f0d663eac Merge branch 'tz/test-prepare-gnupghome'
Tests did not set up GNUPGHOME correctly, which is fixed but some
flaky tests are exposed in t1016, which needs to be addressed
before this topic can move forward.

* tz/test-prepare-gnupghome:
  t/lib-gpg: call prepare_gnupghome() in GPG2 prereq
  t/lib-gpg: add prepare_gnupghome() to create GNUPGHOME dir
2025-11-04 07:48:07 -08:00
Junio C Hamano
a9db6c66f5 Merge branch 'jt/repo-structure'
"git repo structure", a new command.

* jt/repo-structure:
  builtin/repo: add progress meter for structure stats
  builtin/repo: add keyvalue and nul format for structure stats
  builtin/repo: add object counts in structure output
  builtin/repo: introduce structure subcommand
  ref-filter: export ref_kind_from_refname()
  ref-filter: allow NULL filter pattern
  builtin/repo: rename repo_info() to cmd_repo_info()
2025-11-04 07:48:07 -08:00
Toon Claes
2a04e8c293 last-modified: implement faster algorithm
The current implementation of git-last-modified(1) works by doing a
revision walk, and inspecting the diff at each level of that walk to
annotate entries remaining in the hashmap of paths. In other words, if
the diff at some level touches a path which has not yet been associated
with a commit, then that commit becomes associated with the path.

While a perfectly reasonable implementation, it can perform poorly in
either one of two scenarios:

  1. There are many entries of interest, in which case there is simply
     a lot of work to do.

  2. Or, there are (even a few) entries which have not been updated in a
     long time, and so we must walk through a lot of history in order to
     find a commit that touches that path.

This patch rewrites the last-modified implementation that addresses the
second point. The idea behind the algorithm is to propagate a set of
'active' paths (a path is 'active' if it does not yet belong to a
commit) up to parents and do a truncated revision walk.

The walk is truncated because it does not produce a revision for every
change in the original pathspec, but rather only for active paths.

More specifically, consider a priority queue of commits sorted by
generation number. First, enqueue the set of boundary commits with all
paths in the original spec marked as interesting.

Then, while the queue is not empty, do the following:

  1. Pop an element, say, 'c', off of the queue, making sure that 'c'
     isn't reachable by anything in the '--not' set.

  2. For each parent 'p' (with index 'parent_i') of 'c', do the
     following:

     a. Compute the diff between 'c' and 'p'.
     b. Pass any active paths that are TREESAME from 'c' to 'p'.
     c. If 'p' has any active paths, push it onto the queue.

  3. Any path that remains active on 'c' is associated to that commit.

This ends up being equivalent to doing something like 'git log -1 --
$path' for each path simultaneously. But, it allows us to go much faster
than the original implementation by limiting the number of diffs we
compute, since we can avoid parts of history that would have been
considered by the revision walk in the original implementation, but are
known to be uninteresting to us because we have already marked all paths
in that area to be inactive.

To avoid computing many first-parent diffs, add another trick on top of
this and check if all paths active in 'c' are DEFINITELY NOT in c's
Bloom filter. Since the commit-graph only stores first-parent diffs in
the Bloom filters, we can only apply this trick to first-parent diffs.

Comparing the performance of this new algorithm shows about a 2.5x
improvement on git.git:

    Benchmark 1: master   no bloom
      Time (mean ± σ):      2.868 s ±  0.023 s    [User: 2.811 s, System: 0.051 s]
      Range (min … max):    2.847 s …  2.926 s    10 runs

    Benchmark 2: master with bloom
      Time (mean ± σ):     949.9 ms ±  15.2 ms    [User: 907.6 ms, System: 39.5 ms]
      Range (min … max):   933.3 ms … 971.2 ms    10 runs

    Benchmark 3: HEAD     no bloom
      Time (mean ± σ):     782.0 ms ±   6.3 ms    [User: 740.7 ms, System: 39.2 ms]
      Range (min … max):   776.4 ms … 798.2 ms    10 runs

    Benchmark 4: HEAD   with bloom
      Time (mean ± σ):     307.1 ms ±   1.7 ms    [User: 276.4 ms, System: 29.9 ms]
      Range (min … max):   303.7 ms … 309.5 ms    10 runs

    Summary
      HEAD   with bloom ran
        2.55 ± 0.02 times faster than HEAD     no bloom
        3.09 ± 0.05 times faster than master with bloom
        9.34 ± 0.09 times faster than master   no bloom

In short, the existing implementation is comparably fast *with* Bloom
filters as the new implementation is *without* Bloom filters. So, most
repositories should get a dramatic speed-up by just deploying this (even
without computing Bloom filters), and all repositories should get faster
still when computing Bloom filters.

When comparing a more extreme example of
`git last-modified -- COPYING t`, the difference is even 5 times better:

    Benchmark 1: master
      Time (mean ± σ):      4.372 s ±  0.057 s    [User: 4.286 s, System: 0.062 s]
      Range (min … max):    4.308 s …  4.509 s    10 runs

    Benchmark 2: HEAD
      Time (mean ± σ):     826.3 ms ±  22.3 ms    [User: 784.1 ms, System: 39.2 ms]
      Range (min … max):   810.6 ms … 881.2 ms    10 runs

    Summary
      HEAD ran
        5.29 ± 0.16 times faster than master

As an added benefit, results are more consistent now. For example
implementation in 'master' gives:

    $ git log --max-count=1 --format=%H -- pkt-line.h
    15df15fe07ef66b51302bb77e393f3c5502629de

    $ git last-modified -- pkt-line.h
    15df15fe07ef66b51302bb77e393f3c5502629de	pkt-line.h

    $ git last-modified | grep pkt-line.h
    5b49c1af03e600c286f63d9d9c9fb01403230b9f	pkt-line.h

With the changes in this patch the results of git-last-modified(1)
always match those of `git log --max-count=1`.

One thing to note though, the results might be outputted in a different
order than before. This is not considerd to be an issue because nowhere
is documented the order is guaranteed.

Based-on-patches-by: Derrick Stolee <stolee@gmail.com>
Based-on-patches-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
Acked-by: Taylor Blau <me@ttaylorr.com>
[jc: tweaked use of xcalloc() to unbreak coccicheck]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-03 07:25:41 -08:00
Junio C Hamano
a4b1a1478b Merge branch 'rs/merge-base-optim'
The code to walk revision graph to compute merge base has been
optimized.

* rs/merge-base-optim:
  commit-reach: avoid commit_list_insert_by_date()
2025-11-03 06:49:55 -08:00
Junio C Hamano
249b0d3f03 Merge branch 'jk/diff-patch-dry-run-cleanup'
Finishing touches to fixes to the recent regression in "git diff -w
--quiet" and anything that needs to internally generate patch to
see if it turns empty.

* jk/diff-patch-dry-run-cleanup:
  diff: simplify run_external_diff() quiet logic
  diff: drop dry-run redirection to /dev/null
  diff: replace diff_options.dry_run flag with NULL file
  diff: drop save/restore of color_moved in dry-run mode
  diff: send external diff output to diff_options.file
2025-11-03 06:49:55 -08:00
Junio C Hamano
3cf3369e81 Merge branch 'ps/maintenance-geometric'
"git maintenance" command learns the "geometric" strategy where it
avoids doing maintenance tasks that rebuilds everything from
scratch.

* ps/maintenance-geometric:
  t7900: fix a flaky test due to git-repack always regenerating MIDX
  builtin/maintenance: introduce "geometric" strategy
  builtin/maintenance: make "gc" strategy accessible
  builtin/maintenance: extend "maintenance.strategy" to manual maintenance
  builtin/maintenance: run maintenance tasks depending on type
  builtin/maintenance: improve readability of strategies
  builtin/maintenance: don't silently ignore invalid strategy
  builtin/maintenance: make the geometric factor configurable
  builtin/maintenance: introduce "geometric-repack" task
  builtin/gc: make `too_many_loose_objects()` reusable without GC config
  builtin/gc: remove global `repack` variable
2025-11-03 06:49:55 -08:00
Junio C Hamano
5236467090 Merge branch 'jk/match-pathname-fix'
The wildmatch code had a corner case bug that mistakenly makes
"foo**/bar" match with "foobar", which has been corrected.

* jk/match-pathname-fix:
  match_pathname(): give fnmatch one char of prefix context
  match_pathname(): reorder prefix-match check
2025-11-03 06:49:55 -08:00
Junio C Hamano
18a7988898 Merge branch 'rs/add-patch-quit'
The 'q'(uit) command in "git add -p" has been improved to quit
without doing any meaningless work before leaving, and giving EOF
(typically control-D) to the prompt is made to behave the same way.

* rs/add-patch-quit:
  add-patch: quit on EOF
  add-patch: quit without skipping undecided hunks
2025-11-03 06:49:54 -08:00
Junio C Hamano
27a1735807 Merge branch 'ly/diff-name-only-with-diff-from-content'
Regression fixes for a topic that has already been merged.

* ly/diff-name-only-with-diff-from-content:
  diff: stop output garbled message in dry run mode
2025-10-30 08:00:20 -07:00
Junio C Hamano
5554738038 Merge branch 'ps/remove-packfile-store-get-packs'
Two slightly different ways to get at "all the packfiles" in API
has been cleaned up.

* ps/remove-packfile-store-get-packs:
  packfile: rename `packfile_store_get_all_packs()`
  packfile: introduce macro to iterate through packs
  packfile: drop `packfile_store_get_packs()`
  builtin/grep: simplify how we preload packs
  builtin/gc: convert to use `packfile_store_get_all_packs()`
  object-name: convert to use `packfile_store_get_all_packs()`
2025-10-30 08:00:19 -07:00
Junio C Hamano
48d0b6545a Merge branch 'ps/symlink-symref-deprecation'
"Symlink symref" has been added to the list of things that will
disappear at Git 3.0 boundary.

* ps/symlink-symref-deprecation:
  refs/files: deprecate writing symrefs as symbolic links
2025-10-30 08:00:19 -07:00
Junio C Hamano
923436e23d Merge branch 'ey/commit-graph-changed-paths-config'
A new configuration variable commitGraph.changedPaths allows to
turn "--changed-paths" on by default for "git commit-graph".

* ey/commit-graph-changed-paths-config:
  commit-graph: add new config for changed-paths & recommend it in scalar
2025-10-30 08:00:19 -07:00
Christian Couder
c295115ec6 fast-import: mark strings for translation
Some error or warning messages in "builtin/fast-import.c" are marked
for translation, but many are not.

To be more consistent and provide a better experience to people using a
translated version, let's mark all the remaining error or warning
messages for translation.

While at it, let's make the following small changes:

  - replace "GIT" or "git" in a few error messages to just "Git",
  - replace "Expected from command, got %s" to "expected 'from'
    command, got '%s'", which makes it clearer that "from" is a command
    and should not be translated,
  - downcase error and warning messages that start with an uppercase,
  - fix test cases in "t9300-fast-import.sh" that broke because an
    error or warning message was downcased,
  - split error and warning messages that are too long,
  - adjust the indentation of some arguments of the error functions.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-30 07:06:58 -07:00
Jeff King
85333aa1af test-tool: fix leak in delete-gpgsig command
We read the input into a strbuf, so we must free it. Without this, t1016
complains in SANITIZE=leak mode.

The bug was introduced in 7673ecd2dc (t1016-compatObjectFormat: add
tests to verify the conversion between objects, 2023-10-01). But nobody
seems to have noticed, probably because CI did not run these tests until
the fix in 6cd8369ef3 (t/lib-gpg: call prepare_gnupghome() in GPG2
prereq, 2024-07-03).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-29 12:36:10 -07:00
Jeff King
8a6d158a1d doc: document backslash in gitignore patterns
Because gitignore patterns are passed to fnmatch, the handling of
backslashes is the same as it is there: it can be used to escape
metacharacters. We do reference fnmatch(3) for more details, but it may
be friendlier to point out this implication explicitly (especially for
people who want to know about backslash handling and search the
documentation for that word). There are also two cases that I've seen
some other backslash-escaping systems handle differently, so let's
describe those:

  1. A backslash before any character treats that character literally,
     even if it's not otherwise a meta-character. As opposed to
     including the backslash itself (like "foo\bar" in shell expands to
     "foo\bar") or forbidding it ("foo\zar" is required to produce a
     diagnostic in C).

  2. A backslash at the end of the string is an invalid pattern (and not
     a literal backslash).

This second one in particular was a point of confusion between our
implementation and the one in JGit. Our wildmatch behavior matches what
POSIX specifies for fnmatch, so the code and documentation are in line.
But let's add a test to cover this case. Note that the behavior here
differs between wildmatch itself (which is what gitignore will use) and
pathspec matching (which will only turn to wildmatch if a literal match
fails). So we match "foo\" to "foo\" in pathspecs, but not via
gitignore.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-29 09:17:21 -07:00
Eric W. Biederman
f711f37b05 t1016-compatObjectFormat: really freeze time for reproduciblity
The strategy in t1016-compatObjectFormat is to build two trees with
identical commits, one tree encoded in sha1 the other tree encoded
in sha256 and to use the compatibility code to test and see if
the two trees are identical.

GPG signatures include the current time as part of the signature.

To make gpg deterministic I forced the use of gpg --faked-system-time.
Unfortunately I did not look closely enough.

By default gpg still allows time to move forward with --faked-system-time.
So in those rare instances when the system is heavily loaded and gpg runs
slower than other times, signatures over the exact same data differ
due to timestamps with a minuscule difference.

Reading through the gpg documentation with a close eye, time can be
frozen by including an exclamation point at the end of the argument to
--faked-system-time.

Add the exclamation point so gpg really runs with a fixed notion of time,
resulting in the exact same data having identical gpg signatures.

That is enough that I can run "t1016-compatObjectFormat.sh --stress"
and I don't see any failures.

It is possible a future change to gpg will make replay protection more
robust and not provide a way to allow two separate runs of gpg to
produce exactly the same signature for exactly the same data.  If that
happens a deeper comparison of the two repositories will need to be
performed.  A comparison that simply verifies the signatures and
compares the data for equality.  For now that is a lot of work
for no gain so I am just documenting the possibility.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-28 20:10:15 -07:00
Ruoyu Zhong
bb42dc9710 bisect: update usage and docs to match each other
Update the usage string of `git bisect` and documentation to match each
other. While at it, also:

1. Move the synopsis of `git bisect` subcommands to the synopsis
   section, so that the test `t0450-txt-doc-vs-help.sh` can pass.

2. Document the `git bisect next` subcommand, which exists in the code
   but is missing from the documentation.

See also: [1].

[1]: https://lore.kernel.org/git/3DA38465-7636-4EEF-B074-53E4628F5355@gmail.com/

Suggested-by: Ben Knoble <ben.knoble@gmail.com>
Signed-off-by: Ruoyu Zhong <zhongruoyu@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-28 15:41:42 -07:00
Junio C Hamano
3deb97fe24 Merge branch 'cc/fast-import-strip-signed-tags'
"git fast-import" is taught to handle signed tags, just like it
recently learned to handle signed commits, in different ways.

* cc/fast-import-strip-signed-tags:
  fast-import: add '--signed-tags=<mode>' option
  fast-export: handle all kinds of tag signatures
  t9350: properly count annotated tags
  lib-gpg: allow tests with GPGSM or GPGSSH prereq first
  doc: git-tag: stop focusing on GPG signed tags
2025-10-28 10:29:09 -07:00
Junio C Hamano
54ac3809c3 Merge branch 'ds/sparse-checkout-clean'
"git sparse-checkout" subcommand learned a new "clean" action to
prune otherwise unused working-tree files that are outside the
areas of interest.

* ds/sparse-checkout-clean:
  sparse-index: improve advice message instructions
  t: expand tests around sparse merges and clean
  sparse-index: point users to new 'clean' action
  sparse-checkout: add --verbose option to 'clean'
  dir: add generic "walk all files" helper
  sparse-checkout: match some 'clean' behavior
  sparse-checkout: add basics of 'clean' command
  sparse-checkout: remove use of the_repository
2025-10-28 10:29:09 -07:00
Patrick Steinhardt
a4265572bb t7900: fix a flaky test due to git-repack always regenerating MIDX
When a supposedly no-op "git repack" runs across a second boundary,
because the command always touches the MIDX file and updates its
timestamp, "ls -l $GIT_DIR/objects/pack/" before and after the
operation can change, which causes such a test to fail.  Only
compare the *.pack files in the directory before and after the
operation to work around this flakyness.

Arguably, git-repack(1) should learn to not rewrite the MIDX in case
we know it is already up-to-date. But this is not a new problem
introduced via the new geometric maintenance task, so for now it
should be good enough to paper over the issue.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
[jc: taken from diff to v4 from v3 that was already merged to 'next']
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-27 13:57:17 -07:00
Junio C Hamano
b42b995d22 Merge branch 'so/t2401-use-test-path-helpers' into maint-2.51
Test modernization.

* so/t2401-use-test-path-helpers:
  t2401: update path checks using test_path helpers
2025-10-26 19:48:21 -07:00
Junio C Hamano
ed931ebe18 Merge branch 'ps/t7528-ssh-agent-uds-workaround' into maint-2.51
Recent OpenSSH creates the Unix domain socket to communicate with
ssh-agent under $HOME instead of /tmp, which causes our test to
fail doe to overly long pathname in our test environment, which has
been worked around by using "ssh-agent -T".

* ps/t7528-ssh-agent-uds-workaround:
  t7528: work around ETOOMANY in OpenSSH 10.1 and newer
2025-10-26 19:48:20 -07:00
Junio C Hamano
3d638cb389 Merge branch 'jk/status-z-short-fix' into maint-2.51
The "--short" option of "git status" that meant output for humans
and "-z" option to show NUL delimited output format did not mix
well, and colored some but not all things.  The command has been
updated to color all elements consistently in such a case.

* jk/status-z-short-fix:
  status: make coloring of "-z --short" consistent
2025-10-26 19:48:19 -07:00
Junio C Hamano
2319fbae48 Merge branch 'jk/diff-no-index-with-pathspec-fix' into maint-2.51
An earlier addition to "git diff --no-index A B" to limit the
output with pathspec after the two directories misbehaved when
these directories were given with a trailing slash, which has been
corrected.

* jk/diff-no-index-with-pathspec-fix:
  diff --no-index: fix logic for paths ending in '/'
2025-10-26 19:48:19 -07:00
Junio C Hamano
e56c419347 Merge branch 'jk/diff-from-contents-fix' into maint-2.51
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.

* jk/diff-from-contents-fix:
  diff: restore redirection to /dev/null for diff_from_contents
2025-10-26 19:48:18 -07:00
René Scharfe
e56f6dcd7b add-patch: quit on EOF
If we reach the end of the input, e.g. because the user pressed ctrl-D
on Linux, there is no point in showing any more prompts, as we won't get
any reply.  Do the same as option 'q' would: Quit.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-26 16:34:39 -07:00
Jeff King
1940a02dc1 match_pathname(): give fnmatch one char of prefix context
In match_pathname(), which we use for matching .gitignore and
.gitattribute patterns, we are comparing paths with fnmatch patterns
(actually our extended wildmatch, which will be important).  There's an
extra optimization there: we pre-compute the number of non-wildcard
characters at the beginning of the pattern and do an fspathncmp() on
that prefix.

That lets us avoid fnmatch entirely on patterns without wildcards, and
shrinks the amount of work we hand off to fnmatch. For a pattern like
"foo*.txt" and a path "foobar.txt", we'd cut away the matching "foo"
prefix and just pass "*.txt" and "bar.txt" to fnmatch().

But this misses a subtle corner case. In fnmatch(), we'll think
"bar.txt" is the start of the path, but it's not. This doesn't matter
for the pattern above, but consider the wildmatch pattern "foo**/bar"
and the path "foobar". These two should not match, because there is no
file named "bar", and the "**" applies only to the containing directory
name. But after removing the "foo" prefix, fnmatch will get "**/bar" and
"bar", which it does consider a match, because "**/" can match zero
directories.

We can solve this by giving fnmatch a bit more context. As long as it
has one byte of the matched prefix, then it will know that "bar" is not
the start of the path. In this example it would get "o**/bar" and
"obar", and realize that they cannot match.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-26 16:32:43 -07:00
Junio C Hamano
52b56e8b79 Merge branch 'ps/t7528-ssh-agent-uds-workaround'
Recent OpenSSH creates the Unix domain socket to communicate with
ssh-agent under $HOME instead of /tmp, which causes our test to
fail doe to overly long pathname in our test environment, which has
been worked around by using "ssh-agent -T".

* ps/t7528-ssh-agent-uds-workaround:
  t7528: work around ETOOMANY in OpenSSH 10.1 and newer
2025-10-24 13:48:05 -07:00
Junio C Hamano
7d763b98ef Merge branch 'rs/add-patch-document-p-for-pager'
Show 'P'ipe command in "git add -p".

* rs/add-patch-document-p-for-pager:
  add-patch: fully document option P
2025-10-24 13:48:05 -07:00
Junio C Hamano
78bf9ce0d1 Merge branch 'jc/t1016-setup-fix'
GPG signing test set-up has been broken for a year, which has been
corrected.

* jc/t1016-setup-fix:
  t1016: make sure to use specified GPG
2025-10-24 13:48:05 -07:00
Junio C Hamano
e7909b3a90 Merge branch 'jk/status-z-short-fix'
The "--short" option of "git status" that meant output for humans
and "-z" option to show NUL delimited output format did not mix
well, and colored some but not all things.  The command has been
updated to color all elements consistently in such a case.

* jk/status-z-short-fix:
  status: make coloring of "-z --short" consistent
2025-10-24 13:48:04 -07:00
Junio C Hamano
385772e183 Merge branch 'js/t7500-pwd-windows-fix'
Test fix.

* js/t7500-pwd-windows-fix:
  t7500: fix tests with absolute path following ":(optional)" on Windows
2025-10-24 13:48:04 -07:00
Patrick Steinhardt
d9bccf2ec3 builtin/maintenance: introduce "geometric" strategy
We have two different repacking strategies in Git:

  - The "gc" strategy uses git-gc(1).

  - The "incremental" strategy uses multi-pack indices and `git
    multi-pack-index repack` to merge together smaller packfiles as
    determined by a specific batch size.

The former strategy is our old and trusted default, whereas the latter
has historically been used for our scheduled maintenance. But both
strategies have their shortcomings:

  - The "gc" strategy performs regular all-into-one repacks. Furthermore
    it is rather inflexible, as it is not easily possible for a user to
    enable or disable specific subtasks.

  - The "incremental" strategy is not a full replacement for the "gc"
    strategy as it doesn't know to prune stale data.

So today, we don't have a strategy that is well-suited for large repos
while being a full replacement for the "gc" strategy.

Introduce a new "geometric" strategy that aims to fill this gap. This
strategy invokes all the usual cleanup tasks that git-gc(1) does like
pruning reflogs and rerere caches as well as stale worktrees. But where
it differs from both the "gc" and "incremental" strategy is that it uses
our geometric repacking infrastructure exposed by git-repack(1) to
repack packfiles. The advantage of geometric repacking is that we only
need to perform an all-into-one repack when the object count in a repo
has grown significantly.

One downside of this strategy is that pruning of unreferenced objects is
not going to happen regularly anymore. Every geometric repack knows to
soak up all loose objects regardless of their reachability, and merging
two or more packs doesn't consider reachability, either. Consequently,
the number of unreachable objects will grow over time.

This is remedied by doing an all-into-one repack instead of a geometric
repack whenever we determine that the geometric repack would end up
merging all packfiles anyway. This all-into-one repack then performs our
usual reachability checks and writes unreachable objects into a cruft
pack. As cruft packs won't ever be merged during geometric repacks we
can thus phase out these objects over time.

Of course, this still means that we retain unreachable objects for far
longer than with the "gc" strategy. But the maintenance strategy is
intended especially for large repositories, where the basic assumption
is that the set of unreachable objects will be significantly dwarfed by
the number of reachable objects.

If this assumption is ever proven to be too disadvantageous we could for
example introduce a time-based strategy: if the largest packfile has not
been touched for longer than $T, we perform an all-into-one repack. But
for now, such a mechanism is deferred into the future as it is not clear
yet whether it is needed in the first place.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:45 -07:00
Patrick Steinhardt
40a7415833 builtin/maintenance: make "gc" strategy accessible
While the user can pick the "incremental" maintenance strategy, it is
not possible to explicitly use the "gc" strategy. This has two
downsides:

  - It is impossible to use the default "gc" strategy for a specific
    repository when the strategy was globally set to a different strategy.

  - It is not possible to use git-gc(1) for scheduled maintenance.

Address these issues by making making the "gc" strategy configurable.
Furthermore, extend the strategy so that git-gc(1) runs for both manual
and scheduled maintenance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:44 -07:00
Patrick Steinhardt
0e994d9f38 builtin/maintenance: extend "maintenance.strategy" to manual maintenance
The "maintenance.strategy" configuration allows users to configure how
Git is supposed to perform repository maintenance. The idea is that we
provide a set of high-level strategies that may be useful in different
contexts, like for example when handling a large monorepo. Furthermore,
the strategy can be tweaked by the user by overriding specific tasks.

In its current form though, the strategy only applies to scheduled
maintenance. This creates something of a gap, as scheduled and manual
maintenance will now use _different_ strategies as the latter would
continue to use git-gc(1) by default. This makes the strategies way less
useful than they could be on the one hand. But even more importantly,
the two different strategies might clash with one another, where one of
the strategies performs maintenance in such a way that it discards
benefits from the other strategy.

So ideally, it should be possible to pick one strategy that then applies
globally to all the different ways that we perform maintenance. This
doesn't necessarily mean that the strategy always does the _same_ thing
for every maintenance type. But it means that the strategy can configure
the different types to work in tandem with each other.

Change the meaning of "maintenance.strategy" accordingly so that the
strategy is applied to both types, manual and scheduled. As preceding
commits have introduced logic to run maintenance tasks depending on this
type we can tweak strategies so that they perform those tasks depending
on the context.

Note that this raises the question of backwards compatibility: when the
user has configured the "incremental" strategy we would have ignored
that strategy beforehand. Instead, repository maintenance would have
continued to use git-gc(1) by default.

But luckily, we can match that behaviour by:

  - Keeping all current tasks of the incremental strategy as
    `MAINTENANCE_TYPE_SCHEDULED`. This ensures that those tasks will not
    run during manual maintenance.

  - Configuring the "gc" task so that it is invoked during manual
    maintenance.

Like this, the user shouldn't observe any difference in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:44 -07:00
Patrick Steinhardt
d465be2327 builtin/maintenance: don't silently ignore invalid strategy
When parsing maintenance strategies we completely ignore the
user-configured value in case it is unknown to us. This makes it
basically undiscoverable to the user that scheduled maintenance is
devolving into a no-op.

Change this to instead die when seeing an unknown maintenance strategy.
While at it, pull out the parsing logic into a separate function so that
we can reuse it in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:43 -07:00
Patrick Steinhardt
5c2ad50193 builtin/maintenance: make the geometric factor configurable
The geometric repacking task uses a factor of two for its geometric
sequence, meaning that each next pack must contain at least twice as
many objects as the next-smaller one. In some cases it may be helpful to
configure this factor though to reduce the number of packfile merges
even further, e.g. in very big repositories. But while git-repack(1)
itself supports doing this, the maintenance task does not give us a way
to tune it.

Introduce a new "maintenance.geometric-repack.splitFactor" configuration
to plug this gap.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:43 -07:00
Patrick Steinhardt
9bc151850c builtin/maintenance: introduce "geometric-repack" task
Introduce a new "geometric-repack" task. This task uses our geometric
repack infrastructure as provided by git-repack(1) itself, which is a
strategy that especially hosting providers tend to use to amortize the
costs of repacking objects.

There is one issue though with geometric repacks, namely that they
unconditionally pack all loose objects, regardless of whether or not
they are reachable. This is done because it means that we can completely
skip the reachability step, which significantly speeds up the operation.
But it has the big downside that we are unable to expire objects over
time.

To address this issue we thus use a split strategy in this new task:
whenever a geometric repack would merge together all packs, we instead
do an all-into-one repack. By default, these all-into-one repacks have
cruft packs enabled, so unreachable objects would now be written into
their own pack. Consequently, they won't be soaked up during geometric
repacking anymore and can be expired with the next full repack, assuming
that their expiry date has surpassed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 13:42:43 -07:00
Jeff King
57c2b6cc86 diff: send external diff output to diff_options.file
Diff output usually goes to the process stdout, but it can be redirected
with the "--output" option. We store this in the "file" pointer of
diff_options, and all of the diff code should write there instead of to
stdout.

But there's one spot we missed: running an external diff cmd. We don't
redirect its output at all, so it just defaults to the stdout of the
parent process. We should instead point its stdout at our output file.
There are a few caveats to watch out for when doing so:

  - The stdout field takes a descriptor, not a FILE pointer. We can pull
    out the descriptor with fileno().

  - The run-command API always closes the stdout descriptor we pass to
    it. So we must duplicate it (otherwise we break the FILE pointer,
    since it now points to a closed descriptor).

  - We don't need to worry about closing our dup'd descriptor, since the
    point is that run-command will do it for us (even in the case of an
    error). But we do need to make sure we skip the dup() if we set
    no_stdout (because then run-command will not look at it at all).

  - When the output is going to stdout, it would not be wrong to dup()
    the descriptor, but we don't need to. We can skip that extra work
    with a simple pointer comparison.

  - It seems like you'd need to fflush() the descriptor before handing
    off a copy to the child process to prevent out-of-order writes. But
    that was true even before this patch! It works because run-command
    always calls fflush(NULL) before running the child.

The new test shows the breakage (and fix). The need for duplicating the
descriptor doesn't need a new test; that is covered by the later test
"GIT_EXTERNAL_DIFF with more than one changed files".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:15:21 -07:00
René Scharfe
134ec330d2 commit-reach: avoid commit_list_insert_by_date()
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity.  Avoid it by just appending in the loop and sorting at
the end.

The number of merge bases is usually small, so don't expect speedups in
normal repositories.  It has no limit, though.  The added perf test
shows a nice improvement when dealing with 16384 merge bases:

Test                     v2.51.1           HEAD
-----------------------------------------------------------------
6010.2: git merge-base   0.55(0.54+0.00)   0.03(0.02+0.00) -94.5%

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-24 10:13:17 -07:00
Junio C Hamano
88b3704ab1 Merge branch 'jk/diff-from-contents-fix'
Recently we attempted to improve "git diff -w" and friends to
handle cases where patch output would be suppressed, but it
introduced a bug that emits unnecessary output, which has been
corrected.

* jk/diff-from-contents-fix:
  diff: restore redirection to /dev/null for diff_from_contents
2025-10-24 09:10:37 -07:00
Patrick Steinhardt
b7fb2194b9 t7528: work around ETOOMANY in OpenSSH 10.1 and newer
In t7528 we spawn an SSH agent to verify that we can sign a commit via
it. This test has started to fail on some machines:

    +++ ssh-agent
    unix_listener_tmp: path "/home/pks/Development/git/build/test-output/trash directory.t7528-signed-commit-ssh/.ssh/agent/s.UTulegefEg.agent.UrPHumMXPq" too long for Unix domain socket
    main: Couldn't prepare agent socket

As it turns out this is caused by a change in OpenSSH 10.1 [1]:

 * ssh-agent(1), sshd(8): move agent listener sockets from /tmp to
   under ~/.ssh/agent for both ssh-agent(1) and forwarded sockets
   in sshd(8).

Instead of creating the socket in "/tmp", OpenSSH now creates the socket
in our home directory. And as the home directory gets modified to be
located in our test output directory we end up with paths that are
somewhat long. But Linux has a rather short limit of 108 characters for
socket paths, and other systems have even lower limits, so it is very
easy now to exceed the limit and run into the above error.

Work around the issue by using `ssh-agent -T`, which instructs it to
use the old behaviour and create the socket in "/tmp" again. This switch
has only been introduced with 10.1 though, so for older versions we have
to fall back to not using it. That's fine though, as older versions know
to put the socket into "/tmp" already.

An alternative approach would be to abbreviate the socket name itself so
that we create it as e.g. "sshsock" in the trash directory. But taking
the above example we'd still end up with a path that is 91 characters
long. So we wouldn't really have a lot of headroom, and it is quite
likely that some developers would see the issue on their machines.

[1]: https://www.openssh.com/txt/release-10.1

Reported-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: brian m. carlson <sandals@crustytoothpaste.net>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Lauri Tirkkonen <lauri@hacktheplanet.fi>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-23 09:52:55 -07:00
Lidong Yan
3ed5d8bd73 diff: stop output garbled message in dry run mode
Earlier, b55e6d36 (diff: ensure consistent diff behavior with
ignore options, 2025-08-08) introduced "dry-run" mode to the
diff machinery so that content-based diff filtering (like
ignoring space changes or those that match -I<regex>) can first
try to produce a patch without emitting any output to see if
under the given diff filtering condition we would get any output
lines, and a new helper function diff_flush_patch_quietly() was
introduced to use the mode to see an individual filepair needs
to be shown.

However, the solution was not complete. When files are deleted,
file modes change, or there are unmerged entries in the index,
dry-run mode still produces output because we overlooked these
conditions, and as a result, dry-run mode was not quiet.

To fix this, return early in emit_diff_symbol_from_struct() if
we are in dry-run mode. This function will be called by all the
emit functions to output the results. Returning early can avoid
diff output when files are deleted or file modes are changed.
Stop print message in dry-run mode if we have unmerged entries
in index. Discard output of external diff tool in dry-run mode.

Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-23 09:06:52 -07:00
Junio C Hamano
45b5ae65e8 Merge branch 'jk/diff-from-contents-fix' into ly/diff-name-only-with-diff-from-content
* jk/diff-from-contents-fix:
  diff: restore redirection to /dev/null for diff_from_contents
2025-10-22 12:58:50 -07:00