More object database related information are shown in "git repo
structure" output.
* jt/repo-struct-more-objinfo:
builtin/repo: add object disk size info to structure table
builtin/repo: add disk size info to keyvalue stucture output
builtin/repo: add inflated object info to structure table
builtin/repo: add inflated object info to keyvalue structure output
builtin/repo: humanise count values in structure output
strbuf: split out logic to humanise byte values
builtin/repo: group per-type object values into struct
Building a list using commit_list_insert_by_date() has quadratic worst
case complexity. Avoid it by using prio_queue.
Use prio_queue_peek()+prio_queue_replace() instead of prio_queue_get()+
prio_queue_put() if possible, as the former only rebalance the
prio_queue heap once instead of twice.
In sane repositories this won't make much of a difference because the
number of items in the list or queue won't be very high:
Benchmark 1: ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 538.2 ms ± 0.8 ms [User: 527.6 ms, System: 9.6 ms]
Range (min … max): 537.0 ms … 539.2 ms 10 runs
Benchmark 2: ./git show-branch origin/main origin/next origin/seen origin/todo
Time (mean ± σ): 530.6 ms ± 0.4 ms [User: 519.8 ms, System: 9.8 ms]
Range (min … max): 530.1 ms … 531.3 ms 10 runs
Summary
./git show-branch origin/main origin/next origin/seen origin/todo ran
1.01 ± 0.00 times faster than ./git_v2.52.0 show-branch origin/main origin/next origin/seen origin/todo
That number is not limited, though, and in pathological cases like the
one in p6010 we see a sizable improvement:
Test v2.52.0 HEAD
------------------------------------------------------------------
6010.4: git show-branch 2.19(2.19+0.00) 0.03(0.02+0.00) -98.6%
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We received a report that invoking "git restore -source my_base_branch"
resulted in the confusing error message "fatal: could not resolve
ource". This looked like a typo in our error message, but it is
actually because "-source" is missing its second dash and is being
resolved as "-s ource". However, due to the lack of the quoting
recommended in CodingGuidelines, this is confusing to the reader and
we can do better.
Add the necessary quoting to this message. With this change, we now get
this less confusing message:
fatal: could not resolve 'ource'
Reported-by: Zhelyo Zhelev <zhelyo@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" that involves fetching tags, when a tag being fetched
needs to overwrite existing one, failed to fetch other tags, which
has been corrected.
* kn/fix-fetch-backfill-tag-with-batched-ref-updates:
fetch: fix failed batched updates skipping operations
fetch: fix non-conflicting tags not being committed
fetch: extract out reference committing logic
MEMZERO_ARRAY() helper is introduced to avoid clearing only the
first N bytes of an N-element array whose elements are larger than
a byte.
* tc/memzero-array:
contrib/coccinelle: pass include paths to spatch(1)
git-compat-util: introduce MEMZERO_ARRAY() macro
"git submodule add" to add a submodule under <name> segfaulted,
when a submodule.<name>.something is already in .gitmodules file
without defining where its submodule.<name>.path is, which has been
corrected.
* jc/submodule-add:
submodule add: sanity check existing .gitmodules
"git replay --onto=<commit> ...", when <commit> is mistyped,
started to segfault with recent change, which has been corrected.
* rs/replay-wrong-onto-fix:
replay: move onto NULL check before first use
"git replay" documentation updates.
* kh/doc-replay-updates:
doc: replay: link section using markup
replay: improve --contained and add to doc
doc: replay: mention no output on conflicts
Similar to a prior commit, update the table output format for the
git-repo(1) structure command to display the total object disk usage by
object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to a prior commit, extend the keyvalue and nul output formats of
the git-repo(1) structure command to additionally provide info regarding
total object disk sizes by object type.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the table output format for the git-repo(1) structure command to
begin printing the total inflated object size info by object type. To be
more human-friendly, larger values are scaled down and displayed with
the appropriate unit prefix. Output for the keyvalue and nul formats
remains unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The structure subcommand for git-repo(1) outputs basic count information
for objects and references. Extend this output to also provide
information regarding total size of inflated objects by object type.
For now, object size by object type info is only added to the keyvalue
and nul output formats. In a subsequent commit, this info is also added
to the table format.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The table output format for the git-repo(1) structure subcommand is used
by default and intended to provide output to users in a human-friendly
manner. When the reference/object count values in a repository are
large, it becomes more cumbersome for users to read the values.
For larger values, update the table output format to instead produce
more human-friendly count values that are scaled down with the
appropriate unit prefix. Output for the keyvalue and nul formats remains
unchanged.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `object_stats` structure stores object counts by type. In a
subsequent commit, additional per-type object measurements will also be
stored. Group per-type object values into a new struct to allow better
reuse.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.
* ps/object-read-stream:
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
"git repo struct" learned to take "-z" as a synonym to "--format=nul".
* lo/repo-struct-z:
repo: add -z as an alias for --format=nul to git-repo-structure
repo: use [--format=... | -z] instead of [-z] in git-repo-info synopsis
repo: remove blank line from Documentation/git-repo.adoc
A help message from "git branch" now mentions "git help" instead of
"man" when suggesting to read some documentation.
* kh/advise-w-git-help-in-branch:
branch: advice using git-help(1) instead of man(1)
"git last-modified" used to mishandle "--" to mark the beginning of
pathspec, which has been corrected.
* js/last-modified-with-sparse-checkouts:
last-modified: support sparse checkouts
Recent optimization to "last-modified" command introduced use of
uninitialized block of memory, which has been corrected.
* tc/last-modified-active-paths-optimization:
last-modified: fix use of uninitialized memory
There is no documentation for `--contained`.
Start by copying the text from `replay_options` in `builtin/
replay.c`. But some people think that the existing text is a
bit unclear; what does it mean for a branch to be contained
in a revision range? Let’s include the implied commits here:
the branches that point at commits in the range.
Also use “update” instead of “advance”. “Update” is the verb
commonly used in this context.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unless there are good reasons, it is customary to have the options[]
array used with the parse-options API declared in function scope rather
than at file scope.
Move builtin/pull.c:cmd_pull()’s options[] array into the function to
match that convention.
Signed-off-by: K Jayatheerth <jayatheerthkulkarni2005@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_replay() aborts if the pointer "onto" is NULL after argument
parsing, e.g. when specifying a non-existing commit with --onto.
15cd4ef1f4 (replay: make atomic ref updates the default behavior,
2025-11-06) added code that dereferences this pointer before the check.
Switch their places to avoid a segmentation fault.
Reported-by: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new macro MEMZERO_ARRAY() that zeroes the memory allocated
by ALLOC_ARRAY() and friends. And add coccinelle rule to enforce the use
of this macro.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a regression introduced with batched updates in 0e358de64a (fetch:
use batched reference updates, 2025-05-19) when fetching references. In
the `do_fetch()` function, we jump to cleanup if committing the
transaction fails, regardless of whether using batched or atomic
updates. This skips three subsequent operations:
- Update 'FETCH_HEAD' as part of `commit_fetch_head()`.
- Add upstream tracking information via `set_upstream()`.
- Setting remote 'HEAD' values when `do_set_head` is true.
For atomic updates, this is expected behavior. For batched updates,
we want to continue with these operations even if some refs fail to
update.
Skipping `commit_fetch_head()` isn't actually a regression because
'FETCH_HEAD' is already updated via `append_fetch_head()` when not
using '--atomic'. However, we add a test to validate this behavior.
Skipping the other two operations (upstream tracking and remote HEAD)
is a regression. Fix this by only jumping to cleanup when using
'--atomic', allowing batched updates to continue with post-fetch
operations. Add tests to prevent future regressions.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit 0e358de64a (fetch: use batched reference updates, 2025-05-19)
updated the 'git-fetch(1)' command to use batched updates. This batches
updates to gain performance improvements. When fetching references, each
update is added to the transaction. Finally, when committing, individual
updates are allowed to fail with reason, while the transaction itself
succeeds.
One scenario which was missed here, was fetching tags. When fetching
conflicting tags, the `fetch_and_consume_refs()` function returns '1',
which skipped committing the transaction and directly jumped to the
cleanup section. This mean that no updates were applied. This also
extends to backfilling tags which is done when fetching specific
refspecs which contains tags in their history.
Fix this by committing the transaction when we have an error code and
not using an atomic transaction. This ensures other references are
applied even when some updates fail.
The cleanup section is reached with `retcode` set in several scenarios:
- `truncate_fetch_head()`, `open_fetch_head()` and `prune_refs()` set
`retcode` before the transaction is created, so no commit is
attempted.
- `fetch_and_consume_refs()` and `backfill_tags()` are the primary
cases this fix targets, both setting a positive `retcode` to
trigger the committing of the transaction.
This simplifies error handling and ensures future modifications to
`do_fetch()` don't need special handling for batched updates.
Add tests to check for this regression. While here, add a missing
cleanup from previous test.
Reported-by: David Bohman <debohman@gmail.com>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The use of "revision" (a connected set of commits) has been
clarified in the "git replay" documentation.
* en/replay-doc-revision-range:
Documentation/git-replay.adoc: fix errors around revision range
"git replay" forgot to omit the "gpgsig-sha256" extended header
from the resulting commit the same way it omits "gpgsig", which has
been corrected.
* pw/replay-exclude-gpgsig-fix:
replay: do not copy "gpgsign-sha256" header
The error message given by "git config set", when the variable
being updated has more than one values defined, used old style "git
config" syntax with an incorrect option in its hint, both of which
have been corrected.
* rs/config-set-multi-error-message-fix:
config: fix suggestion for failed set of multi-valued option
The option help text given by "git config unset -h" described
the "--all" option to "replace", not "unset", multiple variables,
which has been corrected.
* rs/config-unset-opthelp-fix:
config: fix short help of unset flags
Code refactoring around object database sources.
* ps/object-source-management:
odb: handle recreation of quarantine directories
odb: handle changing a repository's commondir
chdir-notify: add function to unregister listeners
odb: handle initialization of sources in `odb_new()`
http-push: stop setting up `the_repository` for each reference
t/helper: stop setting up `the_repository` repeatedly
builtin/index-pack: fix deferred fsck outside repos
oidset: introduce `oidset_equal()`
odb: move logic to disable ref updates into repo
odb: refactor `odb_clear()` to `odb_free()`
odb: adopt logic to close object databases
setup: convert `set_git_dir()` to have file scope
path: move `enter_repo()` into "setup.c"
"git config get --path" segfaulted on an ":(optional)path" that
does not exist, which has been corrected.
* jc/optional-path:
config: really treat missing optional path as not configured
config: really pretend missing :(optional) value is not there
config: mark otherwise unused function as file-scope static
Other Git commands that have nul-terminated output, such as git-config,
git-status, git-ls-files, and git-repo-info have a flag `-z` for using
the null character as the record separator.
Add the `-z` flag to git-repo-structure as an alias for `--format=nul`,
making it consistent with the behavior of the other commands.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The flag -z is only an alias for --format=null and even though --format
and -z can be used together and repeated, only the last one is
considered.
Replace `[-z]` in the synopsis of git-repo-info by
`[--format=... | -z]`, expliciting that the use of one of those flags
replace the other.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a sparse checkout, a user might want to run `last-modified` on a
directory outside the worktree.
And even in non-sparse checkouts, a user might need to run that command
on a directory that does not exist in the worktree.
These use cases should be supported via the `--` separator between
revision and file arguments, which is even advertised in the
documentation. This patch fixes a tiny bug that prevents that from
working.
This fixes https://github.com/git-for-windows/git/issues/5978
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Derrick Stolee <stolee@gmail.com>
Acked-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
8fbd903e (branch: advise about ref syntax rules, 2024-03-05) added
an advice about checking git-check-ref-format(1) for the ref syntax
rules. The advice uses man(1). But git(1) is a multi-platform tool and
man(1) may not be available on some platforms. It might also be slightly
jarring to see a suggestion for running a command which is not from
the Git suite.
Let’s instead use git-help(1) in order to stay inside the land of
git(1). This also means that `help.format` (for `man`, `html` or other
formats) will be used if set.
Also change to using single quotes (') to quote the command since that
is more conventional.
While here let’s also update the test to use `{SQ}`, which is more
readable and easier to edit.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-last-modified(1) uses a scratch bitmap to keep track of paths that
have been changed between commits. To avoid reallocating a bitmap on
each call of process_parent(), the scratch bitmap is kept and reused.
Although, it seems an incorrect length is passed to memset(3).
`struct bitmap` uses `eword_t` to for internal storage. This type is
typedef'd to uint64_t. To fully zero the memory used by the bitmap,
multiply the length (saved in `struct bitmap::word_alloc`) by the size
of `eword_t`.
Reported-by: Anders Kaseorg <andersk@mit.edu>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was significant confusion in the git-replay manual about what
constitutes a revision range. As noted in f302c1e4aa09 (revisions(7):
clarify that most commands take a single revision range, 2021-05-18):
Commands that are specifically designed to take two distinct ranges
(e.g. "git range-diff R1 R2" to compare two ranges) do exist, but they
are exceptions. Unless otherwise noted, all "git" commands that operate
on a set of commits work on a single revision range.
`git replay` is not an exception, but a few places in the manual were
written as though it were. These appear to have come in revisions to
the original series, between v3->v4 (see
https://lore.kernel.org/git/CAP8UFD3bpLrVW97DH7j=V9H2GsTSAkksC9L3QujQERFk_kLnZA@mail.gmail.com/
, "More than one <revision-range> can be passed") and between v6->v7
(https://lore.kernel.org/git/20231115143327.2441397-1-christian.couder@gmail.com/,
"Takes ranges of commits"), and I missed both of these revisions when
reviewing. Fix them now.
There was also a reference to the "Commit Limiting options below", but
this page has no such section of options; strike the misleading
reference.
It is worth noting that we are documenting existing behavior, rather
than optimal behavior. Junio has multiple times suggested introducing
alternative ways to walk revisions and use them in `git replay
--advance`, e.g. at
* https://lore.kernel.org/git/xmqqy1mqo6kv.fsf@gitster.g/
* https://lore.kernel.org/git/xmqq8rb3is8c.fsf@gitster.g/
* https://lore.kernel.org/git/xmqqtsydj2zk.fsf@gitster.g/ (item (2))
If/when we introduce some new revision walking flag that implements one
of these alternate types of revision walks, we can update the --advance
option and this manual appropriately.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree list" attempts to show paths to worktrees while
aligning them, but miscounted display columns for the paths when
non-ASCII characters were involved, which has been corrected.
* pw/worktree-list-display-width-fix:
worktree list: quote paths
worktree list: fix column spacing
When "git replay" replays a commit it copies the extended headers
across from the original commit. However, if the original commit
was signed, we do not want to copy the header associated with the
signature is it wont be valid for the new commit. The code already
knows to avoid coping the "gpgsig" header but does not know to avoid
copying the "gpgsig-sha256" header. Add that header to the list of
exclusions to match what "git commit --amend" does.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tools like `git filter-repo`[1] use `git fast-export` and
`git fast-import` to rewrite repository history. When rewriting
history using one such tool though, commit signatures might become
invalid because the commits they sign changed due to the changes
in the repository history made by the tool between the fast-export
and the fast-import steps.
Note that as far as signature handling goes:
* Since fast-export doesn't know what changes filter-repo may make
to the stream, it can't know whether the signatures will still be
valid.
* Since filter-repo doesn't know what history canonicalizations
fast-export performed (and it performs a few), it can't know whether
the signatures will still be valid.
* Therefore, fast-import is the only process in the pipeline that
can know whether a specified signature remains valid.
Having invalid signatures in a rewritten repository could be
confusing, so users rewritting history might prefer to simply
discard signatures that are invalid at the fast-import step.
For example a common use case is to rewrite only "recent" history.
While specifying commit ranges corresponding to "recent" commits
could work, users worry about getting it wrong and want to just
automatically rewrite everything, expecting older commit signatures
to be untouched.
To let them do that, let's add a new 'strip-if-invalid' mode to the
`--signed-commits=<mode>` option of `git fast-import`.
It would be interesting for the `--signed-tags=<mode>` option to
have this mode too, but we leave that for a future improvement.
It might also be possible for `git fast-export` to have such a mode
in its `--signed-commits=<mode>` and `--signed-tags=<mode>`
options, but the use cases for it are much less clear, so we also
leave that for possible future improvements.
For now let's just die() if 'strip-if-invalid' is passed to these
options where it hasn't been implemented yet.
[1]: https://github.com/newren/git-filter-repo
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asked to perform object consistency checks via the `--fsck-objects`
flag we verify that each object part of the pack is valid. In general,
this check can even be performed outside of a Git repository: we don't
need an initialized object database as we simply read the object from
the packfile directly.
But there's one exception: a subset of the object checks may be deferred
to a later point in time. For now, this only concerns ".gitmodules" and
".gitattributes" files: whenever we see a tree referencing these files
we queue them for a deferred check. This is done because we need to do
some extra checks for those files to ensure that they are well-formed,
and these checks need to be done regardless of whether the corresponding
blobs are part of the packfile or not.
This works inside a repository, but unfortunately the logic leads to a
segfault when running outside of one. This is because we eventually call
`odb_read_object()`, which will crash because the object database has
not been initialized.
There's multiple options here:
- We could in theory create a purely in-memory database with only a
packfile store that contains the single packfile. We don't really
have the infrastructure for this yet though, and it would end up
being quite hacky.
- We could refuse to perform consistency checks outside of a
repository. But most of the checks work alright, so this would be a
regression.
- We can skip the finalizing consistency checks when running outside
of a repository. This is not as invasive as skipping all checks,
but it's not great to randomly skip a subset of tests, either.
None of these options really feel perfect. The first one would be the
obvious choice if easily possible.
There's another option though: instead of skipping the final object
checks, we can die if there are any queued object checks. With this
change we now die exactly if and only if we would have previously
segfaulted. Like this we ensure that objects that _may_ fail the
consistency checks won't be silently skipped, and at the same time we
give users a much better error message.
Refactor the code accordingly and add a test that would have triggered
the segfault. Note that we also move down the logic to add the packfile
to the store. There is no point doing this any earlier than right before
we execute `fsck_finish()`, and it ensures that the logic to set up and
perform the consistency check is self-contained.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git submodule add" tries to find if a submodule with the same name
already exists at a different path, by looking up an entry in the
.gitmodules file. If the entry in the file is incomplete, e.g.,
when the submodule.<name>.something variable is defined but there is
no definition of submodule.<name>.path variable, it accesses the
missing .path member of the submodule structure and triggers a
segfault.
A brief audit was done to make sure that the code does not assume
members other than those that are absolutely certain to exist: a
submodule obtained by submodule_from_name() should have .name
member, while a submodule obtained by submodule_from_path() should
also have .path as well as .name member, and we cannot assume
anything else. Luckily, the module_add() codepath was the only
problematic one. It is fairly recent code that comes from 1fa06ced
(submodule: prevent overwriting .gitmodules on path reuse,
2025-07-24).
A helper used by update_submodule() seems to assume that its call to
submodule_from_path() always yields a submodule object without a
failure, which seems to rely on the caller making sure it is the
case. Leave an assert() with a NEEDSWORK comment there for future
developers to make sure the assumption actually holds.
Signed-off-by: Junio C Hamano <gitster@pobox.com>