79512 Commits

Author SHA1 Message Date
Junio C Hamano
b43a84d22e Merge branch 'ps/read-object-info-improvements' into jch
The object-info API has been cleaned up.

Comments?

* ps/read-object-info-improvements:
  packfile: drop repository parameter from `packed_object_info()`
  packfile: skip unpacking object header for disk size requests
  packfile: disentangle return value of `packed_object_info()`
  packfile: always populate pack-specific info when reading object info
  packfile: extend `is_delta` field to allow for "unknown" state
  packfile: always declare object info to be OI_PACKED
  object-file: always set OI_LOOSE when reading object info
2026-01-08 16:40:37 +09:00
Junio C Hamano
53b44d9bcf Merge branch 'en/fsck-snapshot-ref-state' into jch
"git fsck" used inconsistent set of refs to show a confused
warning, which has been corrected.

* en/fsck-snapshot-ref-state:
  fsck: snapshot default refs before object walk
2026-01-08 16:40:37 +09:00
Junio C Hamano
17a70947dd Merge branch 'tt/receive-pack-oo-namespace-symref-fix' into jch
"git receive-pack", when namespace is involved, segfaulted when a
symbolic ref cross the namespace boundary.

Comments?

* tt/receive-pack-oo-namespace-symref-fix:
  receive-pack: fix crash on out-of-namespace symref
2026-01-08 16:40:37 +09:00
Junio C Hamano
44961b090e Merge branch 'sb/doc-worktree-prune-expire-improvement' into jch
The help text and the documentation for the "--expire" option of
"git worktree [list|prune]" have been improved.

* sb/doc-worktree-prune-expire-improvement:
  worktree: use 'prune' instead of 'expire' in help text
  worktree: clarify --expire applies to missing worktrees
2026-01-08 16:40:36 +09:00
Junio C Hamano
ae4ce794e1 Merge branch 'js/symlink-windows' into jch
Upstream symbolic link support on Windows from Git-for-Windows.

* js/symlink-windows:
  mingw: special-case index entries for symlinks with buggy size
  mingw: emulate `stat()` a little more faithfully
  mingw: try to create symlinks without elevated permissions
  mingw: add support for symlinks to directories
  mingw: implement basic `symlink()` functionality (file symlinks only)
  mingw: implement `readlink()`
  mingw: allow `mingw_chdir()` to change to symlink-resolved directories
  mingw: support renaming symlinks
  mingw: handle symlinks to directories in `mingw_unlink()`
  mingw: add symlink-specific error codes
  mingw: change default of `core.symlinks` to false
  mingw: factor out the retry logic
  mingw: compute the correct size for symlinks in `mingw_lstat()`
  mingw: teach dirent about symlinks
  mingw: let `mingw_lstat()` error early upon problems with reparse points
  mingw: drop the separate `do_lstat()` function
  mingw: implement `stat()` with symlink support
  mingw: don't call `GetFileAttributes()` twice in `mingw_lstat()`
2026-01-08 16:40:36 +09:00
Junio C Hamano
9c835711cd Merge branch 'js/prep-symlink-windows' into jch
Further preparation to upstream symbolic link support on Windows.

* js/prep-symlink-windows:
  trim_last_path_component(): avoid hard-coding the directory separator
  strbuf_readlink(): support link targets that exceed PATH_MAX
  strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
  init: do parse _all_ core.* settings early
  mingw: do resolve symlinks in `getcwd()`
2026-01-08 16:40:36 +09:00
Junio C Hamano
571c2317fc Merge branch 'ap/http-probe-rpc-use-auth' into jch
* ap/http-probe-rpc-use-auth:
  remote-curl: Use auth for probe_rpc() requests too
2026-01-08 16:40:36 +09:00
Junio C Hamano
7bfa6bbf46 Merge branch 'sb/doc-update-ref-markup-fix' into jch
Doc mark-up fix.

Will merget to 'next'.

* sb/doc-update-ref-markup-fix:
  doc: fix `update-ref` `symref-create` formatting
2026-01-08 16:40:35 +09:00
Junio C Hamano
14e9ef1b89 Merge branch 'yc/histogram-hunk-shift-fix' into jch
The final clean-up phase of the diff output could turn the result of
histogram diff algorithm suboptimal, which has been corrected.

Comments?

* yc/histogram-hunk-shift-fix:
  xdiff: re-diff shifted change groups when using histogram algorithm
2026-01-08 16:40:35 +09:00
Junio C Hamano
db3876ce47 Merge branch 'jk/parse-int' into jch
Introduce a more robust way to parse a decimal integer stored in a
piece of memory that is not necessarily terminated with NUL (which
Asan strict-string-check complains even when use of strtol() is
safe due to varified existence of whitespace after the digits).

* jk/parse-int:
  fsck: use parse_unsigned_from_buf() for parsing timestamp
  cache-tree: use parse_int_from_buf()
  parse: add functions for parsing from non-string buffers
  parse: prefer bool to int for boolean returns
2026-01-08 16:40:35 +09:00
Junio C Hamano
31274bdd17 Merge branch 'tc/last-modified-options-cleanup' into jch
The "-z" and "--max-depth" documentation (and implementation of
"-z") in the "git last-modified" command have been updated.

* tc/last-modified-options-cleanup:
  fixup! last-modified: document option --max-depth
  last-modified: document how depth is handled better
  last-modified: document option --max-depth
  last-modified: handle and document NUL termination
2026-01-08 16:40:34 +09:00
Junio C Hamano
61f3f7b770 Merge branch 'kn/ref-location' into jch
A mechanism to specify what reference backend to use and store
references in which directory is introduced, which would likely to
be useful during ref migration.

Comments?

* kn/ref-location:
  refs: add GIT_REF_URI to specify reference backend and directory
  refs: support obtaining ref_store for given dir
2026-01-08 16:40:34 +09:00
Junio C Hamano
45d4b513c3 Merge branch 'wm/complete-git-short-opts' into jch
The command line completion script (in contrib/) learned to
complete "git -<TAB>" to give single-letter options like "-C".

* wm/complete-git-short-opts:
  completion: complete "git -<TAB>" with short options
2026-01-08 16:40:34 +09:00
Junio C Hamano
446369869b Merge branch 'ar/submodule-gitdir-tweak' into jch
Avoid local submodule repository directory paths overlapping with
each other by encoding submodule names before using them as path
components.

Comments?

* ar/submodule-gitdir-tweak:
  submodule: detect conflicts with existing gitdir configs
  submodule: hash the submodule name for the gitdir path
  submodule: fix case-folding gitdir filesystem collisions
  submodule--helper: fix filesystem collisions by encoding gitdir paths
  builtin/credential-store: move is_rfc3986_unreserved to url.[ch]
  submodule--helper: add gitdir migration command
  submodule: allow runtime enabling extensions.submodulePathConfig
  submodule: introduce extensions.submodulePathConfig
  builtin/submodule--helper: add gitdir command
  submodule: always validate gitdirs inside submodule_name_to_gitdir
  submodule--helper: use submodule_name_to_gitdir in add_submodule
2026-01-08 16:40:33 +09:00
Junio C Hamano
11e5dec2ba Merge branch 'ps/packfile-store-in-odb-source' into jch
The packfile_store data structure is moved from object store to odb
source.

* ps/packfile-store-in-odb-source:
  packfile: move MIDX into packfile store
  packfile: refactor `find_pack_entry()` to work on the packfile store
  packfile: inline `find_kept_pack_entry()`
  packfile: only prepare owning store in `packfile_store_prepare()`
  packfile: only prepare owning store in `packfile_store_get_packs()`
  packfile: move packfile store into object source
  packfile: refactor misleading code when unusing pack windows
  packfile: refactor kept-pack cache to work with packfile stores
  packfile: pass source to `prepare_pack()`
  packfile: create store via its owning source
2026-01-08 16:40:33 +09:00
Junio C Hamano
e26fe3f7d2 ### match next 2026-01-08 16:40:32 +09:00
Junio C Hamano
6d0ac2147d Merge branch 'ps/clar-integers' into jch
Import newer version of "clar", unit testing framework.

* ps/clar-integers:
  gitattributes: disable blank-at-eof errors for clar test expectations
  t/unit-tests: demonstrate use of integer comparison assertions
  t/unit-tests: update clar to 39f11fe
2026-01-08 16:40:32 +09:00
Junio C Hamano
6856aa4803 Merge branch 'kh/replay-invalid-onto-advance' into jch
Test coverage of "git replay" has been improved.

* kh/replay-invalid-onto-advance:
  t3650: add more regression tests for failure conditions
  replay: die if we cannot parse object
  replay: improve code comment and die message
  replay: die descriptively when invalid commit-ish is given
  replay: find *onto only after testing for ref name
  replay: remove dead code and rearrange
2026-01-08 16:40:32 +09:00
Junio C Hamano
3d198e2b10 Merge branch 'ps/odb-misc-fixes' into jch
Miscellaneous fixes on object database layer.

* ps/odb-misc-fixes:
  odb: properly close sources before freeing them
  builtin/gc: fix condition for whether to write commit graphs
2026-01-08 16:40:31 +09:00
Junio C Hamano
f9c04f1e97 Merge branch 'pt/t7800-difftool-test-racefix' into jch
Test fixup.

* pt/t7800-difftool-test-racefix:
  t7800: fix racy "difftool --dir-diff syncs worktree" test
2026-01-08 16:40:31 +09:00
Junio C Hamano
fb9120bfa7 Merge branch 'ps/t1300-2021-use-test-path-is-helpers' into jch
Test updates.

* ps/t1300-2021-use-test-path-is-helpers:
  t1300: use test helpers instead of `test` command
2026-01-08 16:40:31 +09:00
Junio C Hamano
6d7996535d Merge branch 'rs/commit-stack' into jch
Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.

* rs/commit-stack:
  commit-reach: use commit_stack
  commit-graph: use commit_stack
  commit: add commit_stack_grow()
  shallow: use commit_stack
  pack-bitmap-write: use commit_stack
  commit: add commit_stack_init()
  test-reach: use commit_stack
  remote: use commit_stack for src_commits
  remote: use commit_stack for sent_tips
  remote: use commit_stack for local_commits
  name-rev: use commit_stack
  midx: use commit_stack
  log: use commit_stack
  revision: export commit_stack
2026-01-08 16:40:31 +09:00
Junio C Hamano
96ba419a76 Merge branch 'sb/bundle-uri-without-uri' into jch
Diagnose invalid bundle-URI that lack the URI entry, instead of
crashing.

* sb/bundle-uri-without-uri:
  bundle-uri: validate that bundle entries have a uri
2026-01-08 16:40:31 +09:00
Junio C Hamano
e2124d6fe0 Merge branch 'ja/doc-synopsis-style-more' into jch
More doc style updates.

* ja/doc-synopsis-style-more:
  doc: convert git-remote to synopsis style
  doc: convert git stage to use synopsis block
  doc: convert git-status tables to AsciiDoc format
  doc: convert git-status to synopsis style
  doc: fix t0450-txt-doc-vs-help to select only first synopsis block
2026-01-08 16:40:30 +09:00
Junio C Hamano
d529f3a197 The 16th batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 16:40:12 +09:00
Junio C Hamano
2db806d817 Merge branch 'en/ort-recursive-d-f-conflict-fix'
The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.

* en/ort-recursive-d-f-conflict-fix:
  merge-ort: fix corner case recursive submodule/directory conflict handling
2026-01-08 16:40:12 +09:00
Junio C Hamano
512351f2a8 Merge branch 'dd/t5403-modernise'
Test micro-clean-up.

* dd/t5403-modernise:
  t5403: use test_path_is_file instead of test -f
2026-01-08 16:40:12 +09:00
Junio C Hamano
c0754dc423 Merge branch 'ds/diff-lazy-fetch-with-name-only-fix'
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.

* ds/diff-lazy-fetch-with-name-only-fix:
  diff: avoid segfault with freed entries
2026-01-08 16:40:11 +09:00
Junio C Hamano
d28d2be5f2 Merge branch 'rs/tag-wo-the-repository'
Code clean-up.

* rs/tag-wo-the-repository:
  tag: stop using the_repository
  tag: support arbitrary repositories in parse_tag()
  tag: support arbitrary repositories in gpg_verify_tag()
  tag: use algo of repo parameter in parse_tag_buffer()
2026-01-08 16:40:11 +09:00
Adrian Ratiu
36d43bef82 submodule: detect conflicts with existing gitdir configs
Credit goes to Emily and Josh for testing and noticing a corner-case
which caused conflicts with existing gitdir configs to silently pass
validation, then fail later in add_submodule() with a cryptic error:

fatal: A git directory for 'nested%2fsub' is found locally with remote(s):
  origin	/.../trash directory.t7425-submodule-gitdir-path-extension/sub

This change ensures the validation step checks existing gitdirs for
conflicts. We only have to do this for submodules having gitdirs,
because those without submodule.%s.gitdir need to be migrated and
will throw an error earlier in the submodule codepath.

Quoting Josh:
 My testing setup has been as follows:
 * Using our locally-built Git with our downstream patch of [1] included:
   * create a repo "sub"
   * create a repo "super"
   * In "super":
     * mkdir nested
     * git submodule add ../sub nested/sub
     * Verify that the submodule's gitdir is .git/modules/nested%2fsub
 * Using a build of git from upstream `next` plus this series:
   * git config set --global extensions.submodulepathconfig true
   * git clone --recurse-submodules super super2
   * create a repo "nested%2fsub"
   * In "super2":
     * git submodule add ../nested%2fsub

At this point I'd expect the collision detection / encoding to take
effect, but instead I get the error listed above.
End quote

Suggested-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:16 +09:00
Adrian Ratiu
a612f856ff submodule: hash the submodule name for the gitdir path
If none of the previous plain-text / encoding / derivation steps work
and case 2.4 is reached, then try a hash of the submodule name to see
if that can be a valid gitdir before giving up and throwing an error.

This is a "last resort" type of measure to avoid conflicts since it
loses the human readability of the gitdir path. This logic will be
reached in rare cases, as can be seen in the test we added.

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:16 +09:00
Adrian Ratiu
1bb1906270 submodule: fix case-folding gitdir filesystem collisions
Add a new check when extension.submodulePathConfig is enabled, to
detect and prevent case-folding filesystem colisions. When this
new check is triggered, a stricter casefolding aware URI encoding
is used to percent-encode uppercase characters.

By using this check/retry mechanism the uppercase encoding is
only applied when necessary, so case-sensitive filesystems are
not affected.

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:16 +09:00
Adrian Ratiu
ee437749ff submodule--helper: fix filesystem collisions by encoding gitdir paths
Fix nested filesystem collisions by url-encoding gitdir paths stored
in submodule.%s.gitdir, when extensions.submodulePathConfig is enabled.

Credit goes to Junio and Patrick for coming up with this design: the
encoding is only applied when necessary, to newly added submodules.

Existing modules don't need the encoding because git already errors
out when detecting nested gitdirs before this patch.

This commit adds the basic url-encoding and some tests. Next commits
extend the encode -> validate -> retry loop to fix more conflicts.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:15 +09:00
Adrian Ratiu
7621825c43 builtin/credential-store: move is_rfc3986_unreserved to url.[ch]
is_rfc3986_unreserved() was moved to credential-store.c and was made
static by f89854362c (credential-store: move related functions to
credential-store file, 2023-06-06) under a correct assumption, at the
time, that it was the only place using it.

However now we need it to apply URL-encoding to submodule names when
constructing gitdir paths, to avoid conflicts, so bring it back as a
public function exposed via url.h, instead of the old helper path
(strbuf), which has nothing to do with 3986 encoding/decoding anymore.

This function will be used in subsequent commits which do the encoding.

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:15 +09:00
Adrian Ratiu
d5a4b9b73a submodule--helper: add gitdir migration command
Manually running
"git config submodule.<name>.gitdir .git/modules/<name>"
for each submodule can be impractical, so add a migration command to
submodule--helper to automatically create configs for all submodules
as required by extensions.submodulePathConfig.

The command calls create_default_gitdir_config() which validates the
gitdir paths before adding the configs.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:15 +09:00
Adrian Ratiu
4cf8c114e3 submodule: allow runtime enabling extensions.submodulePathConfig
Add a new config `init.defaultSubmodulePathConfig` which allows
enabling `extensions.submodulePathConfig` for new submodules by
default (those created via git init or clone).

Important: setting init.defaultSubmodulePathConfig = true does
not globally enable `extensions.submodulePathConfig`. Existing
repositories will still have the extension disabled and will
require migration (for example via git submodule--helper command
added in the next commit).

Suggested-by: Patrick Steinhardt <ps@pks.im>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:15 +09:00
Adrian Ratiu
7ad97f4bea submodule: introduce extensions.submodulePathConfig
The idea of this extension is to abstract away the submodule gitdir
path implementation: everyone is expected to use the config and not
worry about how the path is computed internally, either in git or
other implementations.

With this extension enabled, the submodule.<name>.gitdir repo config
becomes the single source of truth for all submodule gitdir paths.

The submodule.<name>.gitdir config is added automatically for all new
submodules when this extension is enabled.

Git will throw an error if the extension is enabled and a config is
missing, advising users how to migrate. Migration is manual for now.

E.g. to add a missing config entry for an existing "foo" module:
git config submodule.foo.gitdir .git/modules/foo

Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:15 +09:00
Adrian Ratiu
376c94b167 builtin/submodule--helper: add gitdir command
This exposes the gitdir name computed by submodule_name_to_gitdir()
internally, to make it easier for users and tests to interact with it.

Next commit will add a gitdir configuration, so this helper can also be
used to easily query that config or validate any gitdir path the user
sets (submodule_name_to_git_dir now runs the validation logic, since
our previous commit).

Based-on-patch-by: Brandon Williams <bwilliams.eng@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:15 +09:00
Adrian Ratiu
a05c78d224 submodule: always validate gitdirs inside submodule_name_to_gitdir
Move the ad-hoc validation checks sprinkled across the source tree,
after calling submodule_name_to_gitdir() into the function proper,
which now always validates the gitdir before returning it.

This simplifies the API and helps to:
1. Avoid redundant validation calls after submodule_name_to_gitdir().
2. Avoid the risk of callers forgetting to validate.
3. Ensure gitdir paths provided by users via configs are always valid
   (config gitdir paths are added in a subsequent commit).

The validation function can still be called as many times as needed
outside submodule_name_to_gitdir(), for example we keep two calls
which are still required, to avoid parallel clone races by re-running
the validation in builtin/submodule-helper.c.

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:14 +09:00
Adrian Ratiu
aac1ec71fd submodule--helper: use submodule_name_to_gitdir in add_submodule
While testing submodule gitdir path encoding, I noticed submodule--helper
is still using a hardcoded modules gitdir path leading to test failures.

Call the submodule_name_to_gitdir() helper instead, which was invented
exactly for this purpose and is already used by all the other locations
which work on gitdirs.

Also narrow the scope of the submod_gitdir_path variable which is not
used anymore in the updated "else" branch.

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:05:14 +09:00
Patrick Steinhardt
57592cba24 packfile: drop repository parameter from packed_object_info()
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.

Drop the redundant parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:23 +09:00
Patrick Steinhardt
ca3b4933e9 packfile: skip unpacking object header for disk size requests
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.

Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:

    Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
      Time (mean ± σ):     105.2 ms ±   0.6 ms    [User: 91.4 ms, System: 13.3 ms]
      Range (min … max):   103.7 ms … 106.0 ms    27 runs

    Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
      Time (mean ± σ):      96.7 ms ±   0.4 ms    [User: 86.2 ms, System: 10.0 ms]
      Range (min … max):    96.2 ms …  98.1 ms    30 runs

    Summary
      ./git rev-list --disk-usage HEAD (rev = HEAD) ran
        1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:23 +09:00
Patrick Steinhardt
9c82ba6c32 packfile: disentangle return value of packed_object_info()
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.

We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.

This unblocks a small optimization where we can skip reading the object
type altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:23 +09:00
Patrick Steinhardt
03d894e23c packfile: always populate pack-specific info when reading object info
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.

Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:22 +09:00
Patrick Steinhardt
56be11f501 packfile: extend is_delta field to allow for "unknown" state
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.

This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.

This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.

Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:22 +09:00
Patrick Steinhardt
0ff0f991df packfile: always declare object info to be OI_PACKED
When reading object info via a packfile we yield one of two types:

  - The object can either be OI_PACKED, which is what a caller would
    typically expect.

  - Or it can be OI_DBCACHED if it is stored in the delta base cache.

The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.

There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:

  - `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
    and then populates the packfile information of the object info
    structure. We now start to do this also for deltified objects, which
    gives its callers strictly more information.

  - `repack_local_links()` wants to determine whether the object is part
    of a promisor pack and checks for `whence == OI_PACKED`. If so, it
    verifies that the packfile is a promisor pack. It's arguably wrong
    to declare that an object is not part of a promisor pack only
    because it is stored in the delta base cache.

  - `is_not_in_promisor_pack_obj()` does the same, but checks that a
    specific object is _not_ part of a promisor pack. The same reasoning
    as above applies.

Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:22 +09:00
Patrick Steinhardt
b3b89b691c object-file: always set OI_LOOSE when reading object info
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.

The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.

Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.

Note that we also:

  - Rename `status` to `ret` to match our usual coding style, but also
    to show that the old `status` variable is now always getting the
    expected value. Furthermore, the value is not initialized anymore,
    which has the consequence that most compilers will warn for exit
    paths where we forgot to set it.

  - Move the setup of scratch pointers closer to `parse_loose_header()`
    to show where it's needed.

  - Guard a couple of variables on cleanup so that they only get
    released in case they have been set up.

  - Reset `oi->delta_base_oid` towards the end of the function, together
    with all the other object info pointers.

Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-08 11:04:22 +09:00
Elijah Newren
1df7743798 fsck: snapshot default refs before object walk
Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:

    #!/bin/bash
    git fsck &
    PID=$!

    while ps -p $PID >/dev/null; do
        sleep 3
        git commit -q --allow-empty -m "Another commit"
    done

Since fsck walks objects for connectivity and then reads the refs at the
end to check, this can cause fsck to get confused and think that the new
refs refer to missing commits and that new reflog entries are invalid.
Running the above script in a clone of git.git results in the following
(output ellipsized to remove additional errors of the same type):

    $ ./fsck-while-writing.sh
    Checking ref database: 100% (1/1), done.
    Checking object directories: 100% (256/256), done.
    warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
    Checking objects: 100% (835091/835091), done.
    error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    [...]
    error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
    error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
    error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
    [...]
    error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
    error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
    error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    Checking connectivity: 833846, done.
    missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d
    Verifying commits in commit graph: 100% (242243/242243), done.

We can minimize the race opportunities by taking a snapshot of refs at
program invocation, doing the connectivity check, and then checking the
snapshotted refs afterward.  This avoids races with regular refs between
fsck and adding objects to the database, though it still leaves a race
between a gc and fsck.  We are less concerned about folks simultaneously
running gc with fsck; though, if it becomes an issue, we could lock fsck
during gc.  We definitely do not want to lock fsck during operations
that may add objects to the object store; that would be problematic for
forges.

Note that refs aren't the only problem, though; reflog entries and index
entries could be problematic as well.  For now we punt on index entries
just leaving a TODO comment, and for reflogs we use a coarse solution of
taking the time at the beginning of the program and ignoring reflog
entries newer than that time.  That may be imperfect if dealing with a
network filesystem, so we leave TODO comment for those that want to
improve that handling as well.

As a high level overview:
  * In addition to fsck_handle_ref(), which now is only a few lines long
    to process a ref, there's also a snapshot_ref() which is called
    early in the program for each ref and takes all the error checking
    logic.
  * The iterating over refs that used to be in get_default_heads() plus
    a loop over the arguments now appears in shapshot_refs().
  * There's a new process_refs() as well that kind of looks like the old
    get_default_heads() though it is streamlined due to the work done by
    snapshot_refs().

This combination of changes modifies the output of running the script
(from the beginning of this commit message) to:

    $ ./fsck-while-writing.sh
    Checking ref database: 100% (1/1), done.
    Checking object directories: 100% (256/256), done.
    warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
    Checking objects: 100% (835091/835091), done.
    Checking connectivity: 833846, done.
    Verifying commits in commit graph: 100% (242243/242243), done.

While worries about live updates while running fsck is likely of most
interest for forge operators, it may also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.

Originally-based-on-a-patch-by: Matthew John Cheetham <mjcheetham@outlook.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-07 11:37:34 +09:00
Patrick Steinhardt
65405829a8 packfile: move MIDX into packfile store
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.

Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-07 09:37:43 +09:00
Patrick Steinhardt
4cd60cf483 packfile: refactor find_pack_entry() to work on the packfile store
The function `find_pack_entry()` doesn't work on a specific packfile
store, but instead works on the whole repository. This causes a bit of a
conceptual mismatch in its callers:

  - `packfile_store_freshen_object()` supposedly acts on a store, and
    its callers know to iterate through all sources already.

  - `packfile_store_read_object_info()` behaves likewise.

The only exception that doesn't know to handle iteration through sources
is `has_object_pack()`, but that function is trivial to adapt.

Refactor the code so that `find_pack_entry()` works on the packfile
store level instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-07 09:37:43 +09:00