Further preparation to upstream symbolic link support on Windows.
* js/prep-symlink-windows:
trim_last_path_component(): avoid hard-coding the directory separator
strbuf_readlink(): support link targets that exceed 2*PATH_MAX
strbuf_readlink(): avoid calling `readlink()` twice in corner-cases
init: do parse _all_ core.* settings early
mingw: do resolve symlinks in `getcwd()`
The final clean-up phase of the diff output could turn the result of
histogram diff algorithm suboptimal, which has been corrected.
Comments?
* yc/histogram-hunk-shift-fix:
xdiff: re-diff shifted change groups when using histogram algorithm
Introduce a more robust way to parse a decimal integer stored in a
piece of memory that is not necessarily terminated with NUL (which
Asan strict-string-check complains even when use of strtol() is
safe due to varified existence of whitespace after the digits).
* jk/parse-int:
fsck: use parse_unsigned_from_buf() for parsing timestamp
cache-tree: use parse_int_from_buf()
parse: add functions for parsing from non-string buffers
parse: prefer bool to int for boolean returns
The "-z" and "--max-depth" documentation (and implementation of
"-z") in the "git last-modified" command have been updated.
* tc/last-modified-options-cleanup:
fixup! last-modified: document option --max-depth
last-modified: document how depth is handled better
last-modified: document option --max-depth
last-modified: handle and document NUL termination
A mechanism to specify what reference backend to use and store
references in which directory is introduced, which would likely to
be useful during ref migration.
Comments?
* kn/ref-location:
refs: add GIT_REF_URI to specify reference backend and directory
refs: support obtaining ref_store for given dir
The command line completion script (in contrib/) learned to
complete "git -<TAB>" to give single-letter options like "-C".
* wm/complete-git-short-opts:
completion: complete "git -<TAB>" with short options
Avoid local submodule repository directory paths overlapping with
each other by encoding submodule names before using them as path
components.
Comments?
* ar/submodule-gitdir-tweak:
submodule: detect conflicts with existing gitdir configs
submodule: hash the submodule name for the gitdir path
submodule: fix case-folding gitdir filesystem collisions
submodule--helper: fix filesystem collisions by encoding gitdir paths
builtin/credential-store: move is_rfc3986_unreserved to url.[ch]
submodule--helper: add gitdir migration command
submodule: allow runtime enabling extensions.submodulePathConfig
submodule: introduce extensions.submodulePathConfig
builtin/submodule--helper: add gitdir command
submodule: always validate gitdirs inside submodule_name_to_gitdir
submodule--helper: use submodule_name_to_gitdir in add_submodule
The packfile_store data structure is moved from object store to odb
source.
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
The split command in "git subtree" (in contrib/) has been taught to
deal better with rebased history.
* cs/rebased-subtree-split:
contrib/subtree: detect rewritten subtree commits
"git fsck" used inconsistent set of refs to show a confused
warning, which has been corrected.
* en/fsck-snapshot-ref-state:
fsck: snapshot default refs before object walk
"git patch-id" documentation updates.
* kh/doc-patch-id:
doc: patch-id: --verbatim locks in --stable
doc: patch-id: spell out the git-diff-tree(1) form
doc: patch-id: use definite article for the result
patch-id: use “patch ID” throughout
doc: patch-id: capitalize Git version
doc: patch-id: don’t use semicolon between bullet points
Update a FAQ entry on synching two separate repositories using the
"git stash export/import" recently introduced.
* bc/doc-stash-import-export:
gitfaq: document using stash import/export to sync working tree
Import newer version of "clar", unit testing framework.
* ps/clar-integers:
gitattributes: disable blank-at-eof errors for clar test expectations
t/unit-tests: demonstrate use of integer comparison assertions
t/unit-tests: update clar to 39f11fe
Improve the error message when a bad argument is given to the
`--onto` option of "git replay". Test coverage of "git replay" has
been improved.
* kh/replay-invalid-onto-advance:
t3650: add more regression tests for failure conditions
replay: die if we cannot parse object
replay: improve code comment and die message
replay: die descriptively when invalid commit-ish is given
replay: find *onto only after testing for ref name
replay: remove dead code and rearrange
Miscellaneous fixes on object database layer.
* ps/odb-misc-fixes:
odb: properly close sources before freeing them
builtin/gc: fix condition for whether to write commit graphs
Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.
* rs/commit-stack:
commit-reach: use commit_stack
commit-graph: use commit_stack
commit: add commit_stack_grow()
shallow: use commit_stack
pack-bitmap-write: use commit_stack
commit: add commit_stack_init()
test-reach: use commit_stack
remote: use commit_stack for src_commits
remote: use commit_stack for sent_tips
remote: use commit_stack for local_commits
name-rev: use commit_stack
midx: use commit_stack
log: use commit_stack
revision: export commit_stack
Diagnose invalid bundle-URI that lack the URI entry, instead of
crashing.
* sb/bundle-uri-without-uri:
bundle-uri: validate that bundle entries have a uri
More doc style updates.
* ja/doc-synopsis-style-more:
doc: convert git-remote to synopsis style
doc: convert git stage to use synopsis block
doc: convert git-status tables to AsciiDoc format
doc: convert git-status to synopsis style
doc: fix t0450-txt-doc-vs-help to select only first synopsis block
git subtree split --prefix P
detects splits that are outside of path prefix `P` and prunes
them from history graph processing. This improves the performance
of repeated `split --rejoin` with many different prefixes.
Both before and after 83f9dad7d6 (contrib/subtree: fix split with
squashed subtrees, 2025-09-09), the pruning logic does not detect
**rebased** or **cherry-picked** git-subtree commits. If `split`
encounters any of these commits, the split output may have
incomplete history.
All commits authored by
git subtree merge [--squash] --prefix Q
have a first or second parent that has *only* subtree commits
as ancestors. When splitting a completely different path `P/`,
it is safe to ignore:
1. the merged tree
2. the subtree parent
3. *all* of that parent's ancestry, which applies only to
path `Q/` and not `P/`.
But this relationship no longer holds if the git-subtree commit
is rebased or otherwise reauthored. After a rebase, the former
git-subtree commit will have other unrelated commits as ancestors.
Ignoring these commits may exclude the history of `P/`,
leading to incomplete `subtree split` output.
The pruning logic relies solely on the `git-subtree-*:` trailers
to detect git-subtree commits, which it blindly accepts without
further validation. The split logic also takes its time about
being wrong: `cmd_split()` execs a `git show` for *every* commit
in the split range… twice. This is inefficient in a shell script.
Add a "reality check" to ignore rebased or rewritten commits:
* Rewrites of non-merge commits cannot be detected, so the new
detector no longer looks for them.
* Merges carry a `git-subtree-mainline:` trailer with the hash of
the **first parent**. If this hash differs, or if the "merge"
commit no longer has multiple parents, a rewrite has occurred.
To increase speed, package this logic in a new method,
`find_other_splits()`. Perform the check up-front by iterating
over a single `git log`. Add ignored subtrees to:
1. the `notree` cache, which excludes them from the `split` history
2. a `prune` negative refs list. The negative refs prevent
recursing into other subtrees. Since there are potentially a
*lot* of these, cache them on disk and use rev-list's
`--stdin` mode.
Reported-by: George <george@mail.dietrich.pub>
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, this function hard-codes the directory separator as the
forward slash.
However, on Windows the backslash character is valid, too. And we want
to call this function in the upcoming support for symlinks on Windows
with the symlink targets (which naturally use the canonical directory
separator on Windows, which is _not_ the forward slash).
Prepare that function to be useful also in that context.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `strbuf_readlink()` function refuses to read link targets that
exceed 2*PATH_MAX (even if a sufficient size was specified by the
caller).
The reason that that limit is 2*PATH_MAX instead of PATH_MAX is that
the symlink targets do not need to be normalized. After running
`ln -s a/../a/../a/../a/../b c`, the target of the symlink `c` will not
be normalized to `b` but instead be much longer. As such, symlink
targets' lengths can far exceed PATH_MAX.
They are frequently much longer than 2*PATH_MAX on Windows, which
actually supports paths up to 32,767 characters, but sets PATH_MAX to
260 for backwards compatibility. For full details, see
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
Let's just hard-code the limit used by `strbuf_readlink()` to 32,767 and
make it independent of the current platform's PATH_MAX.
Based-on-a-patch-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `strbuf_readlink()` function calls `readlink()`` twice if the hint
argument specifies the exact size of the link target (e.g. by passing
stat.st_size as returned by `lstat()`). This is necessary because
`readlink(..., hint) == hint` could mean that the buffer was too small.
Use `hint + 1` as buffer size to prevent this.
Signed-off-by: Karsten Blees <karsten.blees@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In Git for Windows, `has_symlinks` is set to 0 by default. Therefore, we
need to parse the config setting `core.symlinks` to know if it has been
set to `true`. In `git init`, we must do that before copying the
templates because they might contain symbolic links.
Even if the support for symbolic links on Windows has not made it to
upstream Git yet, we really should make sure that all the `core.*`
settings are parsed before proceeding, as they might very well change
the behavior of `git init` in a way the user intended.
This fixes https://github.com/git-for-windows/git/issues/3414
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As pointed out in https://github.com/git-for-windows/git/issues/1676,
the `git rev-parse --is-inside-work-tree` command currently fails when
the current directory's path contains symbolic links.
The underlying reason for this bug is that `getcwd()` is supposed to
resolve symbolic links, but our `mingw_getcwd()` implementation did not.
We do have all the building blocks for that, though: the
`GetFinalPathByHandleW()` function will resolve symbolic links. However,
we only called that function if `GetLongPathNameW()` failed, for
historical reasons: the latter function was supported for a long time,
but the former API function was introduced only with Windows Vista, and
we used to support also Windows XP. With that support having been
dropped, we are free to call the symbolic link-resolving function right
away.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a compiler warning (-Werror=analyzer-deref-before-check) due to
dereferencing the options pointer before NULL checking it.
In practice run_hooks_opt() is never called with a NULL opt struct,
so this just fixes the code to not trigger the warning anymore.
The NULL check is kept as-is because some future patches might end up
calling run_hooks_opt with a NULL opt struct, which is clearly a bug.
While at it, also fix the BUG message function name.
Reported-by: correctmost <cmlists@sent.com>
Suggested-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fsck has a race when operating on live repositories; consider the
following simple script that writes new commits as fsck runs:
#!/bin/bash
git fsck &
PID=$!
while ps -p $PID >/dev/null; do
sleep 3
git commit -q --allow-empty -m "Another commit"
done
Since fsck walks objects for connectivity and then reads the refs at the
end to check, this can cause fsck to get confused and think that the new
refs refer to missing commits and that new reflog entries are invalid.
Running the above script in a clone of git.git results in the following
(output ellipsized to remove additional errors of the same type):
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: HEAD: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
error: HEAD: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
[...]
error: HEAD: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: HEAD: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: HEAD: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
error: refs/heads/mybranch invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: refs/heads/mybranch: invalid reflog entry 2aac9f9286e2164fbf8e4f1d1df53044ace2b310
error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
error: refs/heads/mybranch: invalid reflog entry da0f5b80d61844a6f0ad2ddfd57e4fdfa246ea68
[...]
error: refs/heads/mybranch: invalid reflog entry 87c8a5c2f6b79d9afa9e941590b9a097b6f7ac09
error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: refs/heads/mybranch: invalid reflog entry d80887a48865e6ad165274b152cbbbed29f8a55a
error: refs/heads/mybranch: invalid reflog entry 6724f2dfede88bfa9445a333e06e78536c0c6c0d
Checking connectivity: 833846, done.
missing commit 6724f2dfede88bfa9445a333e06e78536c0c6c0d
Verifying commits in commit graph: 100% (242243/242243), done.
We can minimize the race opportunities by taking a snapshot of refs at
program invocation, doing the connectivity check, and then checking the
snapshotted refs afterward. This avoids races with regular refs between
fsck and adding objects to the database, though it still leaves a race
between a gc and fsck. We are less concerned about folks simultaneously
running gc with fsck; though, if it becomes an issue, we could lock fsck
during gc. We definitely do not want to lock fsck during operations
that may add objects to the object store; that would be problematic for
forges.
Note that refs aren't the only problem, though; reflog entries and index
entries could be problematic as well. For now we punt on index entries
just leaving a TODO comment, and for reflogs we use a coarse solution of
taking the time at the beginning of the program and ignoring reflog
entries newer than that time. That may be imperfect if dealing with a
network filesystem, so we leave TODO comment for those that want to
improve that handling as well.
As a high level overview:
* In addition to fsck_handle_ref(), which now is only a few lines long
to process a ref, there's also a snapshot_ref() which is called
early in the program for each ref and takes all the error checking
logic.
* The iterating over refs that used to be in get_default_heads() plus
a loop over the arguments now appears in shapshot_refs().
* There's a new process_refs() as well that kind of looks like the old
get_default_heads() though it is streamlined due to the work done by
snapshot_refs().
This combination of changes modifies the output of running the script
(from the beginning of this commit message) to:
$ ./fsck-while-writing.sh
Checking ref database: 100% (1/1), done.
Checking object directories: 100% (256/256), done.
warning in tag d6602ec5194c87b0fc87103ca4d67251c76f233a: missingTaggerEntry: invalid format - expected 'tagger' line
Checking objects: 100% (835091/835091), done.
Checking connectivity: 833846, done.
Verifying commits in commit graph: 100% (242243/242243), done.
While worries about live updates while running fsck is likely of most
interest for forge operators, it may also benefit those with
automated jobs (such as git maintenance) or even casual users who want
to do other work in their clone while fsck is running.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.
Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `find_pack_entry()` doesn't work on a specific packfile
store, but instead works on the whole repository. This causes a bit of a
conceptual mismatch in its callers:
- `packfile_store_freshen_object()` supposedly acts on a store, and
its callers know to iterate through all sources already.
- `packfile_store_read_object_info()` behaves likewise.
The only exception that doesn't know to handle iteration through sources
is `has_object_pack()`, but that function is trivial to adapt.
Refactor the code so that `find_pack_entry()` works on the packfile
store level instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `find_kept_pack_entry()` function is only used in
`has_object_kept_pack()`, which is only a trivial wrapper itself. Inline
the latter into the former.
Furthermore, reorder the code so that we can drop the declaration of the
function in "packfile.h". This allows us to make the function file-local.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `packfile_store_prepare()` we prepare not only the provided
packfile store, but also all those of all other sources part of the same
object database. This was required when the store was still sitting on
the object database level. But now that it sits on the source level it's
not anymore.
Refactor the code so that we only prepare the single packfile store
passed by the caller. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `packfile_store_get_packs()` we prepare not only the
provided packfile store, but also all those of all other sources part of
the same object database. This was required when the store was still
sitting on the object database level. But now that it sits on the source
level it's not anymore.
Adapt the code so that we only prepare the MIDX of the provided store.
All callers only work in the context of a single store or call the
function in a loop over all sources, so this change shouldn't have any
practical effects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.
Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.
Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.
Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `unuse_one_window()` is responsible for unmapping one of
the packfile windows, which is done when we have exceeded the allowed
number of window.
The function receives a `struct packed_git` as input, which serves as an
additional packfile that should be considered to be closed. If not
given, we seemingly skip that and instead go through all of the
repository's packfiles. The conditional that checks whether we have a
packfile though does not make much sense anymore, as we dereference the
packfile regardless of whether or not it is a `NULL` pointer to derive
the repository's packfile store.
The function was originally introduced via f0e17e86e1 (pack: move
release_pack_memory(), 2017-08-18), and here we indeed had a caller that
passed a `NULL` pointer. That caller was later removed via 9827d4c185
(packfile: drop release_pack_memory(), 2019-08-12), so starting with
that commit we always pass a `struct packed_git`. In 9c5ce06d74
(packfile: use `repository` from `packed_git` directly, 2024-12-03) we
then inadvertently started to rely on the fact that the pointer is never
`NULL` because we use it now to identify the repository.
Arguably, it didn't really make sense in the first place that the caller
provides a packfile, as the selected window would have been overridden
anyway by the subsequent loop over all packfiles if there was an older
window. So the overall logic is quite misleading overall. The only case
where it _could_ make a difference is when there were two packfiles with
the same `last_used` value, but that case doesn't ever happen because
the `pack_used_ctr` is strictly increasing.
Refactor the code so that we instead pass in the object database to
help make the code less misleading.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.
Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.
Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing a packfile we pass various pieces attached to the pack's
object database source via the `struct prepare_pack_data`. Refactor this
code to instead pass in the source directly. This reduces the number of
variables we need to pass and allows for a subsequent refactoring where
we start to prepare the pack via the source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>