23676 Commits

Author SHA1 Message Date
Patrick Steinhardt
e989dd96b8 odb: rename oid_object_info()
Rename `oid_object_info()` to `odb_read_object_info()` as well as their
`_extended()` variant to match other functions related to the object
database and our modern coding guidelines.

Introduce compatibility wrappers so that any in-flight topics will
continue to compile.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:37 -07:00
Patrick Steinhardt
c44185f6c1 odb: get rid of the_repository when handling alternates
The functions to manage alternates all depend on `the_repository`.
Refactor them to accept an object database as a parameter and adjust all
callers. The functions are renamed accordingly.

Note that right now the situation is still somewhat weird because we end
up using the object store path provided by the object store's repository
anyway. Consequently, we could have instead passed in a pointer to the
repository instead of passing in the pointer to the object store. This
will be addressed in subsequent commits though, where we will start to
use the path owned by the object store itself.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:36 -07:00
Patrick Steinhardt
8f49151763 object-store: rename files to "odb.{c,h}"
In the preceding commits we have renamed the structures contained in
"object-store.h" to `struct object_database` and `struct odb_backend`.
As such, the code files "object-store.{c,h}" are confusingly named now.
Rename them to "odb.{c,h}" accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Patrick Steinhardt
a1e2581a1e object-store: rename object_directory to odb_source
The `object_directory` structure is used as an access point for a single
object directory like ".git/objects". While the structure isn't yet
fully self-contained, the intent is for it to eventually contain all
information required to access objects in one specific location.

While the name "object directory" is a good fit for now, this will
change over time as we continue with the agenda to make pluggable object
databases a thing. Eventually, objects may not be accessed via any kind
of directory at all anymore, but they could instead be backed by any
kind of durable storage mechanism. While it seems quite far-fetched for
now, it is thinkable that eventually this might even be some form of a
database, for example.

As such, the current name of this structure will become worse over time
as we evolve into the direction of pluggable ODBs. Immediate next steps
will start to carve out proper self-contained object directories, which
requires us to pass in these object directories as parameters. Based on
our modern naming schema this means that those functions should then be
named after their subsystem, which means that we would start to bake the
current name into the codebase more and more.

Let's preempt this by renaming the structure. There have been a couple
alternatives that were discussed:

  - `odb_backend` was discarded because it led to the association that
    one object database has a single backend, but the model is that one
    alternate has one backend. Furthermore, "backend" is more about the
    actual backing implementation and less about the high-level concept.

  - `odb_alternate` was discarded because it is a bit of a stretch to
    also call the main object directory an "alternate".

Instead, pick `odb_source` as the new name. It makes it sufficiently
clear that there can be multiple sources and does not cause confusion
when mixed with the already-existing "alternate" terminology.

In the future, this change allows us to easily introduce for example a
`odb_files_source` and other format-specific implementations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-07-01 14:46:34 -07:00
Junio C Hamano
5d2812ff3c Merge branch 'mm/apply-reverse-mode-of-deleted-path'
"git apply --index/--cached" when applying a deletion patch in
reverse failed to give the mode bits of the path "removed" by the
patch to the file it creates, which has been corrected.

* mm/apply-reverse-mode-of-deleted-path:
  apply: set file mode when --reverse creates a deleted file
  t4129: test that git apply warns for unexpected mode changes
2025-05-30 11:59:17 -07:00
Junio C Hamano
b4847a4477 Merge branch 'jt/receive-pack-skip-connectivity-check'
"git receive-pack" optionally learns not to care about connectivity
check, which can be useful when the repository arranges to ensure
connectivity by some other means.

* jt/receive-pack-skip-connectivity-check:
  builtin/receive-pack: add option to skip connectivity check
  t5410: test receive-pack connectivity check
2025-05-28 07:59:56 -07:00
Junio C Hamano
b5afd0a7ee Merge branch 'kn/passing-leak-tests'
Remove the leftover hints to the test framework to mark tests that
do not pass the leak checker tests, as they should no longer be
needed.

* kn/passing-leak-tests:
  t: remove unexpected SANITIZE_LEAK variables
2025-05-28 07:59:56 -07:00
Junio C Hamano
80f49f2ae7 Merge branch 'en/sequencer-comment-messages'
Prefix '#' to the commit title in the "rebase -i" todo file, just
like a merge commit being replayed.

* en/sequencer-comment-messages:
  sequencer: make it clearer that commit descriptions are just comments
2025-05-27 13:59:11 -07:00
Junio C Hamano
d8b48af391 Merge branch 'sj/use-mmap-to-check-packed-refs'
The code path to access the "packed-refs" file while "fsck" is
taught to mmap the file, instead of reading the whole file in the
memory.

* sj/use-mmap-to-check-packed-refs:
  packed-backend: mmap large "packed-refs" file during fsck
  packed-backend: extract snapshot allocation in `load_contents`
  packed-backend: fsck should warn when "packed-refs" file is empty
2025-05-27 13:59:10 -07:00
Junio C Hamano
6e5fb398d3 Merge branch 'ds/sparse-apply-add-p'
"git apply" and "git add -i/-p" code paths no longer unnecessarily
expand sparse-index while working.

* ds/sparse-apply-add-p:
  p2000: add performance test for patch-mode commands
  reset: integrate sparse index with --patch
  git add: make -p/-i aware of sparse index
  apply: integrate with the sparse index
2025-05-27 13:59:09 -07:00
Junio C Hamano
f545f401be Merge branch 'en/merge-tree-check'
"git merge-tree" learned an option to see if it resolves cleanly
without actually creating a result.

* en/merge-tree-check:
  merge-tree: add a new --quiet flag
  merge-ort: add a new mergeability_only option
2025-05-27 13:59:08 -07:00
Junio C Hamano
17d9dbd3c2 Merge branch 'jk/no-funny-object-types'
Support to create a loose object file with unknown object type has
been dropped.

* jk/no-funny-object-types:
  object-file: drop support for writing objects with unknown types
  hash-object: handle --literally with OPT_NEGBIT
  hash-object: merge HASH_* and INDEX_* flags
  hash-object: stop allowing unknown types
  t: add lib-loose.sh
  t/helper: add zlib test-tool
  oid_object_info(): drop type_name strbuf
  fsck: stop using object_info->type_name strbuf
  oid_object_info_convert(): stop using string for object type
  cat-file: use type enum instead of buffer for -t option
  object-file: drop OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag
  cat-file: make --allow-unknown-type a noop
  object-file.h: fix typo in variable declaration
2025-05-27 13:59:08 -07:00
Junio C Hamano
dcb89740a0 Merge branch 'md/userdiff-bash-shell-function'
The userdiff pattern for shell scripts has been updated to cope
with more bash-isms.

* md/userdiff-bash-shell-function:
  userdiff: extend Bash pattern to cover more shell function forms
2025-05-27 13:59:06 -07:00
Mark Mentovai
1d9a66493b apply: set file mode when --reverse creates a deleted file
Commit 01aff0a (apply: correctly reverse patch's pre- and post-image
mode bits, 2023-12-26) revised reverse_patches() to maintain the desired
property that when only one of patch::old_mode and patch::new_mode is
set, the mode will be carried in old_mode. That property is generally
correct, with one notable exception: when creating a file, only new_mode
will be set. Since reversing a deletion results in a creation, new_mode
must be set in that case.

Omitting handling for this case means that reversing a patch that
removes an executable file will not result in the executable permission
being set on the re-created file. Existing test coverage for file modes
focuses only on mode changes of existing files.

Swap old_mode and new_mode in reverse_patches() for what's represented
in the patch as a file deletion, as it is transformed into a file
creation under reversal. This causes git apply --reverse to set the
executable permission properly when re-creating a deleted executable
file.

Add tests ensuring that git apply sets file modes correctly on file
creation, both in the forward and reverse directions.

Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-27 06:48:07 -07:00
Mark Mentovai
2cc8c17d67 t4129: test that git apply warns for unexpected mode changes
There is no test covering what commit 01aff0a (apply: correctly reverse
patch's pre- and post-image mode bits, 2023-12-26) addressed. Prior to
that commit, git apply was erroneously unaware of a file's expected mode
while reverse-patching a file whose mode was not changing.

Add the missing test coverage to assure that git apply is aware of the
expected mode of a file being patched when the patch does not indicate
that the file's mode is changing. This is achieved by arranging a file
mode so that it doesn't agree with patch being applied, and checking git
apply's output for the warning it's supposed to raise in this situation.
Test in both reverse and normal (forward) directions.

Signed-off-by: Mark Mentovai <mark@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-27 06:48:07 -07:00
Junio C Hamano
95c79efb8d Merge branch 'ds/scalar-no-maintenance'
Two "scalar" subcommands that adds a repository that hasn't been
under "scalar"'s control are taught an option not to enable the
scheduled maintenance on it.

* ds/scalar-no-maintenance:
  scalar reconfigure: improve --maintenance docs
  scalar reconfigure: add --maintenance=<mode> option
  scalar clone: add --no-maintenance option
  scalar register: add --no-maintenance option
  scalar: customize register_dir()'s behavior
2025-05-23 15:34:07 -07:00
Karthik Nayak
368d8c86f7 t: remove unexpected SANITIZE_LEAK variables
As of 1fc7ddf35b (test-lib: unconditionally enable leak checking,
2024-11-20), both the `GIT_TEST_PASSING_SANITIZE_LEAK` and
`TEST_PASSES_SANITIZE_LEAK` variables no longer have any meaning, the
leak checks are enabled by default. However, some newly added tests
include them by mistake. Let's clean this up.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 15:09:33 -07:00
Justin Tobler
68cb0b5253 builtin/receive-pack: add option to skip connectivity check
During git-receive-pack(1), connectivity of the object graph is
validated to ensure that the received packfile does not leave the
repository in a broken state. This is done via git-rev-list(1) and
walking the objects, which can be expensive for large repositories.

Generally, this check is critical to avoid an incomplete received
packfile from corrupting a repository. Server operators may have
additional knowledge though around exactly how Git is being used on the
server-side which can be used to facilitate more efficient connectivity
computation of incoming objects.

For example, if it can be ensured that all objects in a repository are
connected and do not depend on any missing objects, the connectivity of
newly written objects can be checked by walking the object graph
containing only the new objects from the updated tips and identifying
the missing objects which represent the boundary between the new objects
and the repository. These boundary objects can be checked in the
canonical repository to ensure the new objects connect as expected and
thus avoid walking the rest of the object graph.

Git itself cannot make the guarantees required for such an optimization
as it is possible for a repository to contain an unreachable object that
references a missing object without the repository being considered
corrupt.

Introduce the --skip-connectivity-check option for git-receive-pack(1)
which bypasses this connectivity check to give more control to the
server-side. Note that without proper server-side validation of newly
received objects handled outside of Git, usage of this option risks
corrupting a repository.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 11:43:36 -07:00
Justin Tobler
95262afe78 t5410: test receive-pack connectivity check
As part of git-recieve-pack(1), the connectivity of objects is checked.
Add a test validating that git-receive-pack(1) fails due to an incoming
packfile that would leave the repository with missing objects. Instead
of creating a new test file, "t5410" is generalized for receive-pack
testing.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-20 11:43:36 -07:00
Junio C Hamano
90eedabbf7 Merge branch 'ps/reftable-read-block-perffix'
Performance regression in not-yet-released code has been corrected.

* ps/reftable-read-block-perffix:
  reftable: fix perf regression when reading blocks of unwanted type
2025-05-19 16:02:48 -07:00
Junio C Hamano
a9dcacbf2a Merge branch 'jk/oidmap-cleanup'
Code cleanup.

* jk/oidmap-cleanup:
  raw_object_store: drop extra pointer to replace_map
  oidmap: add size function
  oidmap: rename oidmap_free() to oidmap_clear()
2025-05-19 16:02:47 -07:00
Junio C Hamano
9af978fa04 Merge branch 'rc/t1001-test-path-is-file'
Test update.

* rc/t1001-test-path-is-file:
  t1001: replace 'test -f' with 'test_path_is_file'
2025-05-19 16:02:47 -07:00
Junio C Hamano
4bb72548fc Merge branch 'sc/bundle-uri-use-all-refs-in-bundle'
Bundle-URI feature did not use refs recorded in the bundle other
than normal branches as anchoring points to optimize the follow-up
fetch during "git clone"; now it is told to utilize all.

* sc/bundle-uri-use-all-refs-in-bundle:
  bundle-uri: add test for bundle-uri clones with tags
  bundle-uri: copy all bundle references ino the refs/bundle space
2025-05-19 16:02:45 -07:00
Junio C Hamano
0b8d22fd40 Merge branch 'pw/sequencer-reflog-use-after-free'
Use-after-free fix in the sequencer.

* pw/sequencer-reflog-use-after-free:
  sequencer: rework reflog message handling
  sequencer: move reflog message functions
2025-05-19 16:02:44 -07:00
Elijah Newren
29d7bf1951 merge-tree: add a new --quiet flag
Git Forges may be interested in whether two branches can be merged while
not being interested in what the resulting merge tree is nor which files
conflicted.  For such cases, add a new --quiet flag which
will make use of the new mergeability_only flag added to merge-ort in
the previous commit.  This option allows the merge machinery to, in the
outer layer of the merge:
    * exit early when a conflict is detected
    * avoid writing (most) merged blobs/trees to the object store

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 15:09:14 -07:00
Elijah Newren
e42667241d sequencer: make it clearer that commit descriptions are just comments
Every once in a while, users report that editing the commit summaries
in the todo list does not get reflected in the rebase operation,
suggesting that users are (a) only using one-line commit messages, and
(b) not understanding that the commit summaries are merely helpful
comments to help them find the right hashes.

It may be difficult to correct users' poor commit messages, but we can
at least try to make it clearer that the commit summaries are not
directives of some sort by inserting a comment character.  Hopefully
that leads to them looking a little further and noticing the hints at
the bottom to use 'reword' or 'edit' directives.

Yes, this change may look funny at first since it hardcodes '#' rather
than using comment_line_str.  However:

  * comment_line_str exists to allow disambiguation between lines in
    a commit message and lines that are instructions to users editing
    the commit message.  No such disambiguation is needed for these
    comments that occur on the same line after existing directives
  * the exact "comment" character(s) on regular pick lines used aren't
    actually important; I could have used anything, including completely
    random variable length text for each line and it'd work because we
    ignore everything after 'pick' and the hash.
  * The whole point of this change is to signal to users that they
    should NOT be editing any part of the line after the hash (and if
    they do so, their edits will be ignored), while the whole point of
    comment_line_str is to allow highly flexible editing.  So making
    it more general by using comment_line_str actually feels
    counterproductive.
  * The character for merge directives absolutely must be '#'; that
    has been deeply hardcoded for a long time (see below), and will
    break if some other comment character is used instead.  In a
    desire to have pick and merge directives be similar, I use the
    same comment character for both.
  * Perhaps merge directives could be fixed to not be inflexible about
    the comment character used, if someone feels highly motivated, but
    I think that should be done in a separate follow-on patch.

Here are (some of?) the locations where '#' has already been hardcoded
for a long time for merges:

  1) In check_label_or_ref_arg():
	case TODO_LABEL:
		/*
		 * '#' is not a valid label as the merge command uses it to
		 * separate merge parents from the commit subject.
		 */

  2) In do_merge():

	/*
	 * For octopus merges, the arg starts with the list of revisions to be
	 * merged. The list is optionally followed by '#' and the oneline.
	 */
	merge_arg_len = oneline_offset = arg_len;
	for (p = arg; p - arg < arg_len; p += strspn(p, " \t\n")) {
		if (!*p)
			break;
		if (*p == '#' && (!p[1] || isspace(p[1]))) {

  3) In label_oid():

		if ((buf->len == the_hash_algo->hexsz &&
		     !get_oid_hex(label, &dummy)) ||
		    (buf->len == 1 && *label == '#') ||
		    hashmap_get_from_hash(&state->labels,
					  strihash(label), label)) {
			/*
			 * If the label already exists, or if the label is a
			 * valid full OID, or the label is a '#' (which we use
			 * as a separator between merge heads and oneline), we
			 * append a dash and a number to make it unique.
			 */

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:28:27 -07:00
Derrick Stolee
ecf9ba20e3 p2000: add performance test for patch-mode commands
The previous three changes contributed performance improvements to 'git
apply', 'git add -p', and 'git reset -p' when using a sparse index. The
improvement to 'git apply' also improved 'git checkout -p'. Add
performance tests to demonstrate this (and to help validate that
performance remains good in the future).

In the truncated test output below, we see that the full checkout
performance changes within noise expectations, but the sparse index
cases improve 33% and then 96% for 'git add -p' and 41% and then 95% for
'git reset -p'. 'git checkout -p' improves immediatley by 91% because it
does not need any change to its builtin.

  Test                                    HEAD~4  HEAD~3       HEAD~2       HEAD~1
  -------------------------------------------------------------------------------------
  2000.118: ... git add -p (full-v3)        0.79  0.79  +0.0%  0.82  +3.8%  0.82  +3.8%
  2000.119: ... git add -p (full-v4)        0.74  0.76  +2.7%  0.74  +0.0%  0.76  +2.7%
  2000.120: ... git add -p (sparse-v3)      1.94  1.28 -34.0%  0.07 -96.4%  0.07 -96.4%
  2000.121: ... git add -p (sparse-v4)      1.93  1.28 -33.7%  0.06 -96.9%  0.06 -96.9%
  2000.122: ... git checkout -p (full-v3)   1.18  1.18  +0.0%  1.18  +0.0%  1.19  +0.8%
  2000.123: ... git checkout -p (full-v4)   1.10  1.12  +1.8%  1.11  +0.9%  1.11  +0.9%
  2000.124: ... git checkout -p (sparse-v3) 1.31  0.11 -91.6%  0.11 -91.6%  0.11 -91.6%
  2000.125: ... git checkout -p (sparse-v4) 1.29  0.11 -91.5%  0.11 -91.5%  0.11 -91.5%
  2000.126: ... git reset -p (full-v3)      0.81  0.80  -1.2%  0.83  +2.5%  0.83  +2.5%
  2000.127: ... git reset -p (full-v4)      0.78  0.77  -1.3%  0.77  -1.3%  0.78  +0.0%
  2000.128: ... git reset -p (sparse-v3)    1.58  0.92 -41.8%  0.91 -42.4%  0.07 -95.6%
  2000.129: ... git reset -p (sparse-v4)    1.58  0.92 -41.8%  0.92 -41.8%  0.07 -95.6%

It is worth noting that if our test was more involved and had multiple
hunks to evaluate, then the time spent in 'git apply' would dominate due
to multiple index loads and writes. As it stands, we need the sparse
index improvement in 'git add -p' itself to confirm this performance
improvement.

Since the change for 'git add -i' is identical, we avoid a second test
case for that similar operation.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:02:47 -07:00
Derrick Stolee
efab7dc1f4 reset: integrate sparse index with --patch
Similar to the previous change for 'git add -p', the reset builtin
checked for integration with the sparse index after possibly redirecting
its logic toward the interactive logic. This means that the builtin
would expand the sparse index to a full one upon read.

Move this check earlier within cmd_reset() to improve performance here.

Add tests to guarantee that we are not universally expanding the index.
Add behavior tests to check that we are doing the same operations as a
full index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:02:47 -07:00
Derrick Stolee
02ed8555f6 git add: make -p/-i aware of sparse index
It is slow to expand a sparse index in-memory due to parsing of trees.
We aim to minimize that performance cost when possible. 'git add -p'
uses 'git apply' child processes to modify the index, but still there
are some expansions that occur.

It turns out that control flows out of cmd_add() in the interactive
cases before the lines that confirm that the builtin is integrated with
the sparse index.

Moving that integration point earlier in cmd_add() allows 'git add -i'
and 'git add -p' to operate without expanding a sparse index to a full
one.

Add test cases that confirm that these interactive add options work with
the sparse index.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:01:51 -07:00
Derrick Stolee
952de281fe apply: integrate with the sparse index
The sparse index allows storing directory entries in the index, marked
with the skip-wortkree bit and pointing to a tree object. This may be an
unexpected data shape for some implementation areas, so we are rolling
it out incrementally on a builtin-per-builtin basis.

This change enables the sparse index for 'git apply'. The main
motivation for this change is that 'git apply' is used as a child
process of 'git add -p' and expanding the sparse index for each of those
child processes can lead to significant performance issues.

The good news is that the actual index manipulation code used by 'git
apply' is already integrated with the sparse index, so the only product
change is to mark the builtin as allowing the sparse index so it isn't
inflated on read.

The more involved part of this change is around adding tests that verify
how 'git apply' behaves in a sparse-checkout environment and whether or
not the index expands in certain operations.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 12:00:33 -07:00
Moumita Dhar
ea8a71b40d userdiff: extend Bash pattern to cover more shell function forms
The previous function regex required explicit matching of function
bodies using `{`, `(`, `((`, or `[[`, which caused several issues:

- It failed to capture valid functions where `{` was on the next line
  due to line continuation (`\`).
- It did not recognize functions with single  command body, such as
  `x () echo hello`.

Replacing the function body matching logic with `.*$`, ensures
that everything on the function definition line is captured.

Additionally, the word regex is refined to better recognize shell
syntax, including additional parameter expansion operators and
command-line options.

Signed-off-by: Moumita Dhar <dhar61595@gmail.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 11:52:41 -07:00
Jeff King
65a6a79b42 hash-object: stop allowing unknown types
When passed the "--literally" option, hash-object will allow any
arbitrary string for its "-t" type option. Such objects are only useful
for testing or debugging, as they cannot be used in the normal way
(e.g., you cannot fetch their contents!).

Let's drop this feature, which will eventually let us simplify the
object-writing code. This is technically backwards incompatible, but
since such objects were never really functional, it seems unlikely that
anybody will notice.

We will retain the --literally flag, as it also instructs hash-object
not to worry about other format issues (e.g., type-specific things that
fsck would complain about). The documentation does not need to be
updated, as it was always vague about which checks we're loosening (it
uses only the phrase "any garbage").

The code change is a bit hard to verify from just the patch text. We can
drop our local hash_literally() helper, but it was really just wrapping
write_object_file_literally(). We now replace that with calling
index_fd(), as we do for the non-literal code path, but dropping the
INDEX_FORMAT_CHECK flag. This ends up being the same semantically as
what the _literally() code path was doing (modulo handling unknown
types, which is our goal).

We'll be able to clean up these code paths a bit more in subsequent
patches.

The existing test is flipped to show that we now reject the unknown
type. The additional "extra-long type" test is now redundant, as we bail
early upon seeing a bogus type.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King
b5643b60ac t: add lib-loose.sh
This commit adds a shell library for writing raw loose objects into the
object database. Normally this is done with hash-object, but the
specific intent here is to allow broken objects that hash-object may not
support.

We'll convert several cases that use "hash-object --literally" to write
objects with invalid types. That works currently, but dropping this
dependency will allow us to remove that feature and simplify the
object-writing code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King
f2ed511a2f t/helper: add zlib test-tool
It's occasionally useful when testing or debugging to be able to do raw
zlib inflate/deflate operations (e.g., to check the bytes of a specific
loose or packed object).

Even though zlib's deflate algorithm is used by many other programs,
this is surprisingly hard to do in a portable way. E.g., gzip can do
this if you manually munge some header bytes. But the result is somewhat
arcane, and we don't assume gzip is available anyway. Likewise, pigz
will handle raw zlib, but we can't assume it is available.

So let's introduce a short test helper for just doing zlib operations.
We'll use it in subsequent patches to add some new tests, but it would
also have come in handy a few times in the past:

  - The hard-coded pack data from 3b910d0c5e (add tests for indexing
    packs with delta cycles, 2013-08-23) could probably be generated on
    the fly.

  - Likewise we could avoid the hard-coded data from 0b1493c2d4
    (git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT,
    2025-02-25). Though note this would require support for more zlib
    options.

  - It would have helped with the debugging documented in 41dfbb2dbe
    (howto: add article on recovering a corrupted object, 2013-10-25).

I'll leave refactoring existing tests for another day, but I hope the
examples above show the general utility.

I aimed for simplicity in the code. In particular, it will read all
input into a memory buffer, rather than streaming. That makes the zlib
loops harder to get wrong (which has been a source of subtle bugs in the
past).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:11 -07:00
Jeff King
4ae0e9423c fsck: stop using object_info->type_name strbuf
When fsck-ing a loose object, we use object_info's type_name strbuf to
record the parsed object type as a string. For most objects this is
redundant with the object_type enum, but it does let us report the
string when we encounter an object with an unknown type (for which there
is no matching enum value).

There are a few downsides, though:

  1. The code to report these cases is not actually robust. Since we did
     not pass a strbuf to unpack_loose_header(), we only retrieved types
     from headers up to 32 bytes. In longer cases, we'd simply say
     "object corrupt or missing".

  2. This is the last caller that uses object_info's type_name strbuf
     support. It would be nice to refactor it so that we can simplify
     that code.

  3. Likewise, we'll check the hash of the object using its unknown type
     (again, as long as that type is short enough). That depends on the
     hash_object_file_literally() code, which we'd eventually like to
     get rid of.

So we can simplify things by bailing immediately in read_loose_object()
when we encounter an unknown type. This has a few user-visible effects:

  a. Instead of producing a single line of error output like this:

       error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6

     we'll now issue two lines (the first from read_loose_object() when
     we see the unparsable header, and the second from the fsck code,
     since we couldn't read the object):

       error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6
       error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6

     This is a little more verbose, but this sort of error should be
     rare (such objects are almost impossible to work with, and cannot
     be transferred between repositories as they are not representable
     in packfiles). And as a bonus, reporting the broken header in full
     could help with debugging other cases (e.g., a header like "blob
     xyzzy\0" would fail in parsing the size, but previously we'd not
     have showed the offending bytes).

  b. An object with an unknown type will be reported as corrupt, without
     actually doing a hash check. Again, I think this is unlikely to
     matter in practice since such objects are totally unusable.

We'll update one fsck test to match the new error strings. And we can
remove another test that covered the case of an object with an unknown
type _and_ a hash corruption. Since we'll skip the hash check now in
this case, the test is no longer interesting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:10 -07:00
Jeff King
f227fc7d43 cat-file: make --allow-unknown-type a noop
The cat-file command has some minor support for handling objects with
"unknown" types. I.e., strings that are not "blob", "commit", "tree", or
"tag".

In theory this could be used for debugging or experimenting with
extensions to Git. But in practice this support is not very useful:

  1. You can get the type and size of such objects, but nothing else.
     Not even the contents!

  2. Only loose objects are supported, since packfiles use numeric ids
     for the types, rather than strings.

  3. Likewise you cannot ever transfer objects between repositories,
     because they cannot be represented in the packfiles used for the
     on-the-wire protocol.

The support for these unknown types complicates the object-parsing code,
and has led to bugs such as b748ddb7a4 (unpack_loose_header(): fix
infinite loop on broken zlib input, 2025-02-25). So let's drop it.

The first step is to remove the user-facing parts, which are accessible
only via cat-file. This is technically backwards-incompatible, but given
the limitations listed above, these objects couldn't possibly be useful
in any workflow.

However, we can't just rip out the option entirely. That would hurt a
caller who ran:

  git cat-file -t --allow-unknown-object <oid>

and fed it normal, well-formed objects. There --allow-unknown-type was
doing nothing, but we wouldn't want to start bailing with an error. So
to protect any such callers, we'll retain --allow-unknown-type as a
noop.

The code change is fairly small (but we'll able to clean up more code in
follow-on patches). The test updates drop any use of the option. We
still retain tests that feed the broken objects to cat-file without
--allow-unknown-type, as we should continue to confirm that those
objects are rejected. Note that in one spot we can drop a layer of loop,
re-indenting the body; viewing the diff with "-w" helps there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-16 09:43:09 -07:00
Junio C Hamano
38fc278819 Merge branch 'jc/t6011-mv-ro-fix'
Test fix.

* jc/t6011-mv-ro-fix:
  t6011: fix misconversion from perl to sed
2025-05-15 17:24:57 -07:00
Junio C Hamano
4dda60c9df Merge branch 'ps/maintenance-missing-tasks'
Make repository clean-up tasks "gc" can do available to "git
maintenance" front-end.

* ps/maintenance-missing-tasks:
  builtin/maintenance: introduce "rerere-gc" task
  builtin/gc: move rerere garbage collection into separate function
  builtin/maintenance: introduce "worktree-prune" task
  builtin/gc: move pruning of worktrees into a separate function
  builtin/gc: remove global variables where it is trivial to do
  builtin/gc: fix indentation of `cmd_gc()` parameters
2025-05-15 17:24:56 -07:00
shejialuo
784ceccb91 packed-backend: fsck should warn when "packed-refs" file is empty
We assume the "packed-refs" won't be empty and instead has at least one
line in it (even when there are no refs packed, there is the file header
line). Because there is no terminating LF in the empty file, we will
report "packedRefEntryNotTerminated(ERROR)" to the user.

However, the runtime code paths would accept an empty "packed-refs"
file, for example, "create_snapshot" would simply return the "snapshot"
without checking the content of "packed-refs". So, we should skip
checking the content of "packed-refs" when it is empty during fsck.

After 694b7a1999 (repack_without_ref(): write peeled refs in the
rewritten file, 2013-04-22), we would always write a header into the
"packed-refs" file. So, versions of Git that are not too ancient never
write such an empty "packed-refs" file.

As an empty file often indicates a sign of a filesystem-level issue, the
way we want to resolve this inconsistency is not make everybody totally
silent but notice and report the anomaly.

Let's create a "FSCK_INFO" message id "EMPTY_PACKED_REFS_FILE" to report
to the users that "packed-refs" is empty.

Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-14 12:32:58 -07:00
Junio C Hamano
330a09e4a5 Merge branch 'kj/glob-path-with-special-char'
"git add 'f?o'" did not add 'foo' if 'f?o', an unusual pathname,
also existed on the working tree, which has been corrected.

* kj/glob-path-with-special-char:
  dir.c: literal match with wildcard in pathspec should still glob
2025-05-13 14:05:07 -07:00
Junio C Hamano
a4ad13dd19 Merge branch 'ng/xdiff-truly-minimal'
"git diff --minimal" used to give non-minimal output when its
optimization kicked in, which has been disabled.

* ng/xdiff-truly-minimal:
  xdiff: disable cleanup_records heuristic with --minimal
2025-05-12 14:22:50 -07:00
Junio C Hamano
6dbc41631d Merge branch 'ds/fix-thin-fix'
"git index-pack --fix-thin" used to abort to prevent a cycle in
delta chains from forming in a corner case even when there is no
such cycle.

* ds/fix-thin-fix:
  index-pack: allow revisiting REF_DELTA chains
  t5309: create failing test for 'git index-pack'
  test-tool: add pack-deltas helper
2025-05-12 14:22:49 -07:00
Jeff King
2744646834 oidmap: rename oidmap_free() to oidmap_clear()
This function does not free the oidmap struct itself; it just drops all
items from the map (using hashmap_clear_() internally). It should be
called oidmap_clear(), per CodingGuidelines.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 13:06:26 -07:00
Patrick Steinhardt
1970333644 reftable: fix perf regression when reading blocks of unwanted type
In fd888311fbc (reftable/table: move reading block into block reader,
2025-04-07), we have refactored how reftable blocks are read so that
most of the logic is contained in the "block.c" subsystem itself. Most
importantly, the whole logic to read the data itself is now contained in
that subsystem.

This change caused a significant performance regression though when
reading blocks that aren't of the specific type one is searching for:

    Benchmark 1: update-ref: create 100k refs (revision = fd888311fbc~)
      Time (mean ± σ):      2.171 s ±  0.028 s    [User: 1.189 s, System: 0.977 s]
      Range (min … max):    2.117 s …  2.206 s    10 runs

    Benchmark 2: update-ref: create 100k refs (revision = fd888311fbc)
      Time (mean ± σ):      3.418 s ±  0.030 s    [User: 2.371 s, System: 1.037 s]
      Range (min … max):    3.377 s …  3.473 s    10 runs

    Summary
      update-ref: create 100k refs (revision = fd888311fbc~) ran
        1.57 ± 0.02 times faster than update-ref: create 100k refs (revision = fd888311fbc)

The root caute of the performance regression is that we changed when
exactly blocks of an uninteresting type are being discarded. Previous to
the refactoring in the mentioned commit we'd load the block data, read
its type, notice that it's not the wanted type and discard the block.
After the commit though we don't discard the block immediately, but we
fully decode it only to realize that it's not the desired type. We then
discard the block again, but have already performed a bunch of pointless
work.

Fix the regression by making `reftable_block_init()` return early in
case the block is not of the desired type. This fixes the performance
hit:

    Benchmark 1: update-ref: create 100k refs (revision = HEAD~)
      Time (mean ± σ):      2.712 s ±  0.018 s    [User: 1.990 s, System: 0.716 s]
      Range (min … max):    2.682 s …  2.741 s    10 runs

    Benchmark 2: update-ref: create 100k refs (revision = HEAD)
      Time (mean ± σ):      1.670 s ±  0.012 s    [User: 0.991 s, System: 0.676 s]
      Range (min … max):    1.652 s …  1.693 s    10 runs

    Summary
      update-ref: create 100k refs (revision = HEAD) ran
        1.62 ± 0.02 times faster than update-ref: create 100k refs (revision = HEAD~)

Note that the baseline performance is lower than in the original due to
a couple of unrelated performance improvements that have landed since
the original commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 10:55:24 -07:00
Rodrigo Carvalho
bac220e154 t1001: replace 'test -f' with 'test_path_is_file'
'test_path_is_file' is a modern path checking method in Git's development.
 Replace the basic shell command 'test -f' with this approach.

Signed-off-by: Rodrigo Carvalho <rodrigorsdc@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-12 10:09:21 -07:00
Phillip Wood
5dbaec628d sequencer: rework reflog message handling
It has been reported that "git rebase --rebase-merges" can create
corrupted reflog entries like

    e9c962f2ea0 HEAD@{8}: <binary>�: Merged in <branch> (pull request #4441)

This is due to a use-after-free bug that happens because
reflog_message() uses a static `struct strbuf` and is not called to
update the current reflog message stored in `ctx->reflog_message` when
creating the merge. This means `ctx->reflog_message` points to a stale
reflog message that has been freed by subsequent call to
reflog_message() by a command such as `reset` that used the return value
directly rather than storing the result in `ctx->reflog_message`.

Fix this by creating the reflog message nearer to where the commit is
created and storing it in a local variable which is passed as an
additional parameter to run_git_commit() rather than storing the message
in `struct replay_ctx`. This makes it harder to forget to call
`reflog_message()` before creating a commit and using a variable with a
narrower scope means that a stale value cannot carried across a from one
iteration of the loop to the next which should prevent any similar
use-after-free bugs in the future.

A existing test is modified to demonstrate that merges are now created
with the correct reflog message.

Reported-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-09 13:29:23 -07:00
Junio C Hamano
0730906043 Merge branch 'ps/mv-contradiction-fix'
"git mv a a/b dst" would ask to move the directory 'a' itself, as
well as its contents, in a single destination directory, which is
a contradicting request that is impossible to satisfy. This case is
now detected and the command errors out.

* ps/mv-contradiction-fix:
  builtin/mv: convert assert(3p) into `BUG()`
  builtin/mv: bail out when trying to move child and its parent
2025-05-08 12:36:32 -07:00
Derrick Stolee
a34fef86e0 scalar reconfigure: add --maintenance=<mode> option
When users want to enable the latest and greatest configuration options
recommended by Scalar after a Git upgrade, 'scalar reconfigure --all' is
a great option that iterates over all repos in the multi-valued
'scalar.repos' config key.

However, this feature previously forced users to enable background
maintenance. In some environments this is not preferred.

Add a new --maintenance=<mode> option to 'scalar reconfigure' that
provides options for enabling (default), disabling, or leaving
background maintenance config as-is.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 14:04:32 -07:00
Derrick Stolee
882ce0c475 scalar clone: add --no-maintenance option
When creating a new enlistment via 'scalar clone', the default is to set
up situations that work for most user scenarios. Background maintenance
is one of those highly-recommended options for most users.

However, when using 'scalar clone' to create an enlistment in a
different situation, such as prepping a VM image, it may be valuable to
disable background maintenance so the manual maintenance steps do not
get blocked by concurrent background maintenance activities.

Add a new --no-maintenance option to 'scalar clone'.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 14:04:31 -07:00
Derrick Stolee
9816e24a78 scalar register: add --no-maintenance option
When registering a repository with Scalar to get the latest opinionated
configuration, the 'scalar register' command will also set up background
maintenance. This is a recommended feature for most user scenarios.

However, this is not always recommended in some scenarios where
background modifications may interfere with foreground activities.
Specifically, setting up a clone for use in automation may require doing
certain maintenance steps in the foreground that could become blocked by
concurrent background maintenance operations.

Allow the user to specify --no-maintenance to 'scalar register'. This
requires updating the method prototype for register_dir(), so use the
default of enabling this value when otherwise specified.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-05-07 14:04:31 -07:00