We have converted all callers of `for_each_loose_object()` and
`for_each_packed_object()` to use their new replacement functions
instead. We can thus remove them now.
Do so and inline `packfile_store_for_each_object_internal()` now that it
only has a single callsite again. This makes it a bit easier to follow
the callback indirection that is happening there.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To figure out which objects expired objects we enumerate all loose and
packed objects individually so that we can figure out their respective
mtimes. Refactor the code to instead use `odb_for_each_object()` with a
request that ask for the object mtime instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When enumerating objects that are supposed to be stored in a new cruft
pack we use `for_each_packed_object()` and then derive each object's
mtime individually. Refactor this logic to instead use the new
`packfile_store_for_each_object()` function with an object info request
that asks for the respective mtimes.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some use cases where we need to figure out the mtime for
objects. Most importantly, this is the case when we want to prune
unreachable objects. But getting at that data requires users to manually
derive the info either via the loose object's mtime, the packfiles'
mtime or via the ".mtimes" file.
Introduce a new `struct object_info::mtimep` pointer that allows callers
to request an object's mtime. This new field will be used in a
subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're using `for_each_loose_object()` and `for_each_packed_object()` at
a couple of callsites to enumerate all loose and packed objects,
respectively. These functions will be removed in a subsequent commit in
favor of the newly introduced `odb_source_loose_for_each_object()` and
`packfile_store_for_each_object()` replacements.
Prepare for this by refactoring the sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple callsites where we enumerate all promisor objects in
the object database via `for_each_packed_object()`. This is done by
passing the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY` flag, which causes us to
skip over all non-promisor objects.
These callsites can be trivially converted to `odb_for_each_object()` as
we know to skip enumeration of loose objects in case the `PROMISOR_ONLY`
flag was passed by the caller.
Refactor the sites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git-fsck(1) we have two callsites where we iterate over all objects
via `for_each_loose_object()` and `for_each_packed_object()`. Both of
these are trivially convertible with `odb_for_each_object()`.
Refactor these callsites accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `odb_for_each_object()` that knows to iterate
through all objects part of a given object database. This function is
essentially a simple wrapper around the object database sources.
Subsequent commits will adapt callers to use this new function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new function `packfile_store_for_each_object()`. This
function is the equivalent to `odb_source_loose_for_each_object()` in
that it:
- Works on a single packfile store and thus per object source.
- Passes a `struct object_info` to the callback function.
As such, it provides the same callback interface as we already provide
for loose objects now. These functions will be used in a subsequent step
to implement `odb_for_each_object()`.
The `for_each_packed_object()` function continues to exist for now, but
it will be removed at the end of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're about to introduce a new function that knows to
iterate through objects of a given packfile store. Same as with the
equivalent function for loose objects, this new function will also be
agnostic of backends by using a `struct object_info`.
Prepare for this by extracting a new shared function to iterate through
a single packfile store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple divergent interfaces to iterate through objects of a
specific backend:
- `for_each_loose_object()` yields all loose objects.
- `for_each_packed_object()` (somewhat obviously) yields all packed
objects.
These functions have different function signatures, which makes it hard
to create a common abstraction layer that covers both of these.
Introduce a new function `odb_source_loose_for_each_object()` to plug
this gap. This function doesn't take any data specific to loose objects,
but instead it accepts a `struct object_info` that will be populated the
exact same as if `odb_source_loose_read_object()` was called.
The benefit of this new interface is that we can continue to pass
backend-specific data, as `struct object_info` contains a union for
these exact use cases. This will allow us to unify how we iterate
through objects across both loose and packed objects in a subsequent
commit.
The `for_each_loose_object()` function continues to exist for now, but
it will be removed at the end of this patch series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract a new function that allows us to read object info for a specific
loose object via a user-supplied path. This function will be used in a
subsequent commit.
Note that this also allows us to drop `stat_loose_object()`, which is
a simple wrapper around `odb_loose_path()` plus lstat(3p).
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `flags` parameter accepted by various `for_each_object()` functions
is a bitfield of multiple flags. Such parameters are typically unsigned
in the Git codebase, but we use `enum odb_for_each_object_flags` in
some places.
Adapt these function signatures to use the correct type.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the `FOR_EACH_OBJECT_*` flags to have an `ODB_` prefix. This
prepares us for a new upcoming `odb_for_each_object()` function and
ensures that both the function and its flags have the same prefix.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/packfile-store-in-odb-source:
packfile: move MIDX into packfile store
packfile: refactor `find_pack_entry()` to work on the packfile store
packfile: inline `find_kept_pack_entry()`
packfile: only prepare owning store in `packfile_store_prepare()`
packfile: only prepare owning store in `packfile_store_get_packs()`
packfile: move packfile store into object source
packfile: refactor misleading code when unusing pack windows
packfile: refactor kept-pack cache to work with packfile stores
packfile: pass source to `prepare_pack()`
packfile: create store via its owning source
odb: properly close sources before freeing them
builtin/gc: fix condition for whether to write commit graphs
* ps/read-object-info-improvements:
packfile: drop repository parameter from `packed_object_info()`
packfile: skip unpacking object header for disk size requests
packfile: disentangle return value of `packed_object_info()`
packfile: always populate pack-specific info when reading object info
packfile: extend `is_delta` field to allow for "unknown" state
packfile: always declare object info to be OI_PACKED
object-file: always set OI_LOOSE when reading object info
The function `packed_object_info()` takes a packfile and offset and
returns the object info for the corresponding object. Despite these two
parameters though it also takes a repository pointer. This is redundant
information though, as `struct packed_git` already has a repository
pointer that is always populated.
Drop the redundant parameter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While most of the object info requests for a packed object require us to
unpack its headers, reading its disk size doesn't. We still unpack the
object header in that case though, which is unnecessary work.
Skip reading the header if only the disk size is requested. This leads
to a small speedup when reading disk size, only. The following benchmark
was done in the Git repository:
Benchmark 1: ./git rev-list --disk-usage HEAD (rev = HEAD~)
Time (mean ± σ): 105.2 ms ± 0.6 ms [User: 91.4 ms, System: 13.3 ms]
Range (min … max): 103.7 ms … 106.0 ms 27 runs
Benchmark 2: ./git rev-list --disk-usage HEAD (rev = HEAD)
Time (mean ± σ): 96.7 ms ± 0.4 ms [User: 86.2 ms, System: 10.0 ms]
Range (min … max): 96.2 ms … 98.1 ms 30 runs
Summary
./git rev-list --disk-usage HEAD (rev = HEAD) ran
1.09 ± 0.01 times faster than ./git rev-list --disk-usage HEAD (rev = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `packed_object_info()` function returns the type of the packed
object. While we use an `enum object_type` to store the return value,
this type is not to be confused with the actual object type. It _may_
contain the object type, but it may just as well encode that the given
packed object is stored as a delta.
We have removed the only caller that relied on this returned object type
in the preceding commit, so let's simplify semantics and return either 0
on success or a negative error code otherwise.
This unblocks a small optimization where we can skip reading the object
type altogether.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Note that this also fixes a segfault introduced in 8c1b84bc97
(streaming: move logic to read packed objects streams into backend,
2025-11-23), which refactors how we stream packed objects. The intent is
to only read packed objects in case they are stored non-deltified as
we'd otherwise have to deflate them first. But the check for whether or
not the object is stored as a delta was unconditionally done via
`oi.u.packed.is_delta`, which is only valid in case `oi.whence` is
`OI_PACKED`. But under some circumstances we got `OI_DBCACHED` here,
which means that none of the `oi.u.packed` fields were initialized at
all. Consequently, we assumed the object was not stored as a delta, and
then try to read the object from `oi.u.packed.pack`, which is a `NULL`
pointer and thus causes a segfault.
Add a test case for this issue so that this cannot regress in the
future anymore.
Reported-by: Matt Smiley <msmiley@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.
Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.
Note that we also:
- Rename `status` to `ret` to match our usual coding style, but also
to show that the old `status` variable is now always getting the
expected value. Furthermore, the value is not initialized anymore,
which has the consequence that most compilers will warn for exit
paths where we forgot to set it.
- Move the setup of scratch pointers closer to `parse_loose_header()`
to show where it's needed.
- Guard a couple of variables on cleanup so that they only get
released in case they have been set up.
- Reset `oi->delta_base_oid` towards the end of the function, together
with all the other object info pointers.
Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up, unifying various hand-rolled "list of commit
objects" and use the commit_stack API.
* rs/commit-stack:
commit-reach: use commit_stack
commit-graph: use commit_stack
commit: add commit_stack_grow()
shallow: use commit_stack
pack-bitmap-write: use commit_stack
commit: add commit_stack_init()
test-reach: use commit_stack
remote: use commit_stack for src_commits
remote: use commit_stack for sent_tips
remote: use commit_stack for local_commits
name-rev: use commit_stack
midx: use commit_stack
log: use commit_stack
revision: export commit_stack
Diagnose invalid bundle-URI that lack the URI entry, instead of
crashing.
* sb/bundle-uri-without-uri:
bundle-uri: validate that bundle entries have a uri
More doc style updates.
* ja/doc-synopsis-style-more:
doc: convert git-remote to synopsis style
doc: convert git stage to use synopsis block
doc: convert git-status tables to AsciiDoc format
doc: convert git-status to synopsis style
doc: fix t0450-txt-doc-vs-help to select only first synopsis block
The multi-pack index still is tracked as a member of the object database
source, but ultimately the MIDX is always tied to one specific packfile
store.
Move the structure into `struct packfile_store` accordingly. This
ensures that the packfile store now keeps track of all data related to
packfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `find_pack_entry()` doesn't work on a specific packfile
store, but instead works on the whole repository. This causes a bit of a
conceptual mismatch in its callers:
- `packfile_store_freshen_object()` supposedly acts on a store, and
its callers know to iterate through all sources already.
- `packfile_store_read_object_info()` behaves likewise.
The only exception that doesn't know to handle iteration through sources
is `has_object_pack()`, but that function is trivial to adapt.
Refactor the code so that `find_pack_entry()` works on the packfile
store level instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `find_kept_pack_entry()` function is only used in
`has_object_kept_pack()`, which is only a trivial wrapper itself. Inline
the latter into the former.
Furthermore, reorder the code so that we can drop the declaration of the
function in "packfile.h". This allows us to make the function file-local.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `packfile_store_prepare()` we prepare not only the provided
packfile store, but also all those of all other sources part of the same
object database. This was required when the store was still sitting on
the object database level. But now that it sits on the source level it's
not anymore.
Refactor the code so that we only prepare the single packfile store
passed by the caller. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When calling `packfile_store_get_packs()` we prepare not only the
provided packfile store, but also all those of all other sources part of
the same object database. This was required when the store was still
sitting on the object database level. But now that it sits on the source
level it's not anymore.
Adapt the code so that we only prepare the MIDX of the provided store.
All callers only work in the context of a single store or call the
function in a loop over all sources, so this change shouldn't have any
practical effects.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packfile store is a member of `struct object_database`, which means
that we have a single store per database. This doesn't really make much
sense though: each source connected to the database has its own set of
packfiles, so there is a conceptual mismatch here. This hasn't really
caused much of a problem in the past, but with the advent of pluggable
object databases this is becoming more of a problem because some of the
sources may not even use packfiles in the first place.
Move the packfile store down by one level from the object database into
the object database source. This ensures that each source now has its
own packfile store, and we can eventually start to abstract it away
entirely so that the caller doesn't even know what kind of store it
uses.
Note that we only need to adjust a relatively small number of callers,
way less than one might expect. This is because most callers are using
`repo_for_each_pack()`, which handles enumeration of all packfiles that
exist in the repository. So for now, none of these callers need to be
adapted. The remaining callers that iterate through the packfiles
directly and that need adjustment are those that are a bit more tangled
with packfiles. These will be adjusted over time.
Note that this patch only moves the packfile store, and there is still a
bunch of functions that seemingly operate on a packfile store but that
end up iterating over all sources. These will be adjusted in subsequent
commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `unuse_one_window()` is responsible for unmapping one of
the packfile windows, which is done when we have exceeded the allowed
number of window.
The function receives a `struct packed_git` as input, which serves as an
additional packfile that should be considered to be closed. If not
given, we seemingly skip that and instead go through all of the
repository's packfiles. The conditional that checks whether we have a
packfile though does not make much sense anymore, as we dereference the
packfile regardless of whether or not it is a `NULL` pointer to derive
the repository's packfile store.
The function was originally introduced via f0e17e86e1 (pack: move
release_pack_memory(), 2017-08-18), and here we indeed had a caller that
passed a `NULL` pointer. That caller was later removed via 9827d4c185
(packfile: drop release_pack_memory(), 2019-08-12), so starting with
that commit we always pass a `struct packed_git`. In 9c5ce06d74
(packfile: use `repository` from `packed_git` directly, 2024-12-03) we
then inadvertently started to rely on the fact that the pointer is never
`NULL` because we use it now to identify the repository.
Arguably, it didn't really make sense in the first place that the caller
provides a packfile, as the selected window would have been overridden
anyway by the subsequent loop over all packfiles if there was an older
window. So the overall logic is quite misleading overall. The only case
where it _could_ make a difference is when there were two packfiles with
the same `last_used` value, but that case doesn't ever happen because
the `pack_used_ctr` is strictly increasing.
Refactor the code so that we instead pass in the object database to
help make the code less misleading.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The kept pack cache is a cache of packfiles that are marked as kept
either via an accompanying ".kept" file or via an in-memory flag. The
cache can be retrieved via `kept_pack_cache()`, where one needs to pass
in a repository.
Ultimately though the kept-pack cache is a property of the packfile
store, and this causes problems in a subsequent commit where we want to
move down the packfile store to be a per-object-source entity.
Prepare for this and refactor the kept-pack cache to work on top of a
packfile store instead. While at it, rename both the function and flags
specific to the kept-pack cache so that they can be properly attributed
to the respective subsystems.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When preparing a packfile we pass various pieces attached to the pack's
object database source via the `struct prepare_pack_data`. Refactor this
code to instead pass in the source directly. This reduces the number of
variables we need to pass and allows for a subsequent refactoring where
we start to prepare the pack via the source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In subsequent patches we're about to move the packfile store from the
object database layer into the object database source layer. Once done,
we'll have one packfile store per source, where the source is owning the
store.
Prepare for this future and refactor `packfile_store_new()` to be
initialized via an object database source instead of via the object
database itself.
This refactoring leads to a weird in-between state where the store is
owned by the object database but created via the source. But this makes
subsequent refactorings easier because we can now start to access the
owning source of a given store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ort merge machinery hit an assertion failure in a history with
criss-cross merges renamed a directory and a non-directory, which
has been corrected.
* en/ort-recursive-d-f-conflict-fix:
merge-ort: fix corner case recursive submodule/directory conflict handling
Running "git diff" with "--name-only" and other options that allows
us not to look at the blob contents, while objects that are lazily
fetched from a promisor remote, caused use-after-free, which has
been corrected.
* ds/diff-lazy-fetch-with-name-only-fix:
diff: avoid segfault with freed entries
Code clean-up.
* rs/tag-wo-the-repository:
tag: stop using the_repository
tag: support arbitrary repositories in parse_tag()
tag: support arbitrary repositories in gpg_verify_tag()
tag: use algo of repo parameter in parse_tag_buffer()
It is possible to hit a memory leak when reading data from a submodule
via git-grep(1):
Direct leak of 192 byte(s) in 1 object(s) allocated from:
#0 0x55555562e726 in calloc (git+0xda726)
#1 0x555555964734 in xcalloc ../wrapper.c:154:8
#2 0x555555835136 in load_multi_pack_index_one ../midx.c:135:2
#3 0x555555834fd6 in load_multi_pack_index ../midx.c:382:6
#4 0x5555558365b6 in prepare_multi_pack_index_one ../midx.c:716:17
#5 0x55555586c605 in packfile_store_prepare ../packfile.c:1103:3
#6 0x55555586c90c in packfile_store_reprepare ../packfile.c:1118:2
#7 0x5555558546b3 in odb_reprepare ../odb.c:1106:2
#8 0x5555558539e4 in do_oid_object_info_extended ../odb.c:715:4
#9 0x5555558533d1 in odb_read_object_info_extended ../odb.c:862:8
#10 0x5555558540bd in odb_read_object ../odb.c:920:6
#11 0x55555580a330 in grep_source_load_oid ../grep.c:1934:12
#12 0x55555580a13a in grep_source_load ../grep.c:1986:10
#13 0x555555809103 in grep_source_is_binary ../grep.c:2014:7
#14 0x555555807574 in grep_source_1 ../grep.c:1625:8
#15 0x555555807322 in grep_source ../grep.c:1837:10
#16 0x5555556a5c58 in run ../builtin/grep.c:208:10
#17 0x55555562bb42 in void* ThreadStartFunc<false>(void*) lsan_interceptors.cpp.o
#18 0x7ffff7a9a979 in start_thread (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x9a979) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#19 0x7ffff7b22d2b in __GI___clone3 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x122d2b) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
The root caues of this leak is the way we set up and release the
submodule:
1. We use `repo_submodule_init()` to initialize a new repository. This
repository is stored in `repos_to_free`.
2. We now read data from the submodule repository.
3. We then call `repo_clear()` on the submodule repositories.
4. `repo_clear()` calls `odb_free()`.
5. `odb_free()` calls `odb_free_sources()` followed by `odb_close()`.
The issue here is the 5th step: we call `odb_free_sources()` _before_ we
call `odb_close()`. But `odb_free_sources()` already frees all sources,
so the logic that closes them in `odb_close()` now becomes a no-op. As a
consequence, we never explicitly close sources at all.
Fix the leak by closing the store before we free the sources.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.
This logic has a memory leak though:
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x55555562e433 in malloc (git+0xda433)
#1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
#2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
#3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
#4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
#5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
#6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
#7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
#8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
#9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
#10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
#11 0x55555575166a in run_builtin ../git.c:506:11
#12 0x5555557502f0 in handle_builtin ../git.c:779:9
#13 0x555555751127 in run_argv ../git.c:862:4
#14 0x55555575007b in cmd_main ../git.c:984:19
#15 0x5555557523aa in main ../common-main.c:9:11
#16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2.2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
#18 0x5555555f0934 in _start (git+0x9c934)
The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.
This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.
While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
the tip of or reachable via another ref, we'd count that object multiple
times.
Fix both of these bugs so that we properly count objects without leaking
any memory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>