When reading object information via `packed_object_info()` we may not
populate the object info's packfile-specific fields. This leads to
inconsistent object info depending on whether the info was populated via
`packfile_store_read_object_info()` or `packed_object_info()`.
Fix this inconsistency so that we can always assume the pack info to be
populated when reading object info from a pack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `struct object_info::u::packed::is_delta` field determines whether
or not a specific object is stored as a delta. It only stores whether or
not the object is stored as delta, so it is treated as a boolean value.
This boolean is insufficient though: when reading a packed object via
`packfile_store_read_object_info()` we know to skip parsing the actual
object when the user didn't request any object-specific data. In that
case we won't read the object itself, but will only look up its position
in the packfile. Consequently, we do not know whether it is a delta or
not.
This isn't really an issue right now, as the check for an empty request
is broken. But a subsequent commit will fix it, and once we do we will
have the need to also represent an "unknown" delta state.
Prepare for this change by introducing a new enum that encodes the
object type. We don't use the "unknown" state just yet, but will start
to do so in a subsequent commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading object info via a packfile we yield one of two types:
- The object can either be OI_PACKED, which is what a caller would
typically expect.
- Or it can be OI_DBCACHED if it is stored in the delta base cache.
The latter really is an implementation detail though, and callers
typically don't care at all about the difference. Furthermore, the
information whether or not it is part of the delta base cache can
already be derived via the `is_delta` field, so the fact that we discern
between OI_PACKED and OI_DBCACHED only further complicates the
interface.
There aren't all that many callers that care about the `whence` field in
the first place. In fact, there's only three:
- `packfile_store_read_object_info()` checks for `whence == OI_PACKED`
and then populates the packfile information of the object info
structure. We now start to do this also for deltified objects, which
gives its callers strictly more information.
- `repack_local_links()` wants to determine whether the object is part
of a promisor pack and checks for `whence == OI_PACKED`. If so, it
verifies that the packfile is a promisor pack. It's arguably wrong
to declare that an object is not part of a promisor pack only
because it is stored in the delta base cache.
- `is_not_in_promisor_pack_obj()` does the same, but checks that a
specific object is _not_ part of a promisor pack. The same reasoning
as above applies.
Drop the OI_DBCACHED enum completely. None of the callers seem to care
about the distinction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some early returns in `odb_source_loose_read_object_info()`
in cases where we don't have to open the loose object. These return
paths do not set `struct object_info::whence` to `OI_LOOSE` though, so
it becomes impossible for the caller to tell the format of such an
object.
The root cause of this really is that we have so many different return
paths in the function. As a consequence, it's harder than necessary to
make sure that all successful exit paths sot up the `whence` field as
expected.
Address this by refactoring the function to have a single exit path.
Like this, we can trivially set up the `whence` field when we exit
successfully from the function.
Note that we also:
- Rename `status` to `ret` to match our usual coding style, but also
to show that the old `status` variable is now always getting the
expected value. Furthermore, the value is not initialized anymore,
which has the consequence that most compilers will warn for exit
paths where we forgot to set it.
- Move the setup of scratch pointers closer to `parse_loose_header()`
to show where it's needed.
- Guard a couple of variables on cleanup so that they only get
released in case they have been set up.
- Reset `oi->delta_base_oid` towards the end of the function, together
with all the other object info pointers.
Overall, all these changes result in a diff that is somewhat hard to
read. But the end result is significantly easier to read and reason
about, so I'd argue this one-time churn is worth it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When various *object_info() functions are given an extended object
info structure as NULL by a caller that does not want any details,
the code uses a file-scope static blank_oi and passes it down to
the helper functions they use, to avoid handling NULL specifically.
The ps/object-read-stream topic graduated to 'master' recently
however had a bug that assumed that two identically named file-scope
static variables in two functions are the same, which of course is
not the case. This made "git commit" take 0.38 seconds to 1508
seconds in some case, as reported by Aaron Plattner here:
https://lore.kernel.org/git/f4ba7e89-4717-4b36-921f-56537131fd69@nvidia.com/
We _could_ move the blank_oi variable to the global scope in common
section to fix this regression, but explicitly handling the NULL is
a much safer fix. It would also reduce the chance of errors that
somebody accidentally writes into blank_oi, making its contents
dirty, which potentially will make subsequent calls into the
function misbehave. By explicitly handling NULL input, we no longer
have to worry about it.
Reported-by: Aaron Plattner <aplattner@nvidia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/object-read-stream: (32 commits)
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
object-file: refactor writing objects via a stream
...
Rewrite the only use of "mktemp()" that is subject to TOCTOU race
and Stop using the insecure "mktemp()" function.
* rs/ban-mktemp:
compat: remove gitmkdtemp()
banned.h: ban mktemp(3)
compat: remove mingw_mktemp()
compat: use git_mkdtemp()
wrapper: add git_mkdtemp()
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.
* ps/object-read-stream:
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
"git repo struct" learned to take "-z" as a synonym to "--format=nul".
* lo/repo-struct-z:
repo: add -z as an alias for --format=nul to git-repo-structure
repo: use [--format=... | -z] instead of [-z] in git-repo-info synopsis
repo: remove blank line from Documentation/git-repo.adoc
A help message from "git branch" now mentions "git help" instead of
"man" when suggesting to read some documentation.
* kh/advise-w-git-help-in-branch:
branch: advice using git-help(1) instead of man(1)
Build fix.
* tc/meson-cross-compile-fix:
meson: use is_cross_build() where possible
meson: only detect ICONV_OMITS_BOM if possible
meson: ignore subprojects/.wraplock
"git last-modified" used to mishandle "--" to mark the beginning of
pathspec, which has been corrected.
* js/last-modified-with-sparse-checkouts:
last-modified: support sparse checkouts
Halve the memory consumed by artificial filepairs created during
"git diff --find-copioes-harder", also making the operation run
faster.
* rs/diff-index-find-copies-harder-optim:
diff-index: don't queue unchanged filepairs with diff_change()
Recent optimization to "last-modified" command introduced use of
uninitialized block of memory, which has been corrected.
* tc/last-modified-active-paths-optimization:
last-modified: fix use of uninitialized memory
The use of "revision" (a connected set of commits) has been
clarified in the "git replay" documentation.
* en/replay-doc-revision-range:
Documentation/git-replay.adoc: fix errors around revision range
A few tests have been updated to work under the shell compatible
mode of zsh.
* bc/zsh-testsuite:
t5564: fix test hang under zsh's sh mode
t0614: use numerical comparison with test_line_count
"git replay" forgot to omit the "gpgsig-sha256" extended header
from the resulting commit the same way it omits "gpgsig", which has
been corrected.
* pw/replay-exclude-gpgsig-fix:
replay: do not copy "gpgsign-sha256" header
While investigating the config options set by 'scalar' I noticed this
one wasn't documented.
Signed-off-by: Matthew Hughes <matthewhughes934@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote and reads and parses capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.
There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.
Which is *not* the theme of this patch ;-)
For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
side possibly multiple times, and the memory is held by the variable
until program exit, without leaking. When talking to multiple remotes,
however, the server capabilities from the second connection overwrites
without freeing the one from the first connection, which leaks.
==1080970==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 421 byte(s) in 2 object(s) allocated from:
#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).
Free the capablities data for the previous server before overwriting
it with the next server to plug this leak.
The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it fail reliably with SANITIZE=leak,address
though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Join two paragraphs that start with the standard “The default <hook>,
when enabled” into one and put it at the end of the “pre-commit”
section.
The trailing whitespace paragraph was added in the first commit for the
doc, in 6d35cc76 (Document hooks., 2005-09-02). Then 3e14dd2c (mention
use of "hooks.allownonascii" in "man githooks", 2019-02-20) updated the
“pre-commit” section to mention the non-ASCII check that was added in
d00e364d.[1] But this paragraph was added one-past the original
“default” paragraph, after the env. variable paragraph, and starts
exactly the same. That causes the flow of this section to feel
off (paragraphs in order):
1. Invoked by <cmd> and what parameters it takes
2. The default 'pre-commit' hook catches introduction of trailing
whitespace
3. `GIT_EDITOR=:`
4. The default pre-commit' hook catches introduction of non-ASCII
filenames
Let’s instead join these two paragrahs and explain the whole behavior of
the default script.
† 1: Extend sample pre-commit hook to check for non ascii filenames,
2009-05-19
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitmkdtemp() has become a trivial wrapper around git_mkdtemp(). Remove
this now unnecessary layer of indirection.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Older versions of mktemp(3) generate easily guessable file names. The
function checks if the generated name is used, which is unreliable, as
a file with that name might then be created by some other process before
we can do it ourselves. The function was dropped from POSIX due to its
security problems. Forbid its use.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the mktemp(3) compatibility function now that its last caller was
removed by the previous commit.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A file might appear at the path returned by mktemp(3) before we call
mkdir(2). Use the more robust git_mkdtemp() instead, which retries a
number of times and doesn't need to call lstat(2).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend git_mkstemps_mode() to optionally call mkdir(2) instead of
open(2), then use that ability to create a mkdtemp(3) replacement,
git_mkdtemp(). We'll start using it in the next commit.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message given by "git config set", when the variable
being updated has more than one values defined, used old style "git
config" syntax with an incorrect option in its hint, both of which
have been corrected.
* rs/config-set-multi-error-message-fix:
config: fix suggestion for failed set of multi-valued option
The option help text given by "git config unset -h" described
the "--all" option to "replace", not "unset", multiple variables,
which has been corrected.
* rs/config-unset-opthelp-fix:
config: fix short help of unset flags
Code refactoring around object database sources.
* ps/object-source-management:
odb: handle recreation of quarantine directories
odb: handle changing a repository's commondir
chdir-notify: add function to unregister listeners
odb: handle initialization of sources in `odb_new()`
http-push: stop setting up `the_repository` for each reference
t/helper: stop setting up `the_repository` repeatedly
builtin/index-pack: fix deferred fsck outside repos
oidset: introduce `oidset_equal()`
odb: move logic to disable ref updates into repo
odb: refactor `odb_clear()` to `odb_free()`
odb: adopt logic to close object databases
setup: convert `set_git_dir()` to have file scope
path: move `enter_repo()` into "setup.c"
Dockerised jobs at the GitHub Actions CI have been taught to show
more details of failed tests.
* js/ci-show-breakage-in-dockerized-jobs:
ci(dockerized): do show the result of failing tests again
The "--committer-date-is-author-date" option of "git am/rebase" is
a misguided one. The documentation is updated to discourage its
use.
* kh/doc-committer-date-is-author-date:
doc: warn against --committer-date-is-author-date
"git config get --path" segfaulted on an ":(optional)path" that
does not exist, which has been corrected.
* jc/optional-path:
config: really treat missing optional path as not configured
config: really pretend missing :(optional) value is not there
config: mark otherwise unused function as file-scope static
Code clean-up.
* en/xdiff-cleanup-2:
xdiff: rename rindex -> reference_index
xdiff: change rindex from long to size_t in xdfile_t
xdiff: make xdfile_t.nreff a size_t instead of long
xdiff: make xdfile_t.nrec a size_t instead of long
xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
xdiff: use unambiguous types in xdl_hash_record()
xdiff: use size_t for xrecord_t.size
xdiff: make xrecord_t.ptr a uint8_t instead of char
xdiff: use ptrdiff_t for dstart/dend
doc: define unambiguous type mappings across C and Rust
Other Git commands that have nul-terminated output, such as git-config,
git-status, git-ls-files, and git-repo-info have a flag `-z` for using
the null character as the record separator.
Add the `-z` flag to git-repo-structure as an alias for `--format=nul`,
making it consistent with the behavior of the other commands.
Signed-off-by: Lucas Seiki Oshiro <lucasseikioshiro@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>