At now, we have already implemented the ref consistency checks for both
"files-backend" and "packed-backend". Although we would check some
redundant things, it won't cause trouble. So, let's integrate it into
the "git-fsck(1)" command to get feedback from the users. And also by
calling "git refs verify" in "git-fsck(1)", we make sure that the new
added checks don't break.
Introduce a new function "fsck_refs" that initializes and runs a child
process to execute the "git refs verify" command. In order to provide
the user interface create a progress which makes the total task be 1.
It's hard to know how many loose refs we will check now. We might
improve this later.
Then, introduce the option to allow the user to disable checking ref
database consistency. Put this function in the very first execution
sequence of "git-fsck(1)" due to that we don't want the existing code of
"git-fsck(1)" which would implicitly check the consistency of refs to
die the program.
Last, update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is a "sorted" trait in the header of the "packed-refs" file,
it means that each entry is sorted increasingly by comparing the
refname. We should add checks to verify whether the "packed-refs" is
sorted in this case.
Update the "packed_fsck_ref_header" to know whether there is a "sorted"
trail in the header. It may seem that we could record all refnames
during the parsing process and then compare later. However, this is not
a good design due to the following reasons:
1. Because we need to store the state across the whole checking
lifetime, we would consume a lot of memory if there are many entries
in the "packed-refs" file.
2. We cannot reuse the existing compare function "cmp_packed_ref_records"
which cause repetition.
Because "cmp_packed_ref_records" needs an extra parameter "struct
snaphost", extract the common part into a new function
"cmp_packed_ref_records" to reuse this function to compare.
Then, create a new function "packed_fsck_ref_sorted" to parse the file
again and user the new fsck message "packedRefUnsorted(ERROR)" to report
to the user if the file is not sorted.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"packed-backend.c::next_record" will parse the ref entry to check the
consistency. This function has already checked the following things:
1. Parse the main line of the ref entry to inspect whether the oid is
not correct. Then, check whether the next character is oid. Then
check the refname.
2. If the next line starts with '^', it would continue to parse the
peeled oid and check whether the last character is '\n'.
As we decide to implement the ref consistency check for "packed-refs",
let's port these two checks and update the test to exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"packed-backend.c::next_record" will use "check_refname_format" to check
the consistency of the refname. If it is not OK, the program will die.
However, it is reported in [1], we cannot catch some corruption. But we
already have the code path and we must miss out something.
We use the following code to get the refname:
strbuf_add(&iter->refname_buf, p, eol - p);
iter->base.refname = iter->refname_buf.buf
In the above code, `p` is the start pointer of the refname and `eol` is
the next newline pointer. We calculate the length of the refname by
subtracting the two pointers. Then we add the memory range between `p`
and `eol` to get the refname.
However, if there are some NUL characters in the memory range between `p`
and `eol`, we will see the refname as a valid ref name as long as the
memory range between `p` and first occurred NUL character is valid.
In order to catch above corruption, create a new function
"refname_contains_nul" by searching the first NUL character. If it is
not at the end of the string, there must be some NUL characters in the
refname.
Use this function in "next_record" function to die the program if
"refname_contains_nul" returns true.
[1] https://lore.kernel.org/git/6cfee0e4-3285-4f18-91ff-d097da9de737@rd10.de/
Reported-by: R. Diez <rdiez-temp3@rd10.de>
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c::create_snapshot", if there is a header (the line
which starts with '#'), we will check whether the line starts with "#
pack-refs with: ". However, we need to consider other situations and
discuss whether we need to add checks.
1. If the header does not exist, we should not report an error to the
user. This is because in older Git version, we never write header in
the "packed-refs" file. Also, we do allow no header in "packed-refs"
in runtime.
2. If the header content does not start with "# packed-ref with: ", we
should report an error just like what "create_snapshot" does. So,
create a new fsck message "badPackedRefHeader(ERROR)" for this.
3. If the header content is not the same as the constant string
"PACKED_REFS_HEADER". This is expected because we make it extensible
intentionally and runtime "create_snapshot" won't complain about
unknown traits. In order to align with the runtime behavior. There is
no need to report.
As we have analyzed, we only need to check the case 2 in the above. In
order to do this, use "open_nofollow" function to get the file
descriptor and then read the "packed-refs" file via "strbuf_read". Like
what "create_snapshot" and other functions do, we could split the line
by finding the next newline in the buffer. When we cannot find a
newline, we could report an error.
So, create a function "packed_fsck_ref_next_line" to find the next
newline and if there is no such newline, use
"packedRefEntryNotTerminated(ERROR)" to report an error to the user.
Then, parse the first line to apply the checks. Update the test to
exercise the code.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We always write a space after "# pack-refs with:" but we don't align
with this rule in the "create_snapshot" method where we would check
whether header starts with "# pack-refs with:". It might seem that we
should undoubtedly tighten this rule, however, we don't have any
technical documentation about this and there is a possibility that we
would break the compatibility for other third-party libraries.
By investigating influential third-party libraries, we could conclude
how these libraries handle the header of "packed-refs" file:
1. libgit2 is fine and always writes the space. It also expects the
whitespace to exist.
2. JGit does not expect th header to have a trailing space, but expects
the "peeled" capability to have a leading space, which is mostly
equivalent because that capability is typically the first one we
write. It always writes the space.
3. gitoxide expects the space t exist and writes it.
4. go-git doesn't create the header by default.
As many third-party libraries expect a single space after "# pack-refs
with:", if we forget to write the space after the colon,
"create_snapshot" won't catch this. And we would break other
re-implementations. So, we'd better tighten the rule by checking whether
the header starts with "# pack-refs with: ".
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Although "git-fsck(1)" and "packed-backend.c" will check some
consistency and correctness of "packed-refs" file, they never check the
filetype of the "packed-refs". Let's verify that the "packed-refs" has
the expected filetype, confirming it is created by "git pack-refs"
command.
We could use "open_nofollow" wrapper to open the raw "packed-refs" file.
If the returned "fd" value is less than 0, we could check whether the
"errno" is "ELOOP" to report an error to the user. And then we use
"fstat" to check whether the "packed-refs" file is a regular file.
Reuse "FSCK_MSG_BAD_REF_FILETYPE" fsck message id to report the error to
the user if "packed-refs" is not a regular file.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "packed-backend.c", there are some functions such as "create_snapshot"
and "next_record" which would check the correctness of the content of
the "packed-ref" file. When anything is bad, the program will die.
It may seem that we have nothing relevant to above feature, because we
are going to read and parse the raw "packed-ref" file without creating
the snapshot and using the ref iterator to check the consistency.
However, when using "get_worktrees" in "builtin/refs", we would parse
the "HEAD" information. If the referent of the "HEAD" is inside the
"packed-ref", we will call "create_snapshot" function to parse the
"packed-ref" to get the information. No matter whether the entry of
"HEAD" in "packed-ref" is correct, "create_snapshot" would call
"verify_buffer_safe" to check whether there is a newline in the last
line of the file. If not, the program will die.
Although this behavior has no harm for the program, it will
short-circuit the program. When the users execute "git refs verify" or
"git fsck", we should avoid reading the head information, which may
execute the read operation in packed backend with stricter checks to die
the program. Instead, we should continue to check other parts of the
"packed-refs" file completely.
Fortunately, in 465a22b338 (worktree: skip reading HEAD when repairing
worktrees, 2023-12-29), we have introduced a function
"get_worktrees_internal" which allows us to get worktrees without
reading head information.
Create a new exposed function "get_worktrees_without_reading_head", then
replace the "get_worktrees" in "builtin/refs" with the new created
function.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For every test, we would execute the command "cd repo" in the first but
we never execute the command "cd .." to restore the working directory.
However, it's either not a good idea use above way. Because if any test
fails between "cd repo" and "cd ..", the "cd .." will never be reached.
And we cannot correctly restore the working directory.
Let's use subshell to ensure that the current working directory could be
restored to the correct path.
Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure based on meson learned to generate HTML
documention pages.
* ps/build-meson-html:
Documentation: wire up sanity checks for Meson
t/Makefile: make "check-meson" work with Dash
meson: install static files for HTML documentation
meson: generate articles
Documentation: refactor "howto-index.sh" for out-of-tree builds
Documentation: refactor "api-index.sh" for out-of-tree builds
meson: generate user manual
Documentation: inline user-manual.conf
meson: generate HTML pages for all man page categories
meson: fix generation of merge tools
meson: properly wire up dependencies for our docs
meson: wire up support for AsciiDoctor
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.
This is an alternative to the jk/lsan-race-with-barrier topic with
much smaller change to the production code.
* jk/lsan-race-ignore-false-positive:
test-lib: ignore leaks in the sanitizer's thread code
test-lib: check leak logs for presence of DEDUP_TOKEN
test-lib: simplify leak-log checking
test-lib: rely on logs to detect leaks
Revert barrier-based LSan threading race workaround
Our CI jobs sometimes see false positive leaks like this:
=================================================================
==3904583==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7fa790d01986 in __interceptor_realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
#1 0x7fa790add769 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
#2 0x7fa790d117c5 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
#3 0x7fa790d11957 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:598
#4 0x7fa790d03fe8 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:51
#5 0x7fa790d013fd in __lsan_thread_start_func ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:440
#6 0x7fa790adc3eb in start_thread nptl/pthread_create.c:444
#7 0x7fa790b5ca5b in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
This is not a leak in our code, but appears to be a race between one
thread calling exit() while another one is in LSan's stack setup code.
You can reproduce it easily by running t0003 or t5309 with --stress
(these trigger it because of the threading in git-grep and index-pack
respectively).
This may be a bug in LSan, but regardless of whether it is eventually
fixed, it is useful to work around it so that we stop seeing these false
positives.
We can recognize it by the mention of the sanitizer functions in the
DEDUP_TOKEN line. With this patch, the scripts mentioned above should
run with --stress indefinitely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check the leak logs, our original strategy was to check for any
non-empty log file produced by LSan. We later amended that to ignore
noisy lines in 370ef7e40d (test-lib: ignore uninteresting LSan output,
2023-08-28).
This makes it hard to ignore noise which is more than a single line;
we'd have to actually parse the file to determine the meaning of each
line.
But there's an easy line-oriented solution. Because we always pass the
dedup_token_length option, the output will contain a DEDUP_TOKEN line
for each leak that has been found. So if we invert our strategy to stop
ignoring useless lines and only look for useful ones, we can just count
the number of DEDUP_TOKEN lines. If it's non-zero, then we found at
least one leak (it would even give us a count of unique leaks, but we
really only care if it is non-zero).
This should yield the same outcome, but will help us build more false
positive detection on top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function to count the number of leaks found (actually, it is
the number of processes which produced a log file). Once upon a time we
cared about seeing if this number increased between runs. But we
simplified that away in 95c679ad86 (test-lib: stop showing old leak
logs, 2024-09-24), and now we only care if it returns any results or
not.
In preparation for refactoring it further, let's drop the counting
function entirely, and roll it into the "is it empty" check. The outcome
should be the same, but we'll be free to return a boolean "did we find
anything" without worrying about somebody adding a new call to the
counting function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we run with sanitizers, we set abort_on_error=1 so that the tests
themselves can detect problems directly (when the buggy program exits
with SIGABRT). This has one blind spot, though: we don't always check
the exit codes for all programs (e.g., helpers like upload-pack invoked
behind the scenes).
For ASan and UBSan this is mostly fine; they exit as soon as they see an
error, so the unexpected abort of the program causes the test to fail
anyway.
But for LSan, the program runs to completion, since we can only check
for leaks at the end. And in that case we could miss leak reports. And
thus we started checking LSan logs in faececa53f (test-lib: have the
"check" mode for SANITIZE=leak consider leak logs, 2022-07-28).
Originally the logs were optional, but logs are generated (and checked)
always as of 8c1d6691bc (test-lib: GIT_TEST_SANITIZE_LEAK_LOG enabled by
default, 2024-07-11). And we even check them for each test snippet, as
of cf1464331b (test-lib: check for leak logs after every test,
2024-09-24).
So now aborting on error is superfluous for LSan! We can get everything
we need by checking the logs. And checking the logs is actually
preferable, since it gives us more control over silencing false
positives (something we do not yet do, but will soon).
So let's tell LSan to just exit normally, even if it finds leaks. We can
do so with exitcode=0, which also suppresses the abort_on_error flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The extra "barrier" approach was too much code whose sole purpose
was to work around a race that is not even ours (i.e. in LSan's
teardown code).
In preparation for queuing a solution taking a much-less-invasive
approach, let's revert them.
CI jobs that run threaded programs under LSan has been giving false
positives from time to time, which has been worked around.
* jk/lsan-race-with-barrier:
grep: work around LSan threading race with barrier
index-pack: work around LSan threading race with barrier
thread-utils: introduce optional barrier type
Revert "index-pack: spawn threads atomically"
test-lib: use individual lsan dir for --stress runs
An earlier "csum-file checksum does not have to be computed with
sha1dc" topic had a few code paths that had initialized an
implementation of a hash function to be used by an unmatching hash
by mistake, which have been corrected.
* ps/weak-sha1-for-tail-sum-fix:
ci: exercise unsafe OpenSSL backend
builtin/fast-import: fix segfault with unsafe SHA1 backend
bulk-checkin: fix segfault with unsafe SHA1 backend
The custom allocator code in the reftable library did not handle
failing realloc() very well, which has been addressed.
* rs/reftable-realloc-errors:
t-reftable-merged: handle realloc errors
reftable: handle realloc error in parse_names()
reftable: fix allocation count on realloc error
reftable: avoid leaks on realloc error
In the preceding commit we have fixed a segfault when using an unsafe
SHA1 backend that is different from the safe one. This segfault only
went by unnoticed because we never set up an unsafe backend in our CI
systems. Fix this ommission by setting `OPENSSL_SHA1_UNSAFE` in our
TEST-vars job.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with the preceding commit, git-fast-import(1) is using the safe
variant to initialize a hashfile checkpoint. This leads to a segfault
when passing the checkpoint into the hashfile subsystem because it would
use the unsafe variants instead:
++ git --git-dir=R/.git fast-import --big-file-threshold=1
AddressSanitizer:DEADLYSIGNAL
=================================================================
==577126==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x5070000009c0 sp 0x7fffffff5b30 T0)
==577126==The signal is caused by a READ memory access.
==577126==Hint: address points to the zero page.
#0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
#1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
#2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
#3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
#4 0x5555559647d1 in stream_blob ../builtin/fast-import.c:1110:2
#5 0x55555596247b in parse_and_store_blob ../builtin/fast-import.c:2031:3
#6 0x555555967f91 in file_change_m ../builtin/fast-import.c:2408:5
#7 0x55555595d8a2 in parse_new_commit ../builtin/fast-import.c:2768:4
#8 0x55555595bb7a in cmd_fast_import ../builtin/fast-import.c:3614:4
#9 0x555555b1f493 in run_builtin ../git.c:480:11
#10 0x555555b1bfef in handle_builtin ../git.c:740:9
#11 0x555555b1e6f4 in run_argv ../git.c:807:4
#12 0x555555b1b87a in cmd_main ../git.c:947:19
#13 0x5555561649e6 in main ../common-main.c:64:11
#14 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#15 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#16 0x555555772c84 in _start (git+0x21ec84)
==577126==Register values:
rax = 0x0000511000000cc0 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000
rdi = 0x0000000000000000 rsi = 0x00005070000009c0 rbp = 0x00005070000009c0 rsp = 0x00007fffffff5b30
r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30
r12 = 0x0000000000000000 r13 = 0x00007fffffff6b60 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
==577126==ABORTING
./test-lib.sh: line 1039: 577126 Aborted git --git-dir=R/.git fast-import --big-file-threshold=1 < input
error: last command exited with $?=134
not ok 167 - R: blob bigger than threshold
The segfault is only exposed in case the unsafe and safe backends are
different from one another.
Fix the issue by initializing the context with the unsafe SHA1 variant.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 1b9e9be8b4 (csum-file.c: use unsafe SHA-1 implementation when
available, 2024-09-26) we have converted our `struct hashfile` to use
the unsafe SHA1 backend, which results in a significant speedup. One
needs to be careful with how to use that structure now though because
callers need to consistently use either the safe or unsafe variants of
SHA1, as otherwise one can easily trigger corruption.
As it turns out, we have one inconsistent usage in our tree because we
directly initialize `struct hashfile_checkpoint::ctx` with the safe
variant of SHA1, but end up writing to that context with the unsafe
ones. This went unnoticed so far because our CI systems do not exercise
different hash functions for these two backends, and consequently safe
and unsafe variants are equivalent. But when using SHA1DC as safe and
OpenSSL as unsafe backend this leads to a crash an t1050:
++ git -c core.compression=0 add large1
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1367==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000040 (pc 0x7ffff7a01a99 bp 0x507000000db0 sp 0x7fffffff5690 T0)
==1367==The signal is caused by a READ memory access.
==1367==Hint: address points to the zero page.
#0 0x7ffff7a01a99 in EVP_MD_CTX_copy_ex (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4)
#1 0x555555ddde56 in openssl_SHA1_Clone ../sha1/openssl.h:40:2
#2 0x555555dce2fc in git_hash_sha1_clone_unsafe ../object-file.c:123:2
#3 0x555555c2d5f8 in hashfile_checkpoint ../csum-file.c:211:2
#4 0x555555b9905d in deflate_blob_to_pack ../bulk-checkin.c:286:4
#5 0x555555b98ae9 in index_blob_bulk_checkin ../bulk-checkin.c:362:15
#6 0x555555ddab62 in index_blob_stream ../object-file.c:2756:9
#7 0x555555dda420 in index_fd ../object-file.c:2778:9
#8 0x555555ddad76 in index_path ../object-file.c:2796:7
#9 0x555555e947f3 in add_to_index ../read-cache.c:771:7
#10 0x555555e954a4 in add_file_to_index ../read-cache.c:804:9
#11 0x5555558b5c39 in add_files ../builtin/add.c:355:7
#12 0x5555558b412e in cmd_add ../builtin/add.c:578:18
#13 0x555555b1f493 in run_builtin ../git.c:480:11
#14 0x555555b1bfef in handle_builtin ../git.c:740:9
#15 0x555555b1e6f4 in run_argv ../git.c:807:4
#16 0x555555b1b87a in cmd_main ../git.c:947:19
#17 0x5555561649e6 in main ../common-main.c:64:11
#18 0x7ffff742a1fb in __libc_start_call_main (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a1fb) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#19 0x7ffff742a2b8 in __libc_start_main@GLIBC_2.2.5 (/nix/store/65h17wjrrlsj2rj540igylrx7fqcd6vq-glibc-2.40-36/lib/libc.so.6+0x2a2b8) (BuildId: bf320110569c8ec2425e9a0c5e4eb7e97f1fb6e4)
#20 0x555555772c84 in _start (git+0x21ec84)
==1367==Register values:
rax = 0x0000511000001080 rbx = 0x0000000000000000 rcx = 0x000000000000000c rdx = 0x0000000000000000
rdi = 0x0000000000000000 rsi = 0x0000507000000db0 rbp = 0x0000507000000db0 rsp = 0x00007fffffff5690
r8 = 0x0000000000000000 r9 = 0x0000000000000000 r10 = 0x0000000000000000 r11 = 0x00007ffff7a01a30
r12 = 0x0000000000000000 r13 = 0x00007fffffff6b38 r14 = 0x00007ffff7ffd000 r15 = 0x00005555563b9910
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/nix/store/h1ydpxkw9qhjdxjpic1pdc2nirggyy6f-openssl-3.3.2/lib/libcrypto.so.3+0x201a99) (BuildId: 41746a580d39075fc85e8c8065b6c07fb34e97d4) in EVP_MD_CTX_copy_ex
==1367==ABORTING
./test-lib.sh: line 1023: 1367 Aborted git $config add large1
error: last command exited with $?=134
not ok 4 - add with -c core.compression=0
Fix the issue by using the unsafe variant instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a race with LSan when spawning threads and one of the threads
calls die(). We worked around one such problem with index-pack in the
previous commit, but it exists in git-grep, too. You can see it with:
make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
cd t
./t0003-attributes.sh --stress
which fails pretty quickly with:
==git==4096424==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 32 byte(s) in 1 object(s) allocated from:
#0 0x7f906de14556 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
#1 0x7f906dc9d2c1 in __pthread_getattr_np nptl/pthread_getattr_np.c:180
#2 0x7f906de2500d in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
#3 0x7f906de25187 in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
#4 0x7f906de17d18 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
#5 0x7f906de143a9 in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
#6 0x7f906dc9bf51 in start_thread nptl/pthread_create.c:447
#7 0x7f906dd1a677 in __clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
As with the previous commit, we can fix this by inserting a barrier that
makes sure all threads have finished their setup before continuing. But
there's one twist in this case: the thread which calls die() is not one
of the worker threads, but the main thread itself!
So we need the main thread to wait in the barrier, too, until all
threads have gotten to it. And thus we initialize the barrier for
num_threads+1, to account for all of the worker threads plus the main
one.
If we then test as above, t0003 should run indefinitely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes get false positives from our linux-leaks CI job because of
a race in LSan itself. The problem is that one thread is still
initializing its stack in LSan's code (and allocating memory to do so)
while anothe thread calls die(), taking down the whole process and
triggering a leak check.
The problem is described in more detail in 993d38a066 (index-pack: spawn
threads atomically, 2024-01-05), which tried to fix it by pausing worker
threads until all calls to pthread_create() had completed. But that's
not enough to fix the problem, because the LSan setup code runs in the
threads themselves. So even though pthread_create() has returned, we
have no idea if all threads actually finished their setup before letting
any of them do real work.
We can fix that by using a barrier inside the threads themselves,
waiting for all of them to hit the start of their main function before
any of them proceed.
You can test for the race by running:
make SANITIZE=leak THREAD_BARRIER_PTHREAD=YesOnLinux
cd t
./t5309-pack-delta-cycles.sh --stress
which fails quickly before this patch, and should run indefinitely
without it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One thread primitive we don't yet support is a barrier: it waits for all
threads to reach a synchronization point before letting any of them
continue. This would be useful for avoiding the LSan race we see in
index-pack (and other places) by having all threads complete their
initialization before any of them start to do real work.
POSIX introduced a pthread_barrier_t in 2004, which does what we want.
But if we want to rely on it:
1. Our Windows pthread emulation would need a new set of wrapper
functions. There's a Synchronization Barrier primitive there, which
was introduced in Windows 8 (which is old enough for us to depend
on).
2. macOS (and possibly other systems) has pthreads but not
pthread_barrier_t. So there we'd have to implement our own barrier
based on the mutex and cond primitives.
Those are do-able, but since we only care about avoiding races in our
LSan builds, there's an easier way: make it a noop on systems without a
native pthread barrier.
This patch introduces a "maybe_thread_barrier" API. The clunky name
(rather than just using pthread_barrier directly) should hopefully clue
people in that on some systems it will do nothing. It's wired to a
Makefile knob which has to be triggered manually, and we enable it for
the linux-leaks CI jobs (since we know we'll have it there).
There are some other possible options:
- we could turn it on all the time for Linux systems based on uname.
But we really only care about it for LSan builds, and there is no
need to add extra code to regular builds.
- we could turn it on only for LSan builds. But that would break
builds on non-Linux platforms (like macOS) that otherwise should
support sanitizers.
- we could trigger only on the combination of Linux and LSan together.
This isn't too hard to do, but the uname check isn't completely
accurate. It is really about what your libc supports, and non-glibc
systems might not have it (though at least musl seems to).
So we'd risk breaking builds on those systems, which would need to
add a new knob. Though the upside would be that running local "make
SANITIZE=leak test" would be protected automatically.
And of course none of this protects LSan runs from races on systems
without pthread barriers. It's probably OK in practice to protect only
our CI jobs, though. The race is rare-ish and most leak-checking happens
through CI.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 993d38a0669a8056d496797516e743e26b6b8b54.
That commit was trying to solve a race between LSan setting up the
threads stack and another thread calling exit(), by making sure that all
pthread_create() calls have finished before doing any work that might
trigger the exit().
But that isn't sufficient. The setup code actually runs in the
individual threads themselves, not in the spawning thread's call to
pthread_create(). So while it may have improved the race a bit, you can
still trigger it pretty quickly with:
make SANITIZE=leak
cd t
./t5309-pack-delta-cycles.sh --stress
Let's back out that failed attempt so we can try again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When storing output in test-results/, we usually give each numbered run
in a --stress set its own output file. But we don't do that for storing
LSan logs, so something like:
./t0003-attributes.sh --stress
will have many scripts simultaneously creating, writing to, and deleting
the test-results/t0003-attributes.leak directory. This can cause logs
from one run to be attributed to another, spurious failures when
creation and deletion race, and so on.
This has always been broken, but nobody noticed because it's rare to do
a --stress run with LSan (since the point is for the code to run quickly
many times in order to hit races). But if you're trying to find a race
in the leak sanitizing code, it makes sense to use these together.
We can fix it by using $TEST_RESULTS_BASE, which already incorporates
the stress job suffix.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The developer documentation has been updated to give the latest
info on gitk and git-gui maintainer.
* as/gitk-git-gui-repo-update:
Update the official repo of gitk
Check reallocation errors in unit tests, like everywhere else.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check the final reallocation for adding the terminating NULL and handle
it just like those in the loop. Simply use REFTABLE_ALLOC_GROW instead
of keeping the REFTABLE_REALLOC_ARRAY call and adding code to preserve
the original pointer value around it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() avoids the leak by keeping the original pointer if
reallocation fails, but still increase the allocation count in such a
case as if it succeeded. That's OK, because the error handling code
just frees everything and doesn't look at names_cap anymore.
reftable_buf_add() does the same, but here it is a problem as it leaves
the reftable_buf in a broken state, with ->alloc being roughly twice as
big as the actually allocated memory, allowing out-of-bounds writes in
subsequent calls.
Reimplement REFTABLE_ALLOC_GROW to avoid leaks, keep allocation counts
in sync and still signal failures to callers while avoiding code
duplication in callers. Make it an expression that evaluates to 0 if no
reallocation is needed or it succeeded and 1 on failure while keeping
the original pointer and allocation counter values.
Adjust REFTABLE_ALLOC_GROW_OR_NULL to the new calling convention for
REFTABLE_ALLOC_GROW, but keep its support for non-size_t alloc variables
for now.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When realloc(3) fails, it returns NULL and keeps the original allocation
intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and
the allocation count variable in that case, simultaneously leaking the
original allocation and misrepresenting the number of storable items.
parse_names() and reftable_buf_add() avoid leaking by restoring the
original pointer value on failure, but all other callers seem to be OK
with losing the old allocation. Add a new variant of the macro,
REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the
allocation counter. Use it for those callers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wire up sanity checks for Meson to verify that no man pages are missing.
This check is similar to the same check we already have for our tests.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "check-meson" target uses process substitution to check whether
extracted contents from "meson.build" match expected contents. Process
substitution is unportable though and thus the target will fail when
using for example Dash.
Fix this by writing data into a temporary directory.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we generate man pages, articles and user manual with Meson the
only thing that is still missing in an installation of HTML documents is
a couple of static files. Wire these up to finalize Meson's support for
generating HTML documentation.
Diffing an installation that uses our Makefile with an installation that
uses Meson only surfaces a couple of discepancies now:
- Meson doesn't install "everyday.html" and "git-remote-helpers.html".
These files are marked as obsolete and don't contain any useful
information anymore: they simply point to their modern equivalents.
- Meson doesn't install "*.txt" files when asking for HTML docs. I'm
not sure why our Makefiles do this in the first place, and it does
seem like the resulting installation is fully functional even
without those files.
Other than that, both layout and file contents are the exact same.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the Meson build system already knows to generate man pages and our
user manual, it does not yet generate the random assortment of articles
that we have. Plug this gap.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "howto-index.sh" is used to generate an index of our how-to docs. It
receives as input the paths to these documents, which would typically be
relative to the "Documentation/" directory in Makefile-based builds. In
an out-of-tree build though it will get relative that may be rooted
somewhere else entirely.
The file paths do end up in the generated index, and the expectation is
that they should always start with "howto/". But for out-of-tree builds
we would populate it with the paths relative to the build directory,
which is wrong.
Fix the issue by using `$(basename "$file")` to generate the path. While
at it, move the script into "howto/" to align it with the location of
the comparable "api-index.sh" script.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "api-index.sh" script generates an index of API-related
documentation. The script does not handle out-of-tree builds and thus
cannot be used easily by Meson.
Refactor it to be independent of locations by both accepting a source
directory where the API docs live as well as a path to an output file.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation contains a user manual that gives people a short
introduction to Git. Our Makefile knows to generate the manual into
three different formats: an HTML page, a PDF and an info page. The Meson
build instructions don't yet generate any of these.
While wiring up all these formats I hit a couple of road blocks with how
we generate our info pages. Even though I eventually resolved these, it
made me question whether anybody actually uses info pages in the first
place. Checking through a couple of downstream consumers I couldn't find
a single user of either the info pages nor of our PDF manual in Arch
Linux, Debian, Fedora, Ubuntu, FreeBSD or OpenBSDFedora. So it's rather
safe to assume that there aren't really any users out there, and thus
the added complexity does not seem worth it.
Wire up support for building the user manual in HTML format and
conciously skip over the other two formats. This is basically a form of
silent deprecation: if people out there use the other two formats they
will eventually complain about them missing in Meson, which means we can
wire them up at a later point. If they don't we can phase out these
formats eventually.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating our user manual we set up a bit of extra configuration
compared to our normal configuration. This is done by having an extra
"user-manual.conf" file that Asciidoc seems to pull in automatically due
to matching filenames with "user-manual.txt". This dependency is quite
hidden though and thus easy to miss. Furthermore, it seems that Asciidoc
does not know to pull it in for out-of-tree builds where we use relative
paths.
The setup in AsciiDoctor is somewhat different: instead of having two
sets of configuration, we condition the use of manual-specific configs
based on whether the document type is "book". And as we only build our
user manual with that type this is sufficient.
Use the same trick for our user manual by inlining the configuration
into "asciidoc.conf.in" and making it conditional on whether or not
"doctype-book" is defined.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating HTML pages for our man pages we only generate them for
category 1 in Meson, which are the pages corresponding to our built-in
commands. I cannot tell why I added this filter though: our Makefile
installs all man pages, so a Meson-based build misses out on many of
them.
Fix this by removing the filter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our buildsystems generate a list of diff and merge tools that ultimately
end up in our documentation. And while Meson does wire up the logic, it
tries to use the TOOL_MODE environment variable to set up the mode. This
is wrong though: the mode is set via an argument that we have fixed to
'diff' mode by accident.
Fix this such that merge tools are properly generated.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of Meson documentation targets use `meson.current_source_dir()`
to resolve inputs. This has the downside that it does not automagically
make Meson track these inputs as a dependency. After all, string
arguments really can be anything, even if they happen to match an actual
filesystem path.
Adapt these build targets to instead use inputs.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>