A bunch of tests rely on Perl to print data in various different ways.
These usages fall into the following categories:
- Print data conditionally by matching patterns. These usecases can be
converted to use awk(1) rather easily.
- Print data repeatedly. These usecases can typically be converted to
use a combination of `test-tool genzeros` and sed(1).
- Print data in reverse. These usecases can be converted to use
awk(1) or `sort -r`.
Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the early days of Git, Perl was used quite prominently throughout the
project. This has changed significantly as almost all of the executables
we ship nowadays have eventually been rewritten in C. Only a handful of
subsystems remain that require Perl:
- gitweb, a read-only web interface.
- A couple of scripts that allow importing repositories from GNU Arch,
CVS and Subversion.
- git-send-email(1), which can be used to send mails.
- git-request-pull(1), which is used to request somebody to pull from
a URL by sending an email.
- git-filter-branch(1), which uses Perl with the `--state-branch`
option. This command is typically recommended against nowadays in
favor of git-filter-repo(1).
- Our Perl bindings for Git.
- The netrc Git credential helper.
None of these subsystems can really be considered to be part of the
"core" of Git, and an installation without them is fully functional.
It is more likely than not that an end user wouldn't even notice that
any features are missing if those tools weren't installed. But while
Perl nowadays very much is an optional dependency of Git, there is a
significant limitation when Perl isn't available: developers cannot run
our test suite.
Preceding commits have started to lift this restriction by removing the
strict dependency on Perl in many central parts of the test library. But
there are still many tests that rely on small Perl helpers to do various
different things.
Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests
that require Perl. This prerequisite is explicitly different than the
preexisting PERL prerequisite:
- PERL records whether or not features depending on the Perl
interpreter are built.
- PERL_TEST_HELPERS records whether or not a Perl interpreter is
available for our tests.
By having these two separate prerequisites we can thus distinguish
between tests that inherently depend on Perl because the underlying
feature does, and those tests that depend on Perl because the test
itself is using Perl.
Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the default value for TEST_PASSES_SANITIZE_LEAK is `true` there
is no longer a need to have that variable declared in all of our tests.
Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-rev-list(1) can speed up its object size calculations for reachable
objects via a bitmap walk, if there is any bitmap. This is done in
`try_bitmap_disk_usage()`, which tries to optimistically load the bitmap
and then use it, if available. It never frees it though, leading to a
memory leak. Fix this.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing revision queries with `--objects` and
`--use-bitmap-index`, the output may incorrectly contain objects which
are packed, even when the `--unpacked` option is given. This affects
traversals, but also other querying operations, like `--count`,
`--disk-usage`, etc.
Like in the previous commit, the fix is to exclude those objects from
the result set before they are shown to the user (or, in this case,
before the bitmap containing the result of the traversal is enumerated
and its objects listed).
This is performed by a new function in pack-bitmap.c, called
`filter_packed_objects_from_bitmap()`. Note that we do not have to
inspect individual bits in the result bitmap, since we know that the
first N (where N is the number of objects in the bitmap's pack/MIDX)
bits correspond to objects which packed by definition.
In other words, for an object to have a bitmap position (not in the
extended index), it must appear in either the bitmap's pack or one of
the packs in its MIDX.
This presents an appealing optimization to us, which is that we can
simply memset() the corresponding number of `eword_t`'s to zero,
provided that we handle any objects which spill into the next word (but
don't occupy all 64 bits of the word itself).
We only have to handle objects in the bitmap's extended index. These
objects may (or may not) appear in one or more pack(s). Since these
objects are known to not appear in either the bitmap's MIDX or pack,
they may be stored as loose, appear in other pack(s), or both.
Before returning a bitmap containing the result of the traversal back to
the caller, drop any bits from the extended index which appear in one or
more packs. This implements the correct behavior for rev-list operations
which use the bitmap index to compute their result.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The '--disk-usage' option for git-rev-list was introduced in 16950f8384
(rev-list: add --disk-usage option for calculating disk usage, 2021-02-09).
This is very useful for people inspect their git repo's objects usage
infomation, but the resulting number is quit hard for a human to read.
Teach git rev-list to output a human readable result when using
'--disk-usage'.
Signed-off-by: Li Linchao <lilinchao@oschina.cn>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can sometimes be useful to see which refs are contributing to the
overall repository size (e.g., does some branch have a bunch of objects
not found elsewhere in history, which indicates that deleting it would
shrink the size of a clone).
You can find that out by generating a list of objects, getting their
sizes from cat-file, and then summing them, like:
git rev-list --objects --no-object-names main..branch
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
Though note that the caveats from git-cat-file(1) apply here. We "blame"
base objects more than their deltas, even though the relationship could
easily be flipped. Still, it can be a useful rough measure.
But one problem is that it's slow to run. Teaching rev-list to sum up
the sizes can be much faster for two reasons:
1. It skips all of the piping of object names and sizes.
2. If bitmaps are in use, for objects that are in the
bitmapped packfile we can skip the oid_object_info()
lookup entirely, and just ask the revindex for the
on-disk size.
This patch implements a --disk-usage option which produces the same
answer in a fraction of the time. Here are some timings using a clone of
torvalds/linux:
[rev-list piped to cat-file, no bitmaps]
$ time git rev-list --objects --no-object-names --all |
git cat-file --buffer --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m29.635s
user 0m38.003s
sys 0m1.093s
[internal, no bitmaps]
$ time git rev-list --disk-usage --objects --all
1459938510
real 0m31.262s
user 0m30.885s
sys 0m0.376s
Even though the wall-clock time is slightly worse due to parallelism,
notice the CPU savings between the two. We saved 21% of the CPU just by
avoiding the pipes.
But the real win is with bitmaps. If we use them without the new option:
[rev-list piped to cat-file, bitmaps]
$ time git rev-list --objects --no-object-names --all --use-bitmap-index |
git cat-file --batch-check='%(objectsize:disk)' |
perl -lne '$total += $_; END { print $total }'
1459938510
real 0m6.244s
user 0m8.452s
sys 0m0.311s
then we're faster to generate the list of objects, but we still spend a
lot of time piping and looking things up. But if we do both together:
[internal, bitmaps]
$ time git rev-list --disk-usage --objects --all --use-bitmap-index
1459938510
real 0m0.219s
user 0m0.169s
sys 0m0.049s
then we get the same answer much faster.
For "--all", that answer will correspond closely to "du objects/pack",
of course. But we're actually checking reachability here, so we're still
fast when we ask for more interesting things:
$ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10
374798628
real 0m0.429s
user 0m0.356s
sys 0m0.072s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>