Code to maintain the mapping between object names computed with
multiple hash functions is being added, written in Rust.
* bc/sha1-256-interop-02:
object-file-convert: always make sure object ID algo is valid
rust: add a small wrapper around the hashfile code
rust: add a new binary object map format
rust: add functionality to hash an object
rust: add a build.rs script for tests
hash: expose hash context functions to Rust
write-or-die: add an fsync component for the object map
csum-file: define hashwrite's count as a uint32_t
rust: add additional helpers for ObjectID
hash: add a function to look up hash algo structs
rust: add a hash algorithm abstraction
rust: add a ObjectID struct
hash: use uint32_t for object_id algorithm
conversion: don't crash when no destination algo
repository: require Rust support for interoperability
"git history" history rewriting UI.
* ps/history:
builtin/history: implement "reword" subcommand
builtin: add new "history" command
wt-status: provide function to expose status for trees
replay: yield the object ID of the final rewritten commit
replay: small set of cleanups
builtin/replay: move core logic into "libgit.a"
builtin/replay: extract core logic to replay revisions
"auto filter" logic for large-object promisor remote.
Comments?
* cc/lop-filter-auto:
fetch-pack: wire up and enable auto filter logic
promisor-remote: keep advertised filter in memory
list-objects-filter-options: implement auto filter resolution
list-objects-filter-options: support 'auto' mode for --filter
doc: fetch: document `--filter=<filter-spec>` option
fetch: make filter_options local to cmd_fetch()
clone: make filter_options local to cmd_clone()
promisor-remote: allow a client to store fields
promisor-remote: refactor initialising field lists
Preparation of the xdiff/ codebase to work with Rust.
Comments?
* en/xdiff-cleanup-3:
SQUASH??? cocci
xdiff: move xdl_cleanup_records() from xprepare.c to xdiffi.c
xdiff: remove dependence on xdlclassifier from xdl_cleanup_records()
xdiff: replace xdfile_t.dend with xdfenv_t.delta_end
xdiff: replace xdfile_t.dstart with xdfenv_t.delta_start
xdiff: cleanup xdl_trim_ends()
xdiff: use xdfenv_t in xdl_trim_ends() and xdl_cleanup_records()
xdiff: let patience and histogram benefit from xdl_trim_ends()
xdiff: don't waste time guessing the number of lines
xdiff: make classic diff explicit by creating xdl_do_classic_diff()
ivec: introduce the C side of ivec
The iconv library on macOS fails to correctly handle stateful
ISO/IEC 2022 encoded strings. Work it around instead of replacing
it wholesale with the one from Homebrew.
* tb/macos-iconv-workarounds:
utf8.c: Enable workaround for iconv under macOS 14/15
utf8.c: Prepare workaround for iconv under macOS 14/15
Introduce a more robust way to parse a decimal integer stored in a
piece of memory that is not necessarily terminated with NUL (about
which ASan's strict-string-check complains even when the use of
strtol() is safe due to the verified existence of whitespace after
the digits).
* jk/parse-int:
fsck: use parse_unsigned_from_buf() for parsing timestamp
cache-tree: use parse_int_from_buf()
parse: add functions for parsing from non-string buffers
parse: prefer bool to int for boolean returns
When rewriting history via git-rebase(1) there are a few very common use
cases:
- The ordering of two commits should be reversed.
- A commit should be split up into two commits.
- A commit should be dropped from the history completely.
- Multiple commits should be squashed into one.
- An existing commit that is not the tip of the current branch
should be edited.
While these operations are all doable, it often feels needlessly kludgy
to do so with an interactive rebase, using the editor to say what one
wants, and then performing the actions. Also, some operations, like
splitting a commit in two, are far more involved than that and require
a whole series of commands.
Rebases also do not update dependent branches. The use of stacked
branches has grown quite common with competing version control systems
like Jujutsu, though, so it is clearly a need that users have. While
rebases _can_ serve this use case if one always works on the latest
stacked branch, it is somewhat awkward and very easy to get wrong.
Add a new "history" command to plug these gaps. This command will have
several different subcommands to imperatively rewrite history for common
use cases like the above.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the core logic used to replay commits into "libgit.a" so that it
can be easily reused by other commands. It will be used in a subsequent
commit that introduces a new git-history(1) command.
Note that with this change we have no sign-comparison warnings anymore,
and neither do we depend on `the_repository`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit introduced a workaround in utf8.c to deal
with broken iconv implementations.
It is enabled when a macOS version with a buggy iconv library is used
and no external library from either MacPorts or Homebrew is provided
(and linked against).
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Work around the "iconv" shipped as part of macOS, which is broken
when handling stateful ISO/IEC 2022 encoded strings.
* rs/macos-iconv-workaround:
macOS: use iconv from Homebrew if needed and present
macOS: make Homebrew use configurable
Trying to use Rust's Vec in C, or git's ALLOC_GROW() macros (via
wrapper functions) in Rust, is painful because:
* C doesn't define its own vector type, and even though Rust does
have Vec, it's painful to use on the C side (more on that below).
It is also not viable to rely on Rust's Vec type because Git needs
to be able to compile without Rust. So ivec was created expressly
to be interoperable between C and Rust without requiring Rust.
* C doing vector things the Rust way would require wrapper functions,
and Rust doing vector things the C way would require wrapper
functions, so ivec was created to ensure a consistent contract
between the two languages for how to manipulate a vector.
* Currently, Rust defines its own 'Vec' type that is generic, but its
memory allocator and struct layout weren't designed for
interoperability with C (or any language for that matter), meaning
that the C side cannot push to or expand a 'Vec' without defining
wrapper functions in Rust that C can call. Without special care,
the two languages might use different allocators (malloc/free on
the C side, and possibly something else in Rust), which would make
it difficult for a function in one language to free elements
allocated by a call from a function in the other language.
* Similarly, git defines ALLOC_GROW() and related macros in
git-compat-util.h. While we could add functions allowing Rust to
invoke something similar to those macros, passing three variables
(pointer, length, allocated_size) instead of a single variable
(vector) across the language boundary requires more cognitive
overhead for readers to keep track of and makes it easier to make
mistakes. Further, for low-level components that we want to
eventually convert to pure Rust, such triplets would feel very out
of place.
To address these issues, introduce a new type, ivec -- short for
interoperable vector. (We refer to it as 'ivec' generally, though on
the Rust side the struct is called IVec to match Rust style.) This new
type is specifically designed for FFI purposes, so that both languages
handle the vector in the same way, though it could be used on either
side independently. This type is designed such that it can easily be
replaced by a Rust 'Vec' once interoperability is no longer a concern.
One particular item to note is that Git's macros to handle vec
operations infer how much a vec needs to grow from the type of the
pointer passed in, but that makes the approach specific to macros used
from C. To avoid defining every ivec function as a macro, I opted to
also include an element_size field that allows concrete functions like
push() to know how much to grow the memory. This element_size also
helps in verifying that the ivec is consistent when it is passed from
C to Rust.
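To make the contract concrete, here is a minimal C sketch of what such
an interoperable vector could look like. The field and function names
below are illustrative assumptions, not the actual definitions from
this series; on the Rust side, a #[repr(C)] struct with the same layout
would mirror it.

    #include <stdlib.h>
    #include <string.h>

    /* Illustrative only; the real ivec definition may differ. */
    struct ivec {
            void *ptr;           /* backing storage */
            size_t len;          /* number of elements in use */
            size_t alloc;        /* number of elements allocated */
            size_t element_size; /* size of one element, checked across FFI */
    };

    /*
     * Append one element, growing the storage if needed. Because the
     * element size is carried in the struct, this can be a plain
     * function instead of a type-inferring macro. (Error handling
     * omitted; Git would use xrealloc() and die on failure.)
     */
    static void ivec_push(struct ivec *v, const void *element)
    {
            if (v->len == v->alloc) {
                    v->alloc = v->alloc ? 2 * v->alloc : 16;
                    v->ptr = realloc(v->ptr, v->alloc * v->element_size);
            }
            memcpy((char *)v->ptr + v->len * v->element_size,
                   element, v->element_size);
            v->len++;
    }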
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The library function iconv(3) supplied with macOS versions 15.7.2
(Sequoia) and 26.1 (Tahoe) is unreliable when doing conversions from
ISO-2022-JP to UTF-8 in multiple steps; t3900 reports this breakage:
not ok 17 - ISO-2022-JP should be shown in UTF-8 now
not ok 25 - ISO-2022-JP should be shown in UTF-8 now
not ok 38 - commit --fixup into ISO-2022-JP from UTF-8
As a workaround, use libiconv from Homebrew, if available. Search for
it in its default locations: /opt/homebrew for Apple Silicon and
/usr/local for Intel Macs, with the former taking precedence. Respect
ICONVDIR if already set by the user, though.
Helped-by: Koji Nakamaru <koji.nakamaru@gree.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On macOS we opportunistically use Homebrew-installed versions of
gettext(3) and msgfmt(1). Make that behavior configurable by providing
make variables to disable Homebrew usage (NO_HOMEBREW) and to allow
using a non-default installation location (HOMEBREW_PREFIX).
Include and link only the gettext keg via the opt/gettext symlink,
which points to its installed version, instead of using the Homebrew
prefix. This is simpler and prevents accidentally including other
libraries.
Suggested-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, we are going to allow passing "auto" as a
<filterspec> to the `--filter=<filterspec>` option, but only for some
commands. Other commands that support the `--filter=<filterspec>`
option should still die() when 'auto' is passed.
Let's set up the "list-objects-filter-options.{c,h}" infrastructure to
support that:
- Add a new `unsigned int allow_auto_filter : 1;` flag to
`struct list_objects_filter_options` which specifies if "auto" is
accepted or not.
- Change gently_parse_list_objects_filter() to parse "auto" if it's
accepted.
- Make sure we die() if "auto" is combined with another filter.
- Update list_objects_filter_release() to preserve the
allow_auto_filter flag, as this function is often called (via
opt_parse_list_objects_filter) to reset the struct before parsing a
new value.
Let's also update `list-objects-filter.c` to recognize the new
`LOFC_AUTO` choice. Since "auto" must be resolved to a concrete filter
before filtering actually begins, initializing a filter with
`LOFC_AUTO` is invalid and will trigger a BUG().
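To illustrate the parsing behavior described above, here is a small
self-contained sketch; the type and function names are stand-ins for
illustration only (the real definitions live in
list-objects-filter-options.{c,h} and differ in detail):

    #include <stdio.h>
    #include <string.h>

    enum filter_choice { CHOICE_NONE, CHOICE_AUTO, CHOICE_OTHER };

    struct filter_opts {
            unsigned int allow_auto_filter : 1;
            enum filter_choice choice;
    };

    /*
     * Returns 0 on success, -1 on error: "auto" is only accepted where
     * allow_auto_filter is set, and cannot be combined with another
     * filter that was parsed earlier.
     */
    static int parse_filter(struct filter_opts *opts, const char *arg)
    {
            if (!strcmp(arg, "auto")) {
                    if (!opts->allow_auto_filter) {
                            fprintf(stderr, "invalid filter-spec 'auto'\n");
                            return -1;
                    }
                    if (opts->choice != CHOICE_NONE) {
                            fprintf(stderr, "cannot combine 'auto' with another filter\n");
                            return -1;
                    }
                    opts->choice = CHOICE_AUTO;
                    return 0;
            }
            opts->choice = CHOICE_OTHER; /* real code parses blob:none etc. */
            return 0;
    }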
Note that ideally combining "auto" with "auto" could be allowed, but in
practice it's probably not worth the added code complexity. And if we
really want it, nothing prevents us from allowing it in future work.
If we ever want to give a meaning to combining "auto" with a different
filter too, nothing prevents us from doing that in future work either.
While at it, let's add a new "u-list-objects-filter-options.c" file for
`struct list_objects_filter_options` related unit tests. For now it
only tests gently_parse_list_objects_filter() though.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further application of the MEMZERO_ARRAY() macro to the rest of the
code base.
* jc/memzero-array:
cocci: use MEMZERO_ARRAY() a bit more
coccicheck: emit the contents of cocci patch
The MEMZERO_ARRAY() helper is introduced to avoid clearing only the
first N bytes of an N-element array whose elements are larger than
a byte (a sketch of the pitfall follows the commit list below).
* tc/memzero-array:
contrib/coccinelle: pass include paths to spatch(1)
git-compat-util: introduce MEMZERO_ARRAY() macro
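The sketch below shows the pitfall being guarded against; the macro
shape here is an assumption, and the actual definition in
git-compat-util.h may differ:

    #include <string.h>

    /* Hypothetical shape of the helper; see git-compat-util.h for the
     * real definition. */
    #define MEMZERO_ARRAY(ptr, n) memset((ptr), 0, (n) * sizeof(*(ptr)))

    void clear_counts(void)
    {
            int counts[16];

            /* Classic mistake: clears only the first 16 bytes, i.e.
             * only four of the sixteen ints on typical platforms. */
            memset(counts, 0, 16);

            /* Clears all 16 elements regardless of element size. */
            MEMZERO_ARRAY(counts, 16);
    }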
Rewrite the only use of "mktemp()" that is subject to a TOCTOU race,
and stop using the insecure "mktemp()" function.
* rs/ban-mktemp:
compat: remove gitmkdtemp()
banned.h: ban mktemp(3)
compat: remove mingw_mktemp()
compat: use git_mkdtemp()
wrapper: add git_mkdtemp()
The "git_istream" abstraction has been revamped to make it easier
to interface with pluggable object database design.
* ps/object-read-stream:
streaming: drop redundant type and size pointers
streaming: move into object database subsystem
streaming: refactor interface to be object-database-centric
streaming: move logic to read packed objects streams into backend
streaming: move logic to read loose objects streams into backend
streaming: make the `odb_read_stream` definition public
streaming: get rid of `the_repository`
streaming: rely on object sources to create object stream
packfile: introduce function to read object info from a store
streaming: move zlib stream into backends
streaming: create structure for filtered object streams
streaming: create structure for packed object streams
streaming: create structure for loose object streams
streaming: create structure for in-core object streams
streaming: allocate stream inside the backend-specific logic
streaming: explicitly pass packfile info when streaming a packed object
streaming: propagate final object type via the stream
streaming: drop the `open()` callback function
streaming: rename `git_istream` into `odb_read_stream`
Telling the user "you got some error messages" without showing what
the errors are is almost useless in a CI environment, as the errors
cannot be examined without downloading build artifacts.
Arrange for the output to be shown when it fails.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In August 2006, the DarwinPorts project renamed itself to MacPorts.
For those who are not intimately familiar with the open-source
ecosystem around macOS from the olden days, the name DarwinPorts may
not ring a bell, even when they are using MacPorts.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit a new coccinelle rule was added. But neither
`make coccicheck` nor `meson compile coccicheck` detected a case in
builtin/last-modified.c.
This case involves the field `scratch` in `struct last_modified`. This
field is of type `struct bitmap`, and that struct has a member
`eword_t *words`. Both are defined in `ewah/ewok.h`. Now, while
builtin/last-modified.c does include that header (with the subdir in the
#include directive), it seems coccinelle does not process it. So it's
unaware of the type of `words` in the bitmap, and it doesn't recognize
the rule from the previous commit that uses:
type T;
T *ptr;
Fix coccicheck by passing all possible include paths inside the Git
project so spatch(1) can find the headers and can determine the types.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gitmkdtemp() has become a trivial wrapper around git_mkdtemp(). Remove
this now unnecessary layer of indirection.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various issues detected by ASan have been corrected.
* jk/asan-bonanza:
t: enable ASan's strict_string_checks option
fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated
fsck: remove redundant date timestamp check
fsck: avoid strcspn() in fsck_ident()
fsck: assert newline presence in fsck_ident()
cache-tree: avoid strtol() on non-string buffer
Makefile: turn on NO_MMAP when building with ASan
pack-bitmap: handle name-hash lookups in incremental bitmaps
compat/mmap: mark unused argument in git_munmap()
If you have a buffer that is not NUL-terminated but want to parse an
integer, there aren't many good options. If you use strtol() and
friends, you risk running off the end of the buffer if there is no
non-digit terminating character. And even if you carefully make sure
that there is such a character, ASan's strict-string-check mode will
still complain.
You can copy bytes into a temporary buffer, terminate it, and then call
strtol(), but doing so adds some pitfalls (like making sure you soak up
whitespace and leading +/- signs, and reporting overflow for overly long
input). Or you can hand-parse the digits, but then you need to take some
care to handle overflow (and again, whitespace and +/- signs).
These things aren't impossible to do right, but it's error-prone to have
to do them in every spot that wants to do such parsing. So let's add
some functions which can be used across the code base.
There are a few choices regarding the interface and the implementation.
First, the implementation:
- I went with parsing the digits (rather than buffering and
passing to libc functions). It ends up being a similar amount of
code because we have to do some parsing either way. And likewise
overflow detection depends on the exact type the caller wants, so we
either have to do it by hand or write a separate wrapper for
strtol(), strtoumax(), and so on.
- Unsigned overflow detection is done using the same techniques as in
unsigned_add_overflows(), etc. We can't use those macros directly
because our core function is type-agnostic (so the caller passes in
the max value, rather than us deriving it on the fly). This is
similar to how git_parse_int(), etc, work. (A sketch of this
technique follows the list below.)
- Signed overflow detection assumes that we can express a negative
value with magnitude one larger than our maximum positive value
(e.g., -128..127 for a signed 8-bit value). I doubt this is
guaranteed by the standard, but it should hold in practice, and we
make the same assumption in git_parse_int(), etc. The nice thing
about this is that we can derive the range from the number of bits
in the type. For ints, you obviously could use INT_MIN..INT_MAX, but
for an arbitrary type, we can use maximum_signed_value_of_type().
- I didn't bother with handling bases other than 10. It would
complicate the code, and I suspect it won't be needed. We could
probably retro-fit it later without too much work, if need be.
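Here is a minimal sketch of the digit-accumulation and overflow-check
approach those bullets describe; the function name and signature are
made up for illustration and are not the actual internal helper:

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Accumulate decimal digits from an unterminated buffer without
     * exceeding a caller-supplied maximum. Returns 1 on success and
     * reports how far it got via "end"; returns 0 on overflow or if
     * no digits were found.
     */
    static int accumulate_decimal(const char *buf, size_t len,
                                  uintmax_t max, uintmax_t *out,
                                  const char **end)
    {
            uintmax_t value = 0;
            size_t i;

            if (!len || buf[0] < '0' || buf[0] > '9')
                    return 0;

            for (i = 0; i < len && buf[i] >= '0' && buf[i] <= '9'; i++) {
                    uintmax_t digit = buf[i] - '0';

                    /* would value * 10 + digit exceed max? */
                    if (digit > max || value > (max - digit) / 10)
                            return 0;
                    value = value * 10 + digit;
            }
            *out = value;
            *end = buf + i;
            return 1;
    }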
For the interface:
- What do we call it? We have git_parse_int() and friends, which aim
to make parsing less error-prone. And in some ways, these are just
buffer (rather than string) versions of those functions. But not
entirely. Those functions are aimed at parsing a single user-facing
value. So they accept a unit prefix (e.g., "10k"), which we won't
always want. And they insist that the whole string is consumed
(rather than passing back an "end" pointer).
We also have strtol_i() and strtoul_ui() wrappers, which try to make
error handling simpler (especially around overflow), but mostly
behave like their libc counterparts. These also don't pass out an
end pointer, though.
So I started a new namespace, "parse_<type>_from_buf".
- Like those other functions above, we use an out-parameter to store
the result, which lets us return an error code directly. This avoids
the complicated errno dance for detecting overflow that you get with
strtol().
What should the error code look like? git_parse_int() uses a bool
for success/failure. But strtol_ui() uses the syscall-like "0 is
success, -1 is error" convention.
I went with the bool approach here. Since the names are closest to
those functions, I thought it would cause the least confusion.
- Unlike git_parse_signed() and friends, we do not insist that the
entire buffer be consumed. For parsing a specific standalone string
that makes sense, but within an unterminated buffer you are much
more likely to be parsing multiple fields from a larger data set.
We pass out an "end" pointer the same way strtol() does. Another
option is to accept the input as an in-out parameter and advance the
pointer ourselves (and likewise shrink the length pointer). That
would let you do something like:
if (!parse_int_from_buf(&p, &len, &out))
return error(...);
/* "p" and "len" were adjusted automatically */
if (!len || *p++ != ' ')
return error(...);
That saves a few lines of code in some spots, but requires a few
more in others (depending on whether the caller has a length in the
first place or is using an end pointer). Of the two callers I intend
to immediately convert, we have one of each type!
I went with the strtol() approach, as it is flexible and time-tested.
- We could likewise take the input buffer as two pointers (start and
end) rather than a pointer and a length. That again makes life
easier for some callers and harder for others. I stuck with pointer
and length as the more usual interface.
- What happens when a caller passes in a NULL end pointer? This is
allowed by strtol(). But I think it's often a sign of a lurking bug,
because there's no way to know how much was consumed (and even if a
caller wants to assume everything is consumed, you have no way to
verify it). So it is simply an error in this interface (you'd get a
segfault).
I am tempted to say that if the end pointer is NULL the functions
could confirm that the entire buffer was consumed, as a convenience.
But that felt a bit magical and surprising.
Like git_parse_*(), there is a generic signed/unsigned helper, and then
we can add type-specific helpers on top. I've added an int helper here
to start, and we'll add more as we convert callers.
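As a sketch of how a caller of the chosen interface might look (the
prototype below is an assumption based on the description above, not a
quote of the actual header, and the snippet will not link without the
real implementation):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical prototype: bool-style return, out-parameter for the
     * result, strtol-like end pointer. */
    bool parse_int_from_buf(const char *buf, size_t len, int *out,
                            const char **end);

    /* Parsing "<int> <rest...>" out of an unterminated buffer. */
    static int read_counted_field(const char *buf, size_t len, int *value)
    {
            const char *end;

            if (!parse_int_from_buf(buf, len, value, &end)) {
                    fprintf(stderr, "not a valid integer\n");
                    return -1;
            }
            if (end == buf + len || *end != ' ') {
                    fprintf(stderr, "expected a space after the integer\n");
                    return -1;
            }
            return 0;
    }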
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "streaming" terminology is somewhat generic, so it may not be
immediately obvious that "streaming.{c,h}" is specific to the object
database. Rectify this by moving it into the "odb/" directory so that it
can be immediately attributed to the object subsystem.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git often uses mmap() to access on-disk files. This leaves a blind spot
in our SANITIZE=address builds, since ASan does not seem to handle mmap
at all. Nor does the OS notice most out-of-bounds accesses, since
mappings tend to be rounded up to the nearest page size (so depending
on how big the map is, you might have to overrun it by up to 4095 bytes
to trigger a segfault).
The previous commit demonstrates a memory bug that we missed. We could
have made a new test where the out-of-bounds access was much larger, or
where the mapped file ended closer to a page boundary. But the point of
running the test suite with sanitizers is to catch these problems
without having to construct specific tests.
Let's enable NO_MMAP for our ASan builds by default, which should give
us better coverage. This does increase the memory usage of Git, since
we're copying from the filesystem into heap. But the repositories in the
test suite tend to be small, so the overhead isn't really noticeable
(and ASan already has quite a performance penalty).
There are a few other known bugs that this patch will help flush out.
However, they aren't directly triggered in the test suite (yet). So
it's safe to turn this on now without breaking the test suite, which
will help us add new tests to demonstrate those other bugs as we fix
them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our new binary object map code avoids needing to be intimately involved
with file handling by simply writing data to an object that implements
Write. This makes it very easy to test by writing to a Cursor wrapping
a Vec, and it decouples the code from intimate knowledge about how we
handle files.
However, we will actually want to write our data to an actual file,
since that's the most practical way to persist data. Implement a
wrapper around the hashfile code that implements the Write trait so that
we can write our object map into a file.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our current loose object format has a few problems. First, it is not
efficient: the list of object IDs is not sorted and even if it were,
there would not be an efficient way to look up objects in both
algorithms.
Second, we need to store mappings for things which are not technically
loose objects but are not packed objects, either, and so cannot be
stored in a pack index. These kinds of things include shallows, their
parents, and their trees, as well as submodules. Yet we also need to
implement a sensible way to store the kind of object so that we can
prune unneeded entries. For instance, if the user has updated the
shallows, we can remove the old values.
For these reasons, introduce a new binary object map format. The
careful reader will notice that it very closely resembles the pack
index v3 format. Add an in-memory object map as well, and allow
writing to a
batched map, which can then be written later as one of the binary object
maps. Include several tests for round tripping and data lookup across
algorithms.
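Conceptually, each mapping entry needs to carry the object's name under
both algorithms plus the kind of entry so that stale ones can be
pruned. The struct below is purely a conceptual C sketch for
illustration; it is neither the on-disk format nor the actual Rust
types from this series:

    #include <stdint.h>

    #define GIT_MAX_RAWSZ 32  /* large enough for a SHA-256 hash */

    /* Conceptual sketch only. */
    struct object_map_entry {
            unsigned char sha1[GIT_MAX_RAWSZ];   /* name under one algorithm */
            unsigned char sha256[GIT_MAX_RAWSZ]; /* name under the other */
            uint32_t kind; /* loose object, shallow, submodule, ... */
    };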
Note that the use of this code elsewhere in Git will involve some C code
and some C-compatible code in Rust that will be introduced in a future
commit. Thus, for example, we ignore the fact that if there is no
current batch and the caller asks for data to be written, this code does
nothing, mostly because this code also does not involve itself with
opening or manipulating files. The C code that we will add later will
implement this functionality at a higher level and take care of this,
since the code which is necessary for writing to the object store is
deeply involved with our C abstractions and it would require extensive
work (which would not be especially valuable at this point) to port
those to Rust.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cargo uses the build.rs script to determine how to compile and link a
binary. The only binary we're generating, however, is for our tests,
but in a future commit, we're going to link against libgit.a for some
functionality and we'll need to make sure the test binaries are
complete.
Add a build.rs file for this case and specify the files we're going to
be linking against. Because we cannot specify different dependencies
when building our static library versus our tests, update the Makefile
to specify these dependencies for our static library to avoid race
conditions during build.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd like to be able to write some Rust code that can work with object
IDs. Add a structure here that's identical to struct object_id in C,
for easy use in sharing across the FFI boundary. We will use this
structure in several places in hot paths, such as index-pack or
pack-objects when converting between algorithms, so prioritize efficient
interchange over a more idiomatic Rust approach.
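For reference, the C structure being mirrored looks roughly like this
(a sketch based on hash.h, with the algorithm field as a uint32_t per
this series, rather than a verbatim quote); a #[repr(C)] Rust struct
with the same two fields can then be exchanged directly across the FFI
boundary:

    #include <stdint.h>

    #define GIT_MAX_RAWSZ 32  /* enough room for a SHA-256 hash */

    struct object_id {
            unsigned char hash[GIT_MAX_RAWSZ];
            uint32_t algo; /* which hash algorithm "hash" belongs to */
    };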
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When Scalar was made a canonical part of Git in 7b5c93c6c68 (scalar:
include in standard Git build & installation, 2022-09-02), it was added
to all relevant Makefile targets except for the `strip` target.
Let's correct that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The file "builtin/repo.c" uses utf8_strwidth() to calculate the display
width of UTF-8 characters in a table, but the resulting output is still
misaligned. Add test cases for both utf8_strwidth and utf8_strnwidth to
verify that they correctly compute the display width for UTF-8
characters.
Also updated the build configuration in Makefile and meson.build to
include the new test suite in the build process.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean-up "git repack" machinery to prepare for incremental update
of midx files.
* tb/incremental-midx-part-3.1: (49 commits)
builtin/repack.c: clean up unused `#include`s
repack: move `write_cruft_pack()` out of the builtin
repack: move `write_filtered_pack()` out of the builtin
repack: move `pack_kept_objects` to `struct pack_objects_args`
repack: move `finish_pack_objects_cmd()` out of the builtin
builtin/repack.c: pass `write_pack_opts` to `finish_pack_objects_cmd()`
repack: extract `write_pack_opts_is_local()`
repack: move `find_pack_prefix()` out of the builtin
builtin/repack.c: use `write_pack_opts` within `write_cruft_pack()`
builtin/repack.c: introduce `struct write_pack_opts`
repack: 'write_midx_included_packs' API from the builtin
builtin/repack.c: inline packs within `write_midx_included_packs()`
builtin/repack.c: pass `repack_write_midx_opts` to `midx_included_packs`
builtin/repack.c: inline `remove_redundant_bitmaps()`
builtin/repack.c: reorder `remove_redundant_bitmaps()`
repack: keep track of MIDX pack names using existing_packs
builtin/repack.c: use a string_list for 'midx_pack_names'
builtin/repack.c: extract opts struct for 'write_midx_included_packs()'
builtin/repack.c: remove ref snapshotting from builtin
repack: remove pack_geometry API from the builtin
...
CI improvements to handle the recent Rust integration better.
* ps/ci-rust:
rust: support for Windows
ci: verify minimum supported Rust version
ci: check for common Rust mistakes via Clippy
rust/varint: add safety comments
ci: check formatting of our Rust code
ci: deduplicate calls to `apt-get update`
Instead of three library archives (one for git, one for reftable,
and one for xdiff), roll everything into a single libgit.a archive.
This will help the later effort to FFI into Rust.
* en/make-libgit-a:
make: delete REFTABLE_LIB, add reftable to LIB_OBJS
make: delete XDIFF_LIB, add xdiff to LIB_OBJS
In an identical fashion to the previous commit, move the function
`write_cruft_pack()` into its own compilation unit, and make the
function visible through the repack.h API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as in previous commits, move the function
`write_filtered_pack()` out of the builtin and into its own compilation
unit.
This function is now part of the repack.h API, but implemented in its
own "repack-filtered.c" unit as it is a separate component from other
kinds of repacking operations.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing a MIDX, 'git repack' takes a snapshot of the repository's
references and writes the result out to a file, which it then passes to
'git multi-pack-index write' via the '--refs-snapshot' option.
This is done in order to make bitmap selections with respect to what we
are packing, thus avoiding a race where an incoming reference update
causes us to try and write a bitmap for a commit not present in the
MIDX.
Extract this functionality out into a new repack-midx.c compilation
unit, and expose the necessary functions via the repack.h API.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the pack_geometry API is fully factored and isolated from the
rest of the builtin, declare it within repack.h and move its
implementation to "repack-geometry.c" as a separate component.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have properly factored the portion of the builtin which is
responsible for repacking promisor objects, we can move that function
(and associated dependencies) out of the builtin entirely.
Similar to previous extractions, this function is declared in repack.h,
but implemented in a separate repack-promisor.c file. This is done to
separate promisor-specific repacking functionality from generic repack
utilities (like "existing_packs", and "generated_pack" APIs).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over the years, builtin/repack.c has turned into a grab-bag of
functionality powering the 'git repack' builtin. Among its many
capabilities, it:
- can build and spawn 'git pack-objects' commands, which in turn
generate new packs
- has infrastructure to manage the set of existing packs in a
repository
- has infrastructure to split a sequence of packs into a geometric
progression based on object size
- can manage both generating and combining cruft packs together
- can write new MIDXs
to name a few.
As a result, this builtin has accumulated a lot of code, making adding
new functionality difficult. In the future, 'repack' will learn how to
manage a chain of incremental MIDXs, adding yet more functionality into
the builtin.
As a prerequisite step, let's first move some of the functionality in
the builtin into its own repack.[ch].
This will be done over the course of many steps, since there are many
individual components, some of which will end up in other, yet-to-exist
compilation units of their own. Some of the code movement here is also
non-trivial, so performing it in individual steps will make it easier to
verify.
Let's start by migrating 'struct pack_objects_args' (and the related
corresponding pack_objects_args_release() function) into repack.h, and
teach both the Makefile and Meson how to build the new compilation unit.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The initial patch series that introduced Rust into the core of Git only
cared about macOS and Linux. This specifically leaves out Windows, which
indeed fails to build right now due to two issues:
- The Rust runtime requires `GetUserProfileDirectoryW()`, but we don't
link against "userenv.dll".
- The path of the Rust library built on Windows is different than on
most other systems.
Fix both of these issues to support Windows.
Note that this commit fixes the Meson-based job in GitHub's CI. Meson
auto-detects the availability of Rust, and as the Windows runner has
Rust installed by default it already enabled Rust support there. But due
to the above issues that job fails consistently.
Install Rust on GitLab CI, as well, to improve test coverage there.
Based-on-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Based-on-patch-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable backend learned to sanity check its on-disk data more
carefully.
* kn/reftable-consistency-checks:
refs/reftable: add fsck check for checking the table name
reftable: add code to facilitate consistency checks
fsck: order 'fsck_msg_type' alphabetically
Documentation/fsck-msgids: remove duplicate msg id
reftable: check for trailing newline in 'tables.list'
refs: move consistency check msg to generic layer
refs: remove unused headers
Dip our toes a bit into (optionally) using a Rust-implemented helper
called from our C code.
* ps/rust-balloon:
ci: enable Rust for breaking-changes jobs
ci: convert "pedantic" job into full build with breaking changes
BreakingChanges: announce Rust becoming mandatory
varint: reimplement as test balloon for Rust
varint: use explicit width for integers
help: report on whether or not Rust is enabled
Makefile: introduce infrastructure to build internal Rust library
Makefile: reorder sources after includes
meson: add infrastructure to build internal Rust library
The `git refs verify` command is used to run consistency checks on the
reference backends. This command is also invoked when users run 'git
fsck'. While the files-backend has some fsck checks added, the reftable
backend lacks such checks. Let's add the required infrastructure and a
check to test for the files present in the reftable directory.
Since the reftable library is treated as an independent library we
should ensure that the library code works independently without
knowledge about Git's internals. To do this, add both 'reftable/fsck.c'
and 'reftable/reftable-fsck.h', which provide an entry point
'reftable_fsck_check' for running fsck checks over a provided reftable
stack. The caller provides the function with callbacks to handle issue
and information reporting.
The added check goes over all tables in the reftable stack and
validates that they have a valid name. If not, it raises an error.
While here, move 'reftable/error.o' in the Makefile to retain
lexicographic ordering.
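A hypothetical illustration of the caller-provided-callback shape
described above; the actual signature of reftable_fsck_check() lives in
reftable/reftable-fsck.h and may differ substantially from this sketch:

    #include <stdio.h>

    typedef void (*reftable_fsck_report_fn)(const char *msg, void *cb_data);

    /* Hypothetical prototype for illustration only. */
    int reftable_fsck_check(const char *reftable_dir,
                            reftable_fsck_report_fn report_error,
                            reftable_fsck_report_fn report_info,
                            void *cb_data);

    static void print_issue(const char *msg, void *cb_data)
    {
            (void)cb_data;
            fprintf(stderr, "reftable fsck: %s\n", msg);
    }

    /* A caller such as "git refs verify" would wire in its own reporting: */
    static int verify_reftable_dir(const char *dir)
    {
            return reftable_fsck_check(dir, print_issue, print_issue, NULL);
    }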
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same idea as the previous commit except that I don't know when or if
reftable will be turned into a Rust crate.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>