79260 Commits

Author SHA1 Message Date
Junio C Hamano
a7652bf99c Merge branch 'ms/reftable-block-writer-errors'
Give more meaningful error return values from block writer layer of
the reftable ref-API backend.

* ms/reftable-block-writer-errors:
  reftable: adapt write_object_record() to propagate block_writer_add() errors
  reftable: adapt writer_add_record() to propagate block_writer_add() errors
  reftable: propagate specific error codes in block_writer_add()
2025-04-08 11:43:12 -07:00
Junio C Hamano
b97b360c51 Merge branch 'en/assert-wo-side-effects'
Ensure what we write in assert() does not have side effects,
and introduce ASSERT() macro to mark those that cannot be
mechanically checked for lack of side effects.

* en/assert-wo-side-effects:
  treewide: replace assert() with ASSERT() in special cases
  ci: add build checking for side-effects in assert() calls
  git-compat-util: introduce ASSERT() macro
2025-04-08 11:43:12 -07:00
Karthik Nayak
221e8fcb7f update-ref: add --batch-updates flag for stdin mode
When updating multiple references through stdin, Git's update-ref
command normally aborts the entire transaction if any single update
fails. This atomic behavior prevents partial updates. Introduce a new
batch update system, where the updates the performed together similar
but individual updates are allowed to fail.

Add a new `--batch-updates` flag that allows the transaction to continue
even when individual reference updates fail. This flag can only be used
in `--stdin` mode and builds upon the batch update support added to the
refs subsystem in the previous commits. When enabled, failed updates are
reported in the following format:

  rejected SP (<old-oid> | <old-target>) SP (<new-oid> | <new-target>) SP <rejection-reason> LF

Update the documentation to reflect this change and also tests to cover
different scenarios where an update could be rejected.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:59:49 -07:00
Karthik Nayak
31726bb90d refs: support rejection in batch updates during F/D checks
The `refs_verify_refnames_available()` is used to batch check refnames
for F/D conflicts. While this is the more performant alternative than
its individual version, it does not provide rejection capabilities on a
single update level. For batched updates, this would mean a rejection of
the entire transaction whenever one reference has a F/D conflict.

Modify the function to call `ref_transaction_maybe_set_rejected()` to
check if a single update can be rejected. Since this function is only
internally used within 'refs/' and we want to pass in a `struct
ref_transaction *` as a variable. We also move and mark
`refs_verify_refnames_available()` to 'refs-internal.h' to be an
internal function.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:21 -07:00
Karthik Nayak
23fc8e4f61 refs: implement batch reference update support
Git supports making reference updates with or without transactions.
Updates with transactions are generally better optimized. But
transactions are all or nothing. This means, if a user wants to batch
updates to take advantage of the optimizations without the hard
requirement that all updates must succeed, there is no way currently to
do so. Particularly with the reftable backend where batching multiple
reference updates is more efficient than performing them sequentially.

Introduce batched update support with a new flag,
'REF_TRANSACTION_ALLOW_FAILURE'. Batched updates while different from
transactions, use the transaction infrastructure under the hood. When
enabled, this flag allows individual reference updates that would
typically cause the entire transaction to fail due to non-system-related
errors to be marked as rejected while permitting other updates to
proceed. System errors referred by 'REF_TRANSACTION_ERROR_GENERIC'
continue to result in the entire transaction failing. This approach
enhances flexibility while preserving transactional integrity where
necessary.

The implementation introduces several key components:

  - Add 'rejection_err' field to struct `ref_update` to track failed
    updates with failure reason.

  - Add a new struct `ref_transaction_rejections` and a field within
    `ref_transaction` to this struct to allow quick iteration over
    rejected updates.

  - Modify reference backends (files, packed, reftable) to handle
    partial transactions by using `ref_transaction_set_rejected()`
    instead of failing the entire transaction when
    `REF_TRANSACTION_ALLOW_FAILURE` is set.

  - Add `ref_transaction_for_each_rejected_update()` to let callers
    examine which updates were rejected and why.

This foundational change enables batched update support throughout the
reference subsystem. A following commit will expose this capability to
users by adding a `--batch-updates` flag to 'git-update-ref(1)',
providing both a user-facing feature and a testable implementation.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:20 -07:00
Karthik Nayak
76e760b999 refs: introduce enum-based transaction error types
Replace preprocessor-defined transaction errors with a strongly-typed
enum `ref_transaction_error`. This change:

  - Improves type safety and function signature clarity.
  - Makes error handling more explicit and discoverable.
  - Maintains existing error cases, while adding new error cases for
    common scenarios.

This refactoring paves the way for more comprehensive error handling
which we will utilize in the upcoming commits to add batch reference
update support.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:20 -07:00
Karthik Nayak
ca89c18d5c refs/reftable: extract code from the transaction preparation
Extract the core logic for preparing individual reference updates from
`reftable_be_transaction_prepare()` into `prepare_single_update()`. This
dedicated function now handles all validation and preparation steps for
each reference update in the transaction, including object ID
verification, HEAD reference handling, and symref processing.

The refactoring consolidates all reference update validation into a
single logical block, which improves code maintainability and
readability. More importantly, this restructuring lays the groundwork
for implementing batched reference update support in the reftable
backend, which will be introduced in a followup commit.

No functional changes are included in this commit - it is purely a code
reorganization to support future enhancements.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:19 -07:00
Karthik Nayak
4dfcf18089 refs/files: remove duplicate duplicates check
Within the files reference backend's transaction's 'finish' phase, a
verification step is currently performed wherein the refnames list is
sorted and examined for multiple updates targeting the same refname.

It has been observed that this verification is redundant, as an
identical check is already executed during the transaction's 'prepare'
stage. Since the refnames list remains unmodified following the
'prepare' stage, this secondary verification can be safely eliminated.

The duplicate check has been removed accordingly, and the
`ref_update_reject_duplicates()` function has been marked as static, as
its usage is now confined to 'refs.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:19 -07:00
Karthik Nayak
c3baddf04f refs: move duplicate refname update check to generic layer
Move the tracking of refnames in `affected_refnames` from individual
backends into the generic layer in 'refs.c'. This centralizes the
duplicate refname detection that was previously handled separately by
each backend.

Make some changes to accommodate this move:

  - Add a `string_list` field `refnames` to `ref_transaction` to contain
    all the references in a transaction. This field is updated whenever
    a new update is added via `ref_transaction_add_update`, so manual
    additions in reference backends are dropped.

  - Modify the backends to use this field internally as needed. The
    backends need to check if an update for refname already exists when
    splitting symrefs or adding an update for 'HEAD'.

  - In the reftable backend, within `reftable_be_transaction_prepare()`,
    move the `string_list_has_string()` check above
    `ref_transaction_add_update()`. Since `ref_transaction_add_update()`
    automatically adds the refname to `transaction->refnames`,
    performing the check after will always return true, so we perform
    the check before adding the update.

This helps reduce duplication of functionality between the backends and
makes it easier to make changes in a more centralized manner.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:18 -07:00
Karthik Nayak
05a1834e42 refs/files: remove redundant check in split_symref_update()
In `split_symref_update()`, there were two checks for duplicate
refnames:

  - At the start, `string_list_has_string()` ensures the refname is not
    already in `affected_refnames`, preventing duplicates from being
    added.

  - After adding the refname, another check verifies whether the newly
    inserted item has a `util` value.

The second check is unnecessary because the first one guarantees that
`string_list_insert()` will never encounter a preexisting entry.

The `item->util` field is assigned to validate that a rename doesn't
already exist in the list. The validation is done after the first check.
As this check is removed, clean up the validation and the assignment of
this field in `split_head_update()` and `files_transaction_prepare()`.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:57:18 -07:00
Patrick Steinhardt
8e0a1ec076 builtin/maintenance: introduce "reflog-expire" task
By default, git-maintenance(1) uses the "gc" task to ensure that the
repository is well-maintained. This can be changed, for example by
either explicitly configuring which tasks should be enabled or by using
the "incremental" maintenance strategy. If so, git-maintenance(1) does
not know to expire reflog entries, which is a subtask that git-gc(1)
knows to perform for the user. Consequently, the reflog will grow
indefinitely unless the user manually trims it.

Introduce a new "reflog-expire" task that plugs this gap:

  - When running the task directly, then we simply execute `git reflog
    expire --all`, which is the same as git-gc(1).

  - When running git-maintenance(1) with the `--auto` flag, then we only
    run the task in case the "HEAD" reflog has at least N reflog entries
    that would be discarded. By default, N is set to 100, but this can
    be configured via "maintenance.reflog-expire.auto". When a negative
    integer has been provided we always expire entries, zero causes us
    to never expire entries, and a positive value specifies how many
    entries need to exist before we consider pruning the entries.

Note that the condition for the `--auto` flags is merely a heuristic and
optimized for being fast. This is because `git maintenance run --auto`
will be executed quite regularly, so scanning through all reflogs would
likely be too expensive in many repositories.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:27 -07:00
Patrick Steinhardt
3fef24ac3f builtin/gc: split out function to expire reflog entries
We're about to introduce a new task for git-maintenance(1) that knows to
expire reflog entries. The logic will be shared with git-gc(1), which
already knows how to do this.

Pull out the common logic into a separate function so that we can share
the implementation between both builtins.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:27 -07:00
Patrick Steinhardt
d20fc193b6 builtin/reflog: make functions regarding reflog_expire_options public
Make functions that are required to manage `reflog_expire_options`
available elsewhere by moving them into "reflog.c" and exposing them in
the corresponding header. The functions will be used in a subsequent
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:27 -07:00
Patrick Steinhardt
964f364de9 builtin/reflog: stop storing per-reflog expiry dates globally
As described in the preceding commit, the per-reflog expiry dates are
stored in a global pair of variables. Refactor the code so that they are
contained in `struct reflog_expire_options` to make the structure useful
in other contexts.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:26 -07:00
Patrick Steinhardt
8565827570 builtin/reflog: stop storing default reflog expiry dates globally
When expiring reflog entries, it is possible to configure expiry dates
that depend on the name of the reflog. This requires us to store a
couple of different expiry dates:

  - The default expiry date for reflog entries that aren't otherwise
    specified.

  - The per-reflog expiry date.

  - The currently active set of expiry dates for a given reference.

While the last item is stored in `struct reflog_expire_options`, the
other items aren't, which makes it hard to reuse the structure in other
places.

Refactor the code so that the default expiry date is stored as part of
the structure. The per-reflog expiry dates will be adapted accordingly
in the subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:26 -07:00
Patrick Steinhardt
2ed8008399 reflog: rename cmd_reflog_expire_cb to reflog_expire_options
We're about to expose `struct cmd_reflog_expire_cb` via "reflog.h" so
that we can also use this structure in "builtin/gc.c". Once we make it
accessible to a wider scope though it becomes awkwardly named, as it
isn't only useful in the context of a callback. Instead, the function is
containing all kinds of options relevant to whether or not a reflog
entry should be expired.

Rename the structure to `reflog_expire_options` to prepare for this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-08 07:53:25 -07:00
Zheng Yuting
1ac402cdf3 send-email: finer-grained SMTP error handling
Code captured errors but did not process them further.
This treated all failures the same without distinguishing SMTP status.

Add handle-smtp_error to extract SMTP status codes using a regex (as
defined in RFC 5321) and handle errors as follows:

- No error present:
	- If a result is provided, return 1 to indicate success.
	- Otherwise, return 0 to indicate failure.

- Error present with a captured three-digit status code:
	- For 4yz (transient errors), return 1 and allow retries.
	- For 5yz (permanent errors), return 0 to indicate failure.
	- For any other recognized status code, return 1, treating it as
	a transient error.

- Error present but no status code found:
	- Return 1 as a transient error.

Signed-off-by: Zheng Yuting <05ZYT30@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:54:05 -07:00
Zheng Yuting
ce20dec4a4 send-email: capture errors in an eval {} block
Auth relied solely on return values without catching errors. This misjudges
non-credential errors as auth failure without error info.

Patch wraps the entire auth process in an eval {} block to catch
all exceptions, including non-credential errors. It adds a new $error var,
uses 'or do' to prevent flow break, and returns $result ? 1 : 0. And merges
if/else branches, integrates SASL and basic auth, with comments for
future status code handling.

Signed-off-by: Zheng Yuting <05ZYT30@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:54:05 -07:00
Patrick Steinhardt
e0011188ca reftable/table: move printing logic into test helper
The logic to print individual blocks in a table is hosted in the
reftable library. This is only the case due to historical reasons though
because users of the library had no interfaces to read blocks one by
one. Otherwise, printing individual blocks has no place in the reftable
library given that the format will not be generic in the first place.

We have now grown a public interface to iterate through blocks contained
in a table, and thus we can finally move the logic to print them into
the test helper.

Move over the logic and refactor it accordingly. Note that the iterator
also trivially allows us to access index sections, which we previously
didn't print at all. This omission wasn't intentional though, so start
dumping those sections as well so that we can assert that indices are
written as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:13 -07:00
Patrick Steinhardt
0f8ee94b63 reftable/constants: make block types part of the public interface
Now that reftable blocks can be read individually via the public
interface it becomes necessary for callers to be able to distinguish the
different types of blocks. Expose the relevant constants.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt
da89659365 reftable/table: introduce iterator for table blocks
Introduce a new iterator that allows the caller to iterate through all
blocks contained in a table. This gives users more fine-grained control
over how exactly those blocks are being read and exposes information to
callers that was previously inaccessible.

This iterator will be required by a future patch series that adds
consistency checks for the reftable backend. In addition to that though
we will also reimplement `reftable_table_print_blocks()` on top of this
new iterator in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt
c8cbe85a23 reftable/table: add reftable_table to the public interface
The `reftable_table` interface is an internal implementation detail that
callers have no access to. Having direct access to this structure is
important though for a subsequent patch series that will implement
consistency checks for the reftable backend.

Move the structure into "reftable-table.h" so that it part of the public
interface.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt
50d8459477 reftable/block: expose a generic iterator over reftable records
Expose a generic iterator over reftable records and expose it via the
public interface. Together with an upcoming iterator for reftable blocks
contained in a table this will allow users to trivially iterate through
blocks and their respective records individually.

This functionality will be used to implement consistency checks for the
reftable backend, which requires more fine-grained control over how we
read data.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:12 -07:00
Patrick Steinhardt
6da48a5e00 reftable/block: make block iterators reseekable
Refactor the block iterators so that initialization and seeking are
different from one another. This makes the iterator trivially reseekable
by storing the pointer to the block at initialization time, which we can
then reuse on every seek.

This refactoring prepares the code for exposing a `reftable_iterator`
interface for blocks in a subsequent commit. Callsites are adjusted
accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt
156d79cef0 reftable/block: store block pointer in the block iterator
The block iterator requires access to a bunch of data from the
underlying `reftable_block` that it is iterating over. This data is
stored by copying over relevant data into a separate set of variables.
This has multiple downsides:

  - We require more storage space than necessary. This is more of a
    theoretical issue as we shouldn't ever have many blocks.

  - We have to perform more bookkeeping, and the variable names are
    inconsistent across the two data structures. This can lead to some
    confusion.

  - The lifetime of the block iterator is tied to the block anyway, but
    we hide that a bit by only storing pointers pointing into the block.

There isn't really any good reason why we rip out parts of the block
instead of storing a pointer to the block itself.

Refactor the code to do so. Despite being simpler, it also allows us to
decouple the lifetime of the block iterator from seeking in a subsequent
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt
655e18d6b4 reftable/block: create public interface for reading blocks
While users of the reftable library wouldn't generally require access to
individual blocks in a reftable table, there are valid usecases where
one may require low-level access to them. One such upcoming usecase in
the Git codebase is to implement consistency checks for the reftable
library where we want to verify each block individually.

Create a public interface for reading blocks. The interface isn't yet
complete and lacks e.g. a way to read individual records from a block.
Such missing functionality will be backfilled in subsequent commits.

Note that this change also requires us to expose `reftable_buf`, which
is used by the `reftable_block_first_key()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt
ce76cec964 git-zlib: use struct z_stream_s instead of typedef
Throughout the Git codebase we're using the typedeffed version of
`z_stream`, which maps to `struct z_stream_s`. By using a typedef
instead of the struct it becomes somewhat harder to predeclare the
symbol so that headers depending on the struct can do so without having
to pull in "zlib-compat.h".

We don't yet have users that would really care about this: the only
users that declare `z_stream` as a pointer are in "reftable/block.h",
which is a header that is internal to the reftable library. But in the
next step we're going to expose the `struct reftable_block` publicly,
and that struct does contain a pointer to `z_stream`. And as the public
header shouldn't depend on "reftable/system.h", which is an internal
implementation detail, we won't have the typedef for `z_stream` readily
available.

Prepare for this change by using `struct z_stream_s` throughout our code
base. In case zlib-ng is used we use a define to map from `z_stream_s`
to `zng_stream_s`.

Drop the pre-declaration of `struct z_stream` while at it. This struct
does not exist in the first place, and the declaration wasn't needed
because "reftable/block.h" already includes "reftable/basics.h" which
transitively includes "reftable/system.h" and thus "git-zlib.h".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:11 -07:00
Patrick Steinhardt
12a9aa8cb7 reftable/block: rename block_reader to reftable_block
The `block_reader` structure is used to access parsed data of a reftable
block. The structure is currently treated as an internal implementation
detail and not exposed via our public interfaces. The functionality
provided by the structure is useful to external users of the reftable
library though, for example when implementing consistency checks that
need to scan through the blocks manually.

Rename the structure to `reftable_block` now that the name has been made
available in the preceding commit. This name is in line with the naming
schema used for other data structures like `reftable_table` in that it
describes the underlying entity that it provides access to.

The new data structure isn't yet exposed via the public interface, which
is left for a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:10 -07:00
Patrick Steinhardt
2b3362c10d reftable/block: rename block to block_data
The `reftable_block` structure associates a byte slice with a block
source. As such it only holds the data of a reftable block without
actually encoding any of the details for how to access that data.

Rename the structure to instead be called `reftable_block_data`. Besides
clarifying that this really only holds data, it also allows us to rename
the `reftable_block_reader` to `reftable_block` in the next commit, as
this is the structure that actually encapsulates access to the reftable
blocks.

Rename the `struct reftable_block_reader::block` member accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:10 -07:00
Patrick Steinhardt
fd888311fb reftable/table: move reading block into block reader
The logic to read blocks from a reftable is scattered across both the
table and the block subsystems. Besides causing somewhat fuzzy
responsibilities, it also means that we have to awkwardly pass around
the ownership of blocks between the subsystems.

Refactor the code so that we stop passing the block when initializing a
reader, but instead by passing in the block source plus the offset at
which we're supposed to read a block. Like this, the ownership of the
block itself doesn't need to get handed over as the block reader is the
one owning the block right from the start.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:10 -07:00
Patrick Steinhardt
ba620d296a reftable/block: simplify how we track restart points
Restart points record the location of reftable records that do not use
prefix compression and are used to perform a binary search inside of a
block. These restart points are encoded at the end of a block, between
the record data and the footer of a table.

The block structure contains three different variables related to these
restart points:

  - The block length contains the length of the reftable block up to the
    restart points.

  - The restart count contains the number of restart points contained in
    the block.

  - The restart bytes variable tracks where the restart point data
    begins.

Tracking all three of these variables is unnecessary though as the data
can be derived from one another: the block length without restart points
is the exact same as the offset of the restart count data, which we
already track via the `restart_bytes` data.

Refactor the code so that we track the location of restart bytes not as
a pointer, but instead as an offset. This allows us to trivially get rid
of the `block_len` variable as described above. This avoids having the
confusing `block_len` variable and allows us to do less bookkeeping
overall.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Patrick Steinhardt
1ac4e5e83d reftable/blocksource: consolidate code into a single file
The code that implements block sources is distributed across a couple of
files. Consolidate all of it into "reftable/blocksource.c" and its
accompanying header so that it is easier to locate and more self
contained.

While at it, rename some of the functions to have properly scoped names.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Patrick Steinhardt
b648bd6549 reftable/reader: rename data structure to "table"
The `struct reftable_reader` subsystem encapsulates a table that has
been read from the disk. As such, the current name of that structure is
somewhat hard to understand as it only talks about the fact that we read
something from disk, without really giving an indicator _what_ that is.

Furthermore, this naming schema doesn't really fit well into how the
other structures are named: `reftable_merged_table`, `reftable_stack`,
`reftable_block` and `reftable_record` are all named after what they
encapsulate.

Rename the subsystem to `reftable_table`, which directly gives a hint
that the data structure is about handling the individual tables part of
the stack.

While this change results in a lot of churn, it prepares for us exposing
the APIs to third-party callers now that the reftable library is a
standalone library that can be linked against by other projects.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Patrick Steinhardt
6dcc05ffc3 reftable: fix formatting of the license header
The license headers used across the reftable library doesn't follow our
typical coding style for multi-line comments. Fix it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:53:09 -07:00
Karthik Nayak
4d253071dd blame: print unblamable and ignored commits in porcelain mode
The 'git-blame(1)' command allows users to ignore specific revisions via
the '--ignore-rev <rev>' and '--ignore-revs-file <file>' flags. These
flags are often combined with the 'blame.markIgnoredLines' and
'blame.markUnblamableLines' config options. These config options prefix
ignored and unblamable lines with a '?' and '*', respectively.

However, this option was never extended to the porcelain mode of
'git-blame(1)'. Since the documentation does not indicate this
exclusion, it is a bug.

Fix this by printing 'ignored' and 'unblamable' respectively for the
options when using the porcelain modes.

Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Toon Claes <toon@iotcl.com>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:50:18 -07:00
Patrick Steinhardt
7a7b602267 t5703: refactor test to not depend on Perl
We use Perl due to two different reasons in t5703:

  - To filter advertised capabilities.

  - To set up a CGI script with HTTPD.

Refactor the first category to use `test_grep` instead. Refactoring the
second category would be a bit more involved, so instead we add the
PERL_TEST_HELPERS prerequisite to those individual tests now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:41 -07:00
Patrick Steinhardt
88bef8db84 t5316: refactor max_chain() to not depend on Perl
The `max_chain()` helper function is used to extract the maximum delta
chain of a packfile as printed by git-index-pack(1). The script uses
Perl to extract that data, but it can be trivially refactored to use
awk(1) instead.

Refactor the helper accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:41 -07:00
Patrick Steinhardt
9f4bce35b3 t0210: refactor trace2 scrubbing to not use Perl
The output generated by our trace2 mechanism contains several fields
that are dependent on the environment they're being run in, which makes
it somewhat harder to test it. As a countermeasure we scrub the output
and strip out any fields that contain such information.

The logic to do so is implemented in Perl, but it can be trivially
ported to instead use sed(1). Refactor the code accordingly so that we
can drop the PERL_TEST_HELPERS prerequisite.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:41 -07:00
Patrick Steinhardt
88ed7b84cd t0021: refactor generate_random_characters() to not depend on Perl
The `generate_random_characters()` helper function generates N
random characters in the range 'a-z' and writes them into a file. The
logic currently uses Perl, but it can be adapted rather easily by:

  - Making `test-tool genrandom` generate an infinite stream.

  - Using `tr -dc` to strip all characters which aren't in the range of
    'a-z'.

  - Using `test_copy_bytes()` to copy the first N bytes.

This allows us to drop the PERL_TEST_HELPERS prerequisite.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:40 -07:00
Patrick Steinhardt
cee137b7e5 t/lib-httpd: refactor "one-time-perl" CGI script to not depend on Perl
Our Apache HTTPD setup exposes an "one_time_perl" endpoint to access
repositories. If used, we execute the "apply-one-time-perl.sh" CGI
script that checks whether we have a "one-time-perl" script. If so, that
script gets executed so that it can munge what would be served. Once
done, the script gets removed so that it doesn't execute a second time.

As the name says, this functionality expects the user to pass a Perl
script. This isn't really necessary though: we can just as easily
implement the same thing with arbitrary scripts.

Refactor the code so that we instead expect an arbitrary script to
exist and rename the functionality to "one-time-script". Adapt callers
to use shell utilities instead of Perl so that we can drop the
PERL_TEST_HELPERS prerequisite.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:40 -07:00
Patrick Steinhardt
de9eeabd71 t/lib-t6000: refactor name_from_description() to not depend on Perl
The `name_from_description()` test helper uses Perl to munge a given
description and convert it into a name. Refactor it to instead use a
combination of sed(1) and tr(1) so that we drop PERL_TEST_HELPERS
prerequisites in users of this library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:40 -07:00
Patrick Steinhardt
3ca6f20585 t/lib-gpg: refactor sanitize_pgp() to not depend on Perl
The `sanitize_pgp()` test helper uses Perl to strip PGP signatures from
stdin. Refactor it to instead use sed(1) so that we drop the
PERL_TEST_HELPERS prerequisite in users of this library.

Note that we have to add PERL_TEST_HELPERS to a subset of tests in t6300
now that the test suite doesn't bail out early anymore in case the
prerequisite isn't set.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:40 -07:00
Patrick Steinhardt
4a7af4edbb t: refactor tests depending on Perl for textconv scripts
We have a couple of tests that depend on Perl for textconv scripts.
Refactor these tests to instead be implemented via shell utilities so
that we can drop a couple of PERL_TEST_HELPERS prerequisites.

Note that the conversion in t4030 is not a one-to-one equivalent to the
previous textconv script. Before this change we used to essentially do a
hexdump via Perl. The obvious conversion here would be to use `test-tool
hexdump` like we do for the other tests. But this would lead to a ripple
effect where we would have to adapt a bunch of other tests with a bunch
of seemingly unrelated changes, which would be somewhat awkward.

Instead, we're going with the minimum viable change: the test files we
write contain "\001" and "\000", and the test's expectation is that
those get translated into proper ASCII characters. So instead of doing a
full hexdump, we simply use tr(1) to translate these specific bytes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:39 -07:00
Patrick Steinhardt
6aec8d38fd t: refactor tests depending on Perl to print data
A bunch of tests rely on Perl to print data in various different ways.
These usages fall into the following categories:

  - Print data conditionally by matching patterns. These usecases can be
    converted to use awk(1) rather easily.

  - Print data repeatedly. These usecases can typically be converted to
    use a combination of `test-tool genzeros` and sed(1).

  - Print data in reverse. These usecases can be converted to use
    awk(1) or `sort -r`.

Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:39 -07:00
Patrick Steinhardt
cdbdc6bf8c t: refactor tests depending on Perl substitution operator
We have a bunch of tests that use Perl to perform substitution via the
"s/" operator. These usecases can be trivially replaced with sed(1) and
tr(1).

Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:39 -07:00
Patrick Steinhardt
db8ff64a3a t: refactor tests depending on Perl transliteration operator
We have a bunch of tests that use Perl to perform character
transliteration via the "y/" or "tr/" operator. These usecases can be
trivially replaced with tr(1).

Refactor the tests accordingly so that we can drop a couple of
PERL_TEST_HELPERS prerequisites.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:38 -07:00
Patrick Steinhardt
8d531a9d18 Makefile: stop requiring Perl when running tests
The Makefile for our tests has a couple of targets that depend on Perl.
Adapt those targets to only run conditionally in case Perl is available
on the system so that it becomes possible to run the test suite without
Perl.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:38 -07:00
Patrick Steinhardt
267143f286 meson: stop requiring Perl when tests are enabled
The Perl interpreter used to be a strict dependency for running our test
suite. This requirement is explicit in the Meson build system, where we
require Perl to be present unless tests have been disabled.

With the preceding commits we have loosened this restriction so that it
is now possible to run tests when Perl is unavailable. Loosen the above
requirement accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:38 -07:00
Patrick Steinhardt
64b3eee038 t: adapt existing PERL prerequisites
A couple of our tests depend on the PERL prerequisite even though it
isn't needed. These tests fall into one of the following classes:

  - The underlying logic used to be implemented in Perl but isn't
    anymore. Here we can simply drop the dependency altogether.

  - The test logic used to depend on Perl but doesn't anymore. Again, we
    can simply drop the dependency.

  - The test logic still relies on a Perl interpreter. These tests
    should use the newly introduced PERL_TEST_HELPERS prerequisite.

Adapt test cases accordingly.

Note that in t1006 we have to introduce another new prerequisite
depending on whether or not the IPC::Open2 module is available. Funny
enough, when starting to use `test_lazy_prereq` to do so we also get a
conflict of variables with the "script" variable that contains the Perl
logic because `test_run_lazy_prereq_` also sets that variable. We thus
rename the variable in t1006 to "perl_script".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:38 -07:00
Patrick Steinhardt
23e21a58d5 t: introduce PERL_TEST_HELPERS prerequisite
In the early days of Git, Perl was used quite prominently throughout the
project. This has changed significantly as almost all of the executables
we ship nowadays have eventually been rewritten in C. Only a handful of
subsystems remain that require Perl:

  - gitweb, a read-only web interface.

  - A couple of scripts that allow importing repositories from GNU Arch,
    CVS and Subversion.

  - git-send-email(1), which can be used to send mails.

  - git-request-pull(1), which is used to request somebody to pull from
    a URL by sending an email.

  - git-filter-branch(1), which uses Perl with the `--state-branch`
    option. This command is typically recommended against nowadays in
    favor of git-filter-repo(1).

  - Our Perl bindings for Git.

  - The netrc Git credential helper.

None of these subsystems can really be considered to be part of the
"core" of Git, and an installation without them is fully functional.
It is more likely than not that an end user wouldn't even notice that
any features are missing if those tools weren't installed. But while
Perl nowadays very much is an optional dependency of Git, there is a
significant limitation when Perl isn't available: developers cannot run
our test suite.

Preceding commits have started to lift this restriction by removing the
strict dependency on Perl in many central parts of the test library. But
there are still many tests that rely on small Perl helpers to do various
different things.

Introduce a new PERL_TEST_HELPERS prerequisite that guards all tests
that require Perl. This prerequisite is explicitly different than the
preexisting PERL prerequisite:

  - PERL records whether or not features depending on the Perl
    interpreter are built.

  - PERL_TEST_HELPERS records whether or not a Perl interpreter is
    available for our tests.

By having these two separate prerequisites we can thus distinguish
between tests that inherently depend on Perl because the underlying
feature does, and those tests that depend on Perl because the test
itself is using Perl.

Adapt all tests to set the PERL_TEST_HELPERS prerequisite as needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-07 14:47:37 -07:00