git/repack-promisor.c
Patrick Steinhardt dcc9c7ef47 builtin/repack: handle promisor packs with geometric repacking
When performing a fetch with an object filter, we mark the resulting
packfile as a promisor pack. An object part of such a pack may miss any
of its referenced objects, and Git knows to handle this case by fetching
any such missing objects from the promisor remote.

The "promisor" property needs to be retained going forward. So every
time we pack a promisor object, the resulting pack must be marked as a
promisor pack. git-repack(1) does this already: when a repository has a
promisor remote, it knows to pass "--exclude-promisor-objects" to the
git-pack-objects(1) child process. Promisor packs are written separately
when doing an all-into-one repack via `repack_promisor_objects()`.

But we don't support promisor objects when doing a geometric repack yet.
Promisor packs do not get any special treatment there, as we simply
merge promisor and non-promisor packs. The resulting pack is not even
marked as a promisor pack, which essentially corrupts the repository.

This corruption couldn't happen in the real world though: we pass both
"--exclude-promisor-objects" and "--stdin-packs" to git-pack-objects(1)
if a repository has a promisor remote, but as those options are mutually
exclusive we always end up dying. And while we made those flags
compatible with one another in a preceding commit, we still end up dying
in case git-pack-objects(1) is asked to repack a promisor pack.

There's multiple ways to fix this:

  - We can exclude promisor packs from the geometric progression
    altogether. This would have the consequence that we never repack
    promisor packs at all. But in a partial clone it is quite likely
    that the user generates a bunch of promisor packs over time, as
    every backfill fetch would create another one. So this doesn't
    really feel like a sensible option.

  - We can adapt git-pack-objects(1) to support repacking promisor packs
    and include them in the normal geometric progression. But this would
    mean that the set of promisor objects expands over time as the packs
    are merged with normal packs.

  - We can use a separate geometric progression to repack promisor
    packs.

The first two options both have significant downsides, so they aren't
really feasible. But the third option fixes both of these downsides: we
make sure that promisor packs get merged, and at the same time we never
expand the set of promisor objects beyond the set of objects that are
already marked as promisor objects.

Implement this strategy so that geometric repacking works in partial
clones.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-01-14 06:29:24 -08:00

140 lines
4.1 KiB
C

#include "git-compat-util.h"
#include "repack.h"
#include "hex.h"
#include "pack.h"
#include "packfile.h"
#include "path.h"
#include "repository.h"
#include "run-command.h"
struct write_oid_context {
struct child_process *cmd;
const struct git_hash_algo *algop;
};
/*
* Write oid to the given struct child_process's stdin, starting it first if
* necessary.
*/
static int write_oid(const struct object_id *oid,
struct packed_git *pack UNUSED,
uint32_t pos UNUSED, void *data)
{
struct write_oid_context *ctx = data;
struct child_process *cmd = ctx->cmd;
if (cmd->in == -1) {
if (start_command(cmd))
die(_("could not start pack-objects to repack promisor objects"));
}
if (write_in_full(cmd->in, oid_to_hex(oid), ctx->algop->hexsz) < 0 ||
write_in_full(cmd->in, "\n", 1) < 0)
die(_("failed to feed promisor objects to pack-objects"));
return 0;
}
static void finish_repacking_promisor_objects(struct repository *repo,
struct child_process *cmd,
struct string_list *names,
const char *packtmp)
{
struct strbuf line = STRBUF_INIT;
FILE *out;
close(cmd->in);
out = xfdopen(cmd->out, "r");
while (strbuf_getline_lf(&line, out) != EOF) {
struct string_list_item *item;
char *promisor_name;
if (line.len != repo->hash_algo->hexsz)
die(_("repack: Expecting full hex object ID lines only from pack-objects."));
item = string_list_append(names, line.buf);
/*
* pack-objects creates the .pack and .idx files, but not the
* .promisor file. Create the .promisor file, which is empty.
*
* NEEDSWORK: fetch-pack sometimes generates non-empty
* .promisor files containing the ref names and associated
* hashes at the point of generation of the corresponding
* packfile, but this would not preserve their contents. Maybe
* concatenate the contents of all .promisor files instead of
* just creating a new empty file.
*/
promisor_name = mkpathdup("%s-%s.promisor", packtmp,
line.buf);
write_promisor_file(promisor_name, NULL, 0);
item->util = generated_pack_populate(item->string, packtmp);
free(promisor_name);
}
fclose(out);
if (finish_command(cmd))
die(_("could not finish pack-objects to repack promisor objects"));
strbuf_release(&line);
}
void repack_promisor_objects(struct repository *repo,
const struct pack_objects_args *args,
struct string_list *names, const char *packtmp)
{
struct write_oid_context ctx;
struct child_process cmd = CHILD_PROCESS_INIT;
prepare_pack_objects(&cmd, args, packtmp);
cmd.in = -1;
/*
* NEEDSWORK: Giving pack-objects only the OIDs without any ordering
* hints may result in suboptimal deltas in the resulting pack. See if
* the OIDs can be sent with fake paths such that pack-objects can use a
* {type -> existing pack order} ordering when computing deltas instead
* of a {type -> size} ordering, which may produce better deltas.
*/
ctx.cmd = &cmd;
ctx.algop = repo->hash_algo;
for_each_packed_object(repo, write_oid, &ctx,
FOR_EACH_OBJECT_PROMISOR_ONLY);
if (cmd.in == -1) {
/* No packed objects; cmd was never started */
child_process_clear(&cmd);
return;
}
finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
}
void pack_geometry_repack_promisors(struct repository *repo,
const struct pack_objects_args *args,
const struct pack_geometry *geometry,
struct string_list *names,
const char *packtmp)
{
struct child_process cmd = CHILD_PROCESS_INIT;
FILE *in;
if (!geometry->promisor_split)
return;
prepare_pack_objects(&cmd, args, packtmp);
strvec_push(&cmd.args, "--stdin-packs");
cmd.in = -1;
if (start_command(&cmd))
die(_("could not start pack-objects to repack promisor packs"));
in = xfdopen(cmd.in, "w");
for (size_t i = 0; i < geometry->promisor_split; i++)
fprintf(in, "%s\n", pack_basename(geometry->promisor_pack[i]));
for (size_t i = geometry->promisor_split; i < geometry->promisor_pack_nr; i++)
fprintf(in, "^%s\n", pack_basename(geometry->promisor_pack[i]));
fclose(in);
finish_repacking_promisor_objects(repo, &cmd, names, packtmp);
}