The new '--name-hash-version' option for 'git repack' is a simple
pass-through to the underlying 'git pack-objects' subcommand. However,
this subcommand may have other options and a temporary filename as part
of the subcommand execution that may not be predictable or could change
over time.
The existing test_subcommand method requires an exact list of arguments
for the subcommand. This is too rigid for our needs here, so create a
new method, test_subcommand_flex. Use it to check that the
--name-hash-version option is passing through.
Since we are modifying the 'git repack' command, let's bring its usage
in line with the Documentation's synopsis. This removes it from the
allow list in t0450 so it will remain in sync in the future.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change introduced a new pack_name_hash_v2() function that
intends to satisfy much of the hash locality features of the existing
pack_name_hash() function while also distinguishing paths with similar
final components of their paths.
This change adds a new --name-hash-version option for 'git pack-objects'
to allow users to select their preferred function version. This use of
an integer version allows for future expansion and a direct way to later
store a name hash version in the .bitmap format.
For now, let's consider how effective this mechanism is when repacking a
repository with different name hash versions. Specifically, we will
execute 'git pack-objects' the same way a 'git repack -adf' process
would, except we include --name-hash-version=<n> for testing.
On the Git repository, we do not expect much difference. All path names
are short. This is backed by our results:
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 260 MB | N/A |
| --name-hash-version=1 | 127 MB | 129s |
| --name-hash-version=2 | 127 MB | 112s |
This example demonstrates how there is some natural overhead coming from
the cloned copy because the server is hosting many forks and has not
optimized for exactly this set of reachable objects. But the full repack
has similar characteristics for both versions.
Let's consider some repositories that are hitting too many collisions
with version 1. First, let's explore the kinds of paths that are
commonly causing these collisions:
* "/CHANGELOG.json" is 15 characters, and is created by the beachball
[1] tool. Only the final character of the parent directory can
differentiate different versions of this file, but also only the two
most-significant digits. If that character is a letter, then this is
always a collision. Similar issues occur with the similar
"/CHANGELOG.md" path, though there is more opportunity for
differences In the parent directory.
* Localization files frequently have common filenames but
differentiates via parent directories. In C#, the name
"/strings.resx.lcl" is used for these localization files and they
will all collide in name-hash.
[1] https://github.com/microsoft/beachball
I've come across many other examples where some internal tool uses a
common name across multiple directories and is causing Git to repack
poorly due to name-hash collisions.
One open-source example is the fluentui [2] repo, which uses beachball
to generate CHANGELOG.json and CHANGELOG.md files, and these files have
very poor delta characteristics when comparing against versions across
parent directories.
| Stage | Pack Size | Repack Time |
|-----------------------|-----------|-------------|
| After clone | 694 MB | N/A |
| --name-hash-version=1 | 438 MB | 728s |
| --name-hash-version=2 | 168 MB | 142s |
[2] https://github.com/microsoft/fluentui
In this example, we see significant gains in the compressed packfile
size as well as the time taken to compute the packfile.
Using a collection of repositories that use the beachball tool, I was
able to make similar comparisions with dramatic results. While the
fluentui repo is public, the others are private so cannot be shared for
reproduction. The results are so significant that I find it important to
share here:
| Repo | --name-hash-version=1 | --name-hash-version=2 |
|----------|-----------------------|-----------------------|
| fluentui | 440 MB | 161 MB |
| Repo B | 6,248 MB | 856 MB |
| Repo C | 37,278 MB | 6,755 MB |
| Repo D | 131,204 MB | 7,463 MB |
Future changes could include making --name-hash-version implied by a config
value or even implied by default during a full repack.
It is important to point out that the name hash value is stored in the
.bitmap file format, so we must force --name-hash-version=1 when bitmaps
are being read or written. Later, the bitmap format could be updated to
be aware of the name hash version so deltas can be quickly computed
across the bitmapped/not-bitmapped boundary. To promote the safety of
this parameter, the validate_name_hash_version() method will die() if
the given name-hash version is incorrect and will disable newer versions
if not yet compatible with other features, such as --write-bitmap-index.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach 'git notes add' and 'git notes append' a new '-e' flag,
instructing them to open the note in $GIT_EDITOR before saving.
* sa/notes-edit:
notes: teach the -e option to edit messages in editor
Documentation update to clarify that 'uploadpack.allowAnySHA1InWant'
implies both 'allowTipSHA1InWant' and 'allowReachableSHA1InWant'.
* ps/upload-pack-doc:
doc: document how uploadpack.allowAnySHA1InWant impact other allow options
An extra worktree attached to a repository points at each other to
allow finding the repository from the worktree and vice versa
possible. Turn this linkage to relative paths.
* cw/worktree-relative:
worktree: add test for path handling in linked worktrees
worktree: link worktrees with relative paths
worktree: refactor infer_backlink() to use *strbuf
worktree: repair copied repository and linked worktrees
“Trailer” is the preferred nomenclature in this project. Also add a
definite article where I think it makes sense.
As we can see the rest of the document already prefers this term. This
just gets rid of the last stragglers.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
These two commands are similar enough to acknowledge each other on their
documentation pages.
See the previous commit where we discussed that option-less update-ref
does not support updating symbolic refs but symbolic-ref does.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Add a paragraph which just emphasizes that the command without any
options does not support refs in the final arguments. This is clear
already from the names `<new-oid>` and `<old-oid>` but the right balance
of redundancy makes documentation robust against stray interpretation.
This is also a good place to mention why `--stdin` has those `symref-*`
commands.
Suggested-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
This paragraph interrupts the flow of the section by going into detail
about what a symbolic ref file is and how it is implemented. It is not
clear what the purpose is since symbolic refs were already mentioned
prior (“possibly dereferencing the symbolic refs”). Worse, it can
confuse the reader about what argument can be a symbolic ref since it
just says “it” and not which of the parameters; in turn the reader can
be lead to try `<new-oid>` and then get a confusing error since
update-ref will just say that it is not a valid SHA1.
gitglossary(7) already documents what a symref is, concretely, and quite
well at that.
Reported-by: Bence Ferdinandy <bence@ferdinandy.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Move the discussion of file system symbolic links to a new “Notes”
section (inspired by the one in git-symbolic-ref(1)) since this is
mostly of historical note at this point, not something that is needed in
the main section of the documentation.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Remove paragraphs which explain that using this command is safer than
echoing the branch name into `HEAD`.
Evoking the echo strategy is wrong now under the reftable backend since
this file does not exist. And the ref file backend majority user base
use porcelain commands to manage `HEAD` unless they are intentionally
poking at the implementation.
Maybe this warning was relevant for the usage patterns when it was
added[1] but now it just takes up space.
† 1: 129056370ab (Add missing documentation., 2005-10-04)
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The other paragraphs on options say “With <option>,”. Let’s be uniform.
Also add missing word “that”.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Used regex to find these typos:
(?<!struct )(?<=\s)([a-z]{1,}) \1(?=\s)
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Notes can be added to a commit using:
- "-m" to provide a message on the command line.
- -C to copy a note from a blob object.
- -F to read the note from a file.
When these options are used, Git does not open an editor,
it simply takes the content provided via these options and
attaches it to the commit as a note.
Improve flexibility to fine-tune the note before finalizing it
by allowing the messages to be prefilled in the editor and edited
after the messages have been provided through -[mF].
Signed-off-by: Abraham Samuel Adekunle <abrahamadekunle50@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Document how setting of `uploadpack.allowAnySHA1InWant`
influences other `uploadpack` options - `allowTipSHA1InWant`
and `allowReachableSHA1InWant`.
Signed-off-by: Piotr Szlazak <piotr.szlazak@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
A new configuration variable remote.<name>.serverOption makes the
transport layer act as if the --serverOption=<value> option is
given from the command line.
* xx/remote-server-option-config:
ls-remote: leakfix for not clearing server_options
fetch: respect --server-option when fetching multiple remotes
transport.c:🤝 make use of server options from remote
remote: introduce remote.<name>.serverOption configuration
transport: introduce parse_transport_option() method
Discussing the desire to make breaking changes, declaring that
breaking changes are made at a certain version boundary, and
recording these decisions in this document, are necessary but not
sufficient. We need to make sure that we can implement, test, and
deploy such impactful changes.
Earlier we considered to guard the breaking changes with a run-time
check of the `feature.git<version>` configuration to allow brave
users and developers to opt into them as early adoptors. But the
engineering cost to support such a run-time switch, covering new and
disappearing git subcommands and how "git help" would adjust the
documentation to the run-time switch, would be unrealistically high
to be worth it.
Formalize the mechanism based on a compile-time switch to allow
early adopters to opt into the breaking change in a version of Git
before the planned version for the breaking change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Doc update to clarify how periodical maintenance are scheduled,
spread across time to avoid thundering hurds.
* sk/doc-maintenance-schedule:
doc: add a note about staggering of maintenance
The way AsciiDoc is used for SYNOPSIS part of the manual pages has
been revamped. The sources, at least for the simple cases, got
vastly pleasant to work with.
* ja/doc-synopsis-markup:
doc: apply synopsis simplification on git-clone and git-init
doc: update the guidelines to reflect the current formatting rules
doc: introduce a synopsis typesetting
We can only check out commits or branches, not refs in general. And the
problem here is if another worktree is using the branch that we want to
check out.
Let’s be more direct and just talk about branches instead of refs.
Also replace “be held” with “in use”. Further, “in use” is not
restricted to a branch being checked out (e.g. the branch could be busy
on a rebase), hence generalize to “or otherwise in use” in the option
description.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We explicitly avoid saying "ref <src>" when introducing the source
side of a refspec, because it can be a fully-spelled hexadecimal
object name, and it also can be a pattern that is not quite a "ref".
But we are loose when we introduce <dst> and say "ref <dst>", even
though it can also be a pattern. Let's omit "ref" also from the
destination side.
Clarify that <src> can be a ref, a (limited glob) pattern, or an
object name.
Even though the very original design of refspec expected that '*'
was used only at the end (e.g., "refs/heads/*" was expected, but not
"refs/heads/*-wip"), the code and its use evolved to handle a single
'*' anywhere in the pattern. Update the text to remove the mention
of "the same prefix". Anything that matches the pattern are named
by such a (limited glob) pattern in <src>.
Also put a bit more stress on the fact that we accept only one '*'
in the pattern by saying "one and only one `*`".
Helped-by: Monika Kairaitytė <monika@kibit.lt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
• Provide a commit message in the example command.
The command will hang since it is waiting for a commit message on
stdin. Which is usable but not straightforward enough since this is
example code.
• Use `||` directly since that is more straightforward than checking the
last exit status.
Also use `echo` and `exit` since `die` is not defined.
• Expose variable declarations.
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis for `git config unset` mentions two positional arguments:
`<name>` and `<value>`. While the first argument is correct, the second
is not. Users are expected to provide the value via `--value=<value>`.
Remove the positional argument. The `--value=<value>` option is already
documented correctly, so this is all we need to do to fix the
documentation.
Signed-off-by: Josh Heinrichs <joshiheinrichs@gmail.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Utilize the `server_options` from the corresponding remote during the
handshake in `transport.c` when Git protocol v2 is detected. This helps
initialize the `server_options` in `transport.h:transport` if no server
options are set for the transport (typically via `--server-option` or
`-o`).
While another potential place to incorporate server options from the
remote is in `transport.c:transport_get`, setting server options for a
transport using a protocol other than v2 could lead to unexpected errors
(see `transport.c:die_if_server_options`).
Relevant tests and documentation have been updated accordingly.
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, server options for Git protocol v2 can only be specified via
the command line option "--server-option" or "-o", which is inconvenient
when users want to specify a list of default options to send. Therefore,
we are introducing a new configuration to hold a list of default server
options, akin to the `push.pushOption` configuration for push options.
Initially, I named the new configuration `fetch.serverOption` to align
with `push.pushOption`. However, after discussing with Patrick, it was
renamed to `remote.<name>.serverOption` as suggested, because:
1. Server options are designed to be server-specific, making it more
logical to use a per-remote configuration.
2. Using "fetch." prefixed configurations in git-clone or git-ls-remote
seems out of place and inconsistent in design.
The parsing logic for `remote.<name>.serverOption` also relies on
`transport.c:parse_transport_option`, similar to `push.pushOption`, and
they follow the same priority design:
1. Server options set in lower-priority configuration files (e.g.,
/etc/gitconfig or $HOME/.gitconfig) can be overridden or unset in
more specific repository configurations using an empty string.
2. Command-line specified server options take precedence over those from
the configuration.
Server options from configuration are stored to the corresponding
`remote.h:remote` as a new field `server_options`. The field will be
utilized in the subsequent commit to help initialize the
`server_options` of `transport.h:transport`.
And documentation have been updated accordingly.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Junio C Hamano <gitster@pobox.com>
Reported-by: Liu Zhongbo <liuzhongbo.6666@bytedance.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
Reviewed-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These links should point to `.html` files, not to `.txt` ones.
Compare also to 4945f046c7f (api docs: link to html version of
api-trace2, 2022-09-16).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>