Preparation of xdiff/ codebase to work with Rust
Comments?
* en/xdiff-cleanup-3:
SQUASH??? cocci
xdiff: move xdl_cleanup_records() from xprepare.c to xdiffi.c
xdiff: remove dependence on xdlclassifier from xdl_cleanup_records()
xdiff: replace xdfile_t.dend with xdfenv_t.delta_end
xdiff: replace xdfile_t.dstart with xdfenv_t.delta_start
xdiff: cleanup xdl_trim_ends()
xdiff: use xdfenv_t in xdl_trim_ends() and xdl_cleanup_records()
xdiff: let patience and histogram benefit from xdl_trim_ends()
xdiff: don't waste time guessing the number of lines
xdiff: make classic diff explicit by creating xdl_do_classic_diff()
ivec: introduce the C side of ivec
The final clean-up phase of the diff output could turn the result of
histogram diff algorithm suboptimal, which has been corrected.
Comments?
* yc/histogram-hunk-shift-fix:
xdiff: re-diff shifted change groups when using histogram algorithm
Only the classic diff uses xdl_cleanup_records(). Move it,
xdl_clean_mmatch(), and the macros to xdiffi.c and call
xdl_cleanup_records() inside of xdl_do_classic_diff(). This better
organizes the code related to the classic diff.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Disentangle xdl_cleanup_records() from the classifier so that it can be
moved from xprepare.c into xdiffi.c.
The classic diff is the only algorithm that needs to count the number
of times each line occurs in each file. Make xdl_cleanup_records()
count the number of lines instead of the classifier so it won't slow
down patience or histogram.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
View with --color-words. Same argument as delta_start.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Placing delta_start in xdfenv_t instead of xdfile_t provides a more
appropriate context since this variable only makes sense with a pair
of files. View with --color-words.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch is best viewed with a before and after of the whole
function.
Rather than using 2 pointers and walking them. Use direct indexing with
local variables of what is being compared to make it easier to follow
along.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
View with --color-words. Prepare these functions to use the fields:
delta_start, delta_end. A future patch will add these fields to
xdfenv_t.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patience diff is set up the exact same way as histogram, see
xdl_do_historgram_diff() in xhistogram.c. xdl_optimize_ctxs() is
redundant now, delete it.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All lines must be read anyway, so classify them after they're read in.
Also move the memset() into xdl_init_classifier().
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Later patches will prepare xdl_cleanup_records() to be moved into xdiffi.c
since only the classic diff uses that function.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After a diff algorithm has been run, the compaction phase
(xdl_change_compact()) shifts and merges change groups to produce a
cleaner output. However, this shifting could create a new matched group
where both sides now have matching lines. This results in a
wrong-looking diff output which contains redundant lines that are the
same on both files.
Fix this by detecting this situation, and re-diff the texts on each side
to find similar lines, using the fall-back Myer's diff. Only do this for
histogram diff as it's the only algorithm where this is relevant. Below
contains an example, and more details.
For an example, consider two files below:
file1:
A
A
A
A
A
A
A
file2:
A
A
x
A
A
A
A
When using Myer's diff, the algorithm finds that only the "x" has been
changed, and produces a final diff result (these are line diffs, but
using word-diff syntax for ease of presentation):
A A[-A-]{+x+}A AAA
When using histogram diff, the algorithm first discovers the LCS "A
AAA", which it uses as anchor, then produces an intermediate diff:
{+A Ax+}A AAA[- AAA-].
This is a longer diff than Myer's, but it's still self-consistent.
However, the compaction phase attempts to shift the first file's diff
group upwards (note that this shift crosses the anchor that histogram
had used), leading to the final results for histogram diff:
[-A AA-]{+A Ax+}A AAA
This is a technically correct patch but looks clearly redundant to a
human as the first 3 lines should not be in the diff.
The fix would detect that a shift has caused matching to a new group,
and re-diff the "A AA" and "A Ax" parts, which results in "A A"
correctly re-marked as unchanged. This creates the now correct histogram
diff:
A A[-A-]{+x+}A AAA
This issue is not applicable to Myer's diff algorithm as it already
generates a minimal diff, which means a shift cannot result in a smaller
diff output (the default Myer's diff in xdiff is not guaranteed to be
minimal for performance reasons, but it typically does a good enough
job).
It's also not applicable to patience diff, because it uses only unique
lines as anchor for its splits, and falls back to Myer's diff within
each split. Shifting requires both ends having the same lines, and
therefore cannot cross the unique line boundaries established by the
patience algorithm. In contrast histogram diff uses non-unique lines as
anchors, and therefore shifting can cross over them.
This issue is rare in a normal repository. Below is a table of
repositories (`git log --no-merges -p --histogram -1000`), showing how
many times a re-diff was done and how many times it resulted in finding
matching lines (therefore addressing this issue) with the fix. In
general it is fewer than 1% of diff's that exhibit this offending
behavior:
| Repo (1k commits) | Re-diff | Found matching lines |
|--------------------|---------|----------------------|
| llvm-project | 45 | 11 |
| vim | 110 | 9 |
| git | 18 | 2 |
| WebKit | 168 | 1 |
| ripgrep | 22 | 1 |
| cpython | 32 | 0 |
| vscode | 13 | 0 |
Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* en/xdiff-cleanup-2:
xdiff: rename rindex -> reference_index
xdiff: change rindex from long to size_t in xdfile_t
xdiff: make xdfile_t.nreff a size_t instead of long
xdiff: make xdfile_t.nrec a size_t instead of long
xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash
xdiff: use unambiguous types in xdl_hash_record()
xdiff: use size_t for xrecord_t.size
xdiff: make xrecord_t.ptr a uint8_t instead of char
xdiff: use ptrdiff_t for dstart/dend
doc: define unambiguous type mappings across C and Rust
The find_longest_common_sequence() function in patience diff is
inefficient as it calls binary_search() for every unique line it
encounters when deciding where to put it in the sequence. From
instrumentation (using xctrace) on popular repositories, binary_search()
takes up 50-60% of the run time within patience_diff() when performing a
diff.
To optimize this, add a boundary condition check before binary_search()
is called to see if the encountered unique line is located after the
entire currently tracked longest subsequence. If so, skip the
unnecessary binary search and simply append the entry to the end of
sequence. Given that most files compared in a diff are usually quite
similar to each other, this condition is very common, and should be hit
much more frequently than the binary search.
Below are some end-to-end performance results by timing `git log
--shortstat --oneline -500 --patience` on different repositories with
the old and new code. Generally speaking this seems to give at least
8-10% speed up. The "binary search hit %" column describes how often the
algorithm enters the binary search path instead of the new faster path.
Even in the WebKit case we can see that it's quite rare (1.46%).
| Repo | Speed difference | binary search hit % |
|----------|------------------|---------------------|
| vim | 1.27x | 0.01% |
| pytorch | 1.16x | 0.02% |
| cpython | 1.14x | 0.06% |
| ripgrep | 1.14x | 0.03% |
| git | 1.13x | 0.12% |
| vscode | 1.09x | 0.10% |
| WebKit | 1.08x | 1.46% |
The benchmarks were done using hyperfine, on an Apple M1 Max laptop,
with git compiled with `-O3 -flto`.
Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The classic diff adds only the lines that it's going to consider,
during the diff, to an array. A mapping between the compacted
array, and the lines of the file that they reference, is
facilitated by this array.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field rindex describes an index offset for other arrays. Change it
to size_t.
Changing the type of rindex from long to size_t has no cascading
refactor impact because it is only ever used to directly index other
arrays.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t is used because nreff describes the number of elements in memory
for rindex.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t is used because nrec describes the number of elements for both
recs, and for 'changed' + 2.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ha field is serving two different purposes, which makes the code
harder to read. At first glance, it looks like many places assume
there could never be hash collisions between lines of the two input
files. In reality, line_hash is used together with xdl_recmatch() to
ensure correct comparisons of lines, even when collisions occur.
To make this clearer, the old ha field has been split:
* line_hash: a straightforward hash of a line, independent of any
external context. Its type is uint64_t, as it comes from a fixed
width hash function.
* minimal_perfect_hash: Not a new concept, but now a separate
field. It comes from the classifier's general-purpose hash table,
which assigns each line a unique and minimal hash across the two
files. A size_t is used here because it's meant to be used to
index an array. This also avoids ` as usize` casts on the Rust
side when using it to index a slice.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the function signature and body to use unambiguous types. char
is changed to uint8_t because this function processes bytes in memory.
unsigned long to uint64_t so that the hash output is consistent across
platforms. `flags` was changed from long to uint64_t to ensure the
high order bits are not dropped on platforms that treat long as 32
bits.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t is the appropriate type because size is describing the number of
elements, bytes in this case, in memory.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make xrecord_t.ptr uint8_t because it's referring to bytes in memory.
In order to avoid a refactor avalanche, many uses of this field were
cast to char* or similar.
Places where casting was unnecessary:
xemit.c:156
xmerge.c:124
xmerge.c:127
xmerge.c:164
xmerge.c:169
xmerge.c:172
xmerge.c:178
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ptrdiff_t is appropriate for dstart and dend because they both describe
positive or negative offsets relative to a pointer.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The XDF_DIFF_ALGORITHM_MASK bit mask only includes bits for the patience
and histogram diffs, not for the minimal one. This means that when
reseting the diff algorithm to the default one, one needs to separately
clear the bit for the minimal diff. There are places in the code that fail
to do that: merge-ort.c and builtin/merge-file.c.
Add the XDF_NEED_MINIMAL bit to the bit mask, and remove the separate
clearing of this bit in the places where it hasn't been forgotten.
Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A lot of code clean-up of xdiff.
Split out of a larger topic.
* en/xdiff-cleanup:
xdiff: change type of xdfile_t.changed from char to bool
xdiff: add macros DISCARD(0), KEEP(1), INVESTIGATE(2) in xprepare.c
xdiff: rename rchg -> changed in xdfile_t
xdiff: delete chastore from xdfile_t
xdiff: delete fields ha, line, size in xdlclass_t in favor of an xrecord_t
xdiff: delete redundant array xdfile_t.ha
xdiff: delete struct diffdata_t
xdiff: delete local variables that alias fields in xrecord_t
xdiff: delete superfluous function xdl_get_rec() in xemit
xdiff: delete unnecessary fields from xrecord_t and xdfile_t
xdiff: delete local variables and initialize/free xdfile_t directly
xdiff: delete static forward declarations in xprepare
The only values possible for 'changed' is 1 and 0, which exactly maps
to a bool type. It might not look like this because action1 and action2
(which use to be dis1, and dis2) were also of type char and were
assigned numerical values within a few lines of 'changed' (what used to
be rchg).
Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false
for changed[k] makes it clear to future readers that these are
logically separate concepts.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is refactor-only; no behavior is changed. A future commit
will use bool literals for changed[i].
The functions xdl_clean_mmatch() and xdl_cleanup_records() will be
cleaned up more in a future patch series. The changes to
xdl_cleanup_records(), in this patch, are just to make it clear why
`char rchg` is refactored to `bool changed`.
Rename dis* to action* and replace literal numericals with macros.
The old names came from when dis* (which I think was short for discard)
was treated like a boolean, but over time it grew into a ternary state
machine. The result was confusing because dis* and rchg* both used 0/1
values with different meanings.
The new names and macros make the states explicit. nm is short for
number of matches, and mlim is a heuristic limit:
nm == 0 -> action[i] = DISCARD -> changed[i] = true
0 < nm < mlim -> action[i] = KEEP -> changed[i] = false
nm >= mlim -> action[i] = INVESTIGATE -> changed[i] = xdl_clean_mmatch()
When need_min is true, only DISCARD and KEEP occur because the limit
is effectively infinite.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field rchg (now 'changed') declares if a line in a file is changed
or not. A later commit will change it's type from 'char' to 'bool'
to make its purpose even more clear.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdfile_t currently uses chastore_t which is an arena allocator. I
think that xrecord_t used to be a linked list and recs didn't exist
originally. When recs was added I think they forgot to remove
xdfile_t.next, but was overlooked. This dual data structure setup
makes the code somewhat confusing.
Additionally the C type chastore_t isn't FFI friendly, and provides
little to no performance benefit over using realloc to grow an array.
Performance impact of deleting fields from xdfile_t:
Deleting ha is about 5% slower.
Deleting cha is about 5% faster.
Delete ha, but keep cha
time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s]
Range (min … max): 1.249 s … 1.286 s 10 runs
Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s]
Range (min … max): 1.320 s … 1.358 s 10 runs
Summary
build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Delete cha, but keep ha
time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s]
Range (min … max): 1.288 s … 1.292 s 10 runs
Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s]
Range (min … max): 1.205 s … 1.249 s 10 runs
Summary
build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Delete ha AND chastore
time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null'
Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s]
Range (min … max): 1.287 s … 1.295 s 10 runs
Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s]
Range (min … max): 1.305 s … 1.308 s 10 runs
Summary
build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran
1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fields from xdlclass_t are aliases of xrecord_t:
xdlclass_t.line -> xrecord_t.ptr
xdlclass_t.size -> xrecord_t.size
xdlclass_t.ha -> xrecord_t.ha
xdlclass_t carries a copy of the data in xrecord_t, but instead of
embedding xrecord_t it duplicates the individual fields. A future
commit will change the types used in xrecord_t so embed it in
xdlclass_t first, so we don't have to remember to change the types
here as well.
Best-viewed-with: --color-words
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 0 <= i < xdfile_t.nreff the following is true:
xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]]
This makes the code about 5% slower. The fields rindex and ha are
specific to the classic diff (myers and minimal). I plan on creating a
struct for classic diff, but there's a lot of cleanup that needs to be
done before that can happen and leaving ha in would make those cleanups
harder to follow.
A subsequent commit will delete the chastore cha from xdfile_t. That
later commit will investigate deleting ha and cha independently and
together.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Every field in this struct is an alias for a certain field in xdfile_t.
diffdata_t.nrec -> xdfile_t.nreff
diffdata_t.ha -> xdfile_t.ha
diffdata_t.rindex -> xdfile_t.rindex
diffdata_t.rchg -> xdfile_t.rchg
I think this struct existed before xdfile_t, and was kept for backward
compatibility reasons. I think xdiffi should have been refactored to
use the new (xdfile_t) struct, but was easier to alias it instead.
The local variables rchg* and rindex* don't shorten the lines by much,
nor do they really need to be there to make the code more readable.
Delete them.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the type xrecord_t as the local variable for the functions in the
file xdiff/xemit.c. Most places directly reference the fields inside of
this struct, doing that here makes it more consistent with the rest of
the code.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When xrecord_t was a linked list, and recs didn't exist, I assume this
function walked the list until it found the right record. Accessing
a contiguous array is so trivial that this function is now superfluous.
Delete it.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized,
but never used for anything by the code. Remove them.
Best-viewed-with: --color-words
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These local variables are essentially a hand-rolled additional
implementation of xdl_free_ctx() inlined into xdl_prepare_ctx(). Modify
the code to use the existing xdl_free_ctx() function so there aren't
two ways to free such variables.
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move xdl_prepare_env() later in the file to avoid the need
for static forward declarations.
Best-viewed-with: --color-moved
Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdl_hash_record_verbatim uses modified djb2 hash with XOR instead of ADD
for combining. The ADD-based variant is used as the basis of the modern
("GNU") symbol lookup scheme in ELF. Glibc dynamic loader received an
optimized version of this hash function thanks to Noah Goldstein [1].
Switch xdl_hash_record_verbatim to additive hashing and implement
an optimized loop following the scheme suggested by Noah.
Timing 'git log --oneline --shortstat v2.0.0..v2.5.0' under perf, I got
version | cycles, bn | instructions, bn
---------------------------------------
A 6.38 11.3
B 6.21 10.89
C 5.80 9.95
D 5.83 8.74
---------------------------------------
A: baseline (git master at e4ef0485fd78)
B: plus 'xdiff: refactor xdl_hash_record()'
C: and plus this patch
D: with 'xdiff: use xxhash' by Phillip Wood
The resulting speedup for xdl_hash_record_verbatim itself is about 1.5x.
[1] https://inbox.sourceware.org/libc-alpha/20220519221803.57957-6-goldstein.w.n@gmail.com/
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inline the check for whitespace flags so that the compiler can hoist
it out of the loop in xdl_prepare_ctx(). This improves the performance
by 8%.
$ hyperfine --warmup=1 -L rev HEAD,HEAD^ --setup='git checkout {rev} -- :/ && make git' ': {rev}; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'
Benchmark 1: : HEAD; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
Time (mean ± σ): 1.670 s ± 0.044 s [User: 1.473 s, System: 0.196 s]
Range (min … max): 1.619 s … 1.754 s 10 runs
Benchmark 2: : HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0
Time (mean ± σ): 1.801 s ± 0.021 s [User: 1.605 s, System: 0.192 s]
Range (min … max): 1.766 s … 1.831 s 10 runs
Summary
': HEAD^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0' ran
1.08 ± 0.03 times faster than ': HEAD^^; GIT_CONFIG_GLOBAL=/dev/null ./git log --oneline --shortstat v2.0.0..v2.5.0'
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cleanup_records function marks some lines as changed before running
the actual diff algorithm. For most lines, this is a good performance
optimization, but it also marks lines that are surrounded by many
changed lines as changed as well. This can cause redundant changes and
longer-than-necessary diffs.
Whether this results in better-looking diffs is subjective. However, the
--minimal flag explicitly requests the shortest possible diff.
The change results in shorter diffs in about 1.3% of all diffs in Git's
history. Performance wise, I have measured the impact on
"git log -p -3000 --minimal > /dev/null". With this change, I get
Time (mean ± σ): 2.363 s ± 0.023 s (25 runs)
and without this patch I measured
Time (mean ± σ): 2.362 s ± 0.035 s (25 runs).
As the difference is well within the margin of error, this does not seem
to have an impact on performance.
Signed-off-by: Niels Glodny <n.glodny@campus.lmu.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* js/comma-semicolon-confusion:
detect-compiler: detect clang even if it found CUDA
clang: warn when the comma operator is used
compat/regex: explicitly mark intentional use of the comma operator
wildmatch: avoid using of the comma operator
diff-delta: avoid using the comma operator
xdiff: avoid using the comma operator unnecessarily
clar: avoid using the comma operator unnecessarily
kwset: avoid using the comma operator unnecessarily
rebase: avoid using the comma operator unnecessarily
remote-curl: avoid using the comma operator unnecessarily
The xdiff code on 32-bit platform misbehaved when an insanely large
context size is given, which has been corrected.
* rs/xdiff-context-length-fix:
xdiff: avoid arithmetic overflow in xdl_get_hunk()
The comma operator is a somewhat obscure C feature that is often used by
mistake and can even cause unintentional code flow. While the code in
this patch used the comma operator intentionally (to avoid curly
brackets around two statements, each, that want to be guarded by a
condition), it is better to surround it with curly brackets and to use a
semicolon instead.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdl_get_hunk() calculates the maximum number of common lines between two
changes that would fit into the same hunk for the given context options.
It involves doubling and addition and thus can overflow if the terms are
huge.
The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
the type of the corresponding context and interhunkcontext in struct
diff_options is int. On many platforms longs are bigger that ints,
which prevents the overflow. On Windows they have the same range and
the overflow manifests as hunks that are split erroneously and lines
being repeated between them.
Fix the overflow by checking and not going beyond LONG_MAX. This allows
specifying a huge context line count and getting all lines of a changed
files in a single hunk, as expected.
Reported-by: Jason Cho <jason11choca@proton.me>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comparisons all involve comparisons against unsigned values.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop iteration variable is non-negative and used in comparisons
against a size_t value. Use size_t to eliminate the mismatch.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comparisons all involve unsigned variables. Cast the comparison
to unsigned to eliminate the mismatch.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unsigned `ignored` variable causes expressions to promote to
unsigned. Use a signed value to make comparisons use the same types.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>