Mirror of https://github.com/git/git.git (synced 2026-01-11 21:33:13 +09:00)
Martin Koegler noted that create_delta() performs a new hash lookup after every block copy encoding, each of which is currently limited to 64KB. In case of larger identical blocks, the next hash lookup would normally point to the next 64KB block in the reference buffer and multiple block copy operations will be consecutively encoded.

It is however possible that the reference buffer is sparsely indexed if hash buckets have been trimmed down in create_delta_index() when hashing of the reference buffer isn't well balanced. In that case the hash lookup following a block copy might fail to match anything, and the fact that the reference buffer still matches beyond the previous 64KB block will be missed.

Let's rework the code so that buffer comparison isn't bounded to 64KB anymore. The match size should be as large as possible up front, and only then should multiple block copies be encoded to cover it all. Also, fewer hash lookups will be performed in the end.

According to Martin, this patch should reduce his 92MB pack down to 75MB with the dataset he has. Tests performed on the Linux kernel repo show a slightly smaller pack and a slightly faster repack.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
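The approach described in this commit message (measure the whole match first, then cover it with several copies of at most 64KB each) can be illustrated with a minimal sketch. This is not the code from the patch: `struct copy_op`, `match_length()`, `split_match()` and `MAX_COPY_SIZE` are hypothetical names introduced only for illustration, and the sketch records plain (offset, size) pairs rather than producing Git's actual delta encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Per-copy size limit taken from the commit message (64KB). */
#define MAX_COPY_SIZE 0x10000

/* Hypothetical record of one block copy; NOT Git's on-disk delta format. */
struct copy_op {
	size_t offset;	/* start of the copied range in the reference buffer */
	size_t size;	/* number of bytes to copy, at most MAX_COPY_SIZE */
};

/*
 * Count how many leading bytes of ref and tgt are identical.
 * The comparison is bounded only by the buffer lengths, not by 64KB.
 */
static size_t match_length(const unsigned char *ref, size_t ref_len,
			   const unsigned char *tgt, size_t tgt_len)
{
	size_t max = ref_len < tgt_len ? ref_len : tgt_len;
	size_t n = 0;

	while (n < max && ref[n] == tgt[n])
		n++;
	return n;
}

/*
 * Cover a match of `size` bytes starting at `offset` with consecutive
 * copies of at most MAX_COPY_SIZE bytes each. At most max_ops entries
 * are written to ops; the return value is the number of copies needed.
 */
static size_t split_match(size_t offset, size_t size,
			  struct copy_op *ops, size_t max_ops)
{
	size_t count = 0;

	assert(ops != NULL || max_ops == 0);
	while (size > 0) {
		size_t chunk = size > MAX_COPY_SIZE ? MAX_COPY_SIZE : size;

		if (count < max_ops) {
			ops[count].offset = offset;
			ops[count].size = chunk;
		}
		count++;
		offset += chunk;
		size -= chunk;
	}
	return count;
}
```

In this sketch, a single hash hit followed by one call to `match_length()` and one call to `split_match()` covers the whole identical region, so no further hash lookup is needed until the match actually ends.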
////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.

Please read the file INSTALL for installation instructions.
See Documentation/tutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands,
and "man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.

Many Git online resources are accessible from http://git.or.cz/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.
The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.
Languages: C 50.5%, Shell 38.7%, Perl 4.5%, Tcl 3.2%, Python 0.8%, Other 2.1%