Kjetil Barvik 36419c8ee4 check_updates(): effective removal of cache entries marked CE_REMOVE
Below is the oprofile output from the GIT command 'git checkout -q my-v2.6.25'
(move from tag v2.6.27 to tag v2.6.25 of the Linux kernel):

CPU: Core 2, speed 1999.95 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
                         mask of 0x00 (Unhalted core cycles) count 20000
Counted INST_RETIRED_ANY_P events (number of instructions retired) with a
                           unit mask of 0x00 (No unit mask) count 20000
CPU_CLK_UNHALT...|INST_RETIRED:2...|
  samples|      %|  samples|      %|
------------------------------------
   409247 100.000    342878 100.000 git
        CPU_CLK_UNHALT...|INST_RETIRED:2...|
          samples|      %|  samples|      %|
        ------------------------------------
           260476 63.6476    257843 75.1996 libz.so.1.2.3
           100876 24.6492     64378 18.7758 kernel-2.6.28.4_2.vmlinux
            30850  7.5382      7874  2.2964 libc-2.9.so
            14775  3.6103      8390  2.4469 git
             2020  0.4936      4325  1.2614 libcrypto.so.0.9.8
              191  0.0467        32  0.0093 libpthread-2.9.so
               58  0.0142        36  0.0105 ld-2.9.so
                1 2.4e-04         0       0 libldap-2.3.so.0.2.31

Detail list of the top 20 function entries (libz counted in one blob):

CPU_CLK_UNHALTED  INST_RETIRED_ANY_P
samples  %        samples  %        image name               symbol name
260476   63.6862  257843   75.2725  libz.so.1.2.3            /lib/libz.so.1.2.3
16587     4.0555  3636      1.0615  libc-2.9.so              memcpy
7710      1.8851  277       0.0809  libc-2.9.so              memmove
3679      0.8995  1108      0.3235  kernel-2.6.28.4_2.vmlinux d_validate
3546      0.8670  2607      0.7611  kernel-2.6.28.4_2.vmlinux __getblk
3174      0.7760  1813      0.5293  libc-2.9.so              _int_malloc
2396      0.5858  3681      1.0746  kernel-2.6.28.4_2.vmlinux copy_to_user
2270      0.5550  2528      0.7380  kernel-2.6.28.4_2.vmlinux __link_path_walk
2205      0.5391  1797      0.5246  kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty
2103      0.5142  1203      0.3512  kernel-2.6.28.4_2.vmlinux find_first_zero_bit
2077      0.5078  997       0.2911  kernel-2.6.28.4_2.vmlinux do_get_write_access
2070      0.5061  514       0.1501  git                      cache_name_compare
2043      0.4995  1501      0.4382  kernel-2.6.28.4_2.vmlinux rcu_irq_exit
2022      0.4944  1732      0.5056  kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc
2020      0.4939  4325      1.2626  libcrypto.so.0.9.8       /usr/lib/libcrypto.so.0.9.8
1965      0.4804  1384      0.4040  git                      patch_delta
1708      0.4176  984       0.2873  kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period
1682      0.4112  727       0.2122  kernel-2.6.28.4_2.vmlinux sysfs_slab_alias
1659      0.4056  290       0.0847  git                      find_pack_entry_one
1480      0.3619  1307      0.3816  kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks

Notice the memmove line, where the CPU spent 7710 / 277 = 27.8 cycles
per instruction. Compared to the total cycles spent inside the source
code of GIT for this command, the memmove() calls account for
(7710 * 100) / 14775 = 52.2% of the total.

Retesting with a GIT program compiled for gcov usage, I found out that
the memmove() calls came from remove_index_entry_at() in read-cache.c,
where we have:

        memmove(istate->cache + pos,
                istate->cache + pos + 1,
                (istate->cache_nr - pos) * sizeof(struct cache_entry *));

remove_index_entry_at() is called 4902 times from check_updates() in
unpack-trees.c, and on each call we move every cache_entry pointer
after the removed one a single step to the left.

With 28828 entries in the cache this time, and assuming we move half
of them on average for each call, we move approximately
4902 * 0.5 * 28828 * 4 = 282 629 712 bytes in total, or twice this
amount if each pointer is 8 bytes (64 bit).
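
The estimate above can be reproduced with a small helper. This is only
a sketch of the arithmetic: estimated_bytes_moved() is a hypothetical
name and is not part of GIT, and it relies on the same assumption as
the text, namely that each removal shifts half of the array on average.

```c
#include <assert.h>

/*
 * Estimated number of bytes moved when `removals` entries are deleted
 * one at a time from an array of `entries` pointers, assuming each
 * deletion shifts half of the array on average.
 * Hypothetical helper for illustration; not part of GIT.
 */
static unsigned long long estimated_bytes_moved(unsigned long long removals,
						unsigned long long entries,
						unsigned long long ptr_size)
{
	/* The text only considers 32-bit and 64-bit pointers. */
	assert(ptr_size == 4 || ptr_size == 8);

	/* removals * (entries / 2) * ptr_size, kept in integer arithmetic */
	return removals * entries * ptr_size / 2;
}
```

Called with 4902 removals, 28828 entries and 4-byte pointers, this gives
the 282 629 712 bytes quoted above.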

OK, it seems that the function check_updates() is called 28 times, so
the estimate above would have been more accurate had check_updates()
been called only once, but the point stands: we move a lot of bytes.

To fix this and use an O(N) algorithm instead, where N is the number
of cache entries, we remove all marked entries in a single loop over
the entries.
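
The single-pass idea can be sketched as follows. This is a simplified
illustration and not the code from the patch: struct entry,
ENTRY_REMOVE and compact_entries() are hypothetical stand-ins for
struct cache_entry, CE_REMOVE and remove_marked_cache_entries(), and
any other bookkeeping the real function may need is left out.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_REMOVE 0x1u	/* hypothetical stand-in for CE_REMOVE */

struct entry {			/* hypothetical stand-in for struct cache_entry */
	unsigned int flags;
};

/*
 * Drop every entry flagged ENTRY_REMOVE in one pass over the array,
 * keeping the surviving pointers in their original order.
 * Returns the new number of entries.
 */
static size_t compact_entries(struct entry **cache, size_t nr)
{
	size_t i, j;

	assert(cache != NULL || nr == 0);

	for (i = j = 0; i < nr; i++) {
		if (cache[i]->flags & ENTRY_REMOVE)
			continue;	/* dropped: not copied forward */
		cache[j++] = cache[i];	/* each survivor is copied at most once */
	}
	return j;
}
```

Each pointer is read once and written at most once, so the cost is
proportional to the number of entries and no longer depends on how many
of them are removed.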

In a retest, the new remove_marked_cache_entries() from the patch
below produced the following output line from oprofile:

46        0.0105  15        0.0041  git                      remove_marked_cache_entries

If we can trust the numbers from oprofile in this case, we saved
approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077
seconds CPU time with this fix for this particular test.  And notice
that now the CPU did only 46 / 15 = 3.1 cycles/instruction.
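
The conversion from sample counts to CPU time used above can be written
out as a short helper. This is a sketch only: samples_to_seconds() is a
hypothetical name, and it assumes, as the text does, one sample per
20000 CPU_CLK_UNHALTED events and a clock of roughly 2 GHz.

```c
#include <assert.h>

/*
 * Convert a drop in oprofile sample counts into seconds of CPU time.
 * events_per_sample is the "count" value from the oprofile header
 * (one sample is taken per that many events); clock_hz is the CPU
 * clock rate in Hz.
 * Hypothetical helper for illustration; not part of GIT or oprofile.
 */
static double samples_to_seconds(unsigned long samples_before,
				 unsigned long samples_after,
				 unsigned long events_per_sample,
				 double clock_hz)
{
	assert(clock_hz > 0.0);
	assert(samples_before >= samples_after);

	return (double)(samples_before - samples_after) *
		(double)events_per_sample / clock_hz;
}
```

With 7710 samples before, 46 after, 20000 events per sample and a
2 GHz clock, this yields the 0.077 seconds estimated above.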

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-18 17:11:21 -08:00

////////////////////////////////////////////////////////////////

	GIT - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command.  The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with the help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.

Please read the file INSTALL for installation instructions.
See Documentation/gittutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands,
and "man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.

Many Git online resources are accessible from http://git.or.cz/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.

The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.