If you have a buffer that is not NUL-terminated but want to parse an
integer, there aren't many good options. If you use strtol() and
friends, you risk running off the end of the buffer if there is no
non-digit terminating character. And even if you carefully make sure
that there is such a character, ASan's strict-string-check mode will
still complain.
You can copy bytes into a temporary buffer, terminate it, and then call
strtol(), but doing so adds some pitfalls (like making sure you soak up
whitespace and leading +/- signs, and reporting overflow for overly long
input). Or you can hand-parse the digits, but then you need to take some
care to handle overflow (and again, whitespace and +/- signs).
These things aren't impossible to do right, but it's error-prone to have
to do them in every spot that wants to do such parsing. So let's add
some functions which can be used across the code base.
There are a few choices regarding the interface and the implementation.
First, the implementation:
- I went with with parsing the digits (rather than buffering and
passing to libc functions). It ends up being a similar amount of
code because we have to do some parsing either way. And likewise
overflow detection depends on the exact type the caller wants, so we
either have to do it by hand or write a separate wrapper for
strtol(), strtoumax(), and so on.
- Unsigned overflow detection is done using the same techniques as in
unsigned_add_overflows(), etc. We can't use those macros directly
because our core function is type-agnostic (so the caller passes in
the max value, rather than us deriving it on the fly). This is
similar to how git_parse_int(), etc, work.
- Signed overflow detection assumes that we can express a negative
value with magnitude one larger than our maximum positive value
(e.g., -128..127 for a signed 8-bit value). I doubt this is
guaranteed by the standard, but it should hold in practice, and we
make the same assumption in git_parse_int(), etc. The nice thing
about this is that we can derive the range from the number of bits
in the type. For ints, you obviously could use INT_MIN..INT_MAX, but
for an arbitrary type, we can use maximum_signed_value_of_type().
- I didn't bother with handling bases other than 10. It would
complicate the code, and I suspect it won't be needed. We could
probably retro-fit it later without too much work, if need be.
For the interface:
- What do we call it? We have git_parse_int() and friends, which aim
to make parsing less error-prone. And in some ways, these are just
buffer (rather than string) versions of those functions. But not
entirely. Those functions are aimed at parsing a single user-facing
value. So they accept a unit prefix (e.g., "10k"), which we won't
always want. And they insist that the whole string is consumed
(rather than passing back an "end" pointer).
We also have strtol_i() and strtoul_ui() wrappers, which try to make
error handling simpler (especially around overflow), but mostly
behave like their libc counterparts. These also don't pass out an
end pointer, though.
So I started a new namespace, "parse_<type>_from_buf".
- Like those other functions above, we use an out-parameter to store
the result, which lets us return an error code directly. This avoids
the complicated errno dance for detecting overflow that you get with
strtol().
What should the error code look like? git_parse_int() uses a bool
for success/failure. But strtol_ui() uses the syscall-like "0 is
success, -1 is error" convention.
I went with the bool approach here. Since the names are closest to
those functions, I thought it would cause the least confusion.
- Unlike git_parse_signed() and friends, we do not insist that the
entire buffer be consumed. For parsing a specific standalone string
that makes sense, but within an unterminated buffer you are much
more likely to be parsing multiple fields from a larger data set.
We pass out an "end" pointer the same way strtol() does. Another
option is to accept the input as an in-out parameter and advance the
pointer ourselves (and likewise shrink the length pointer). That
would let you do something like:
if (!parse_int_from_buf(&p, &len, &out))
return error(...);
/* "p" and "len" were adjusted automatically */
if (!len || *p++ != ' ')
return error(...);
That saves a few lines of code in some spots, but requires a few
more in others (depending on whether the caller has a length in the
first place or is using an end pointer). Of the two callers I intend
to immediately convert, we have one of each type!
I went with the strtol() approach as flexible and time-tested.
- We could likewise take the input buffer as two pointers (start and
end) rather than a pointer and a length. That again makes life
easier for some callers and harder for others. I stuck with pointer
and length as the more usual interface.
- What happens when a caller passes in a NULL end pointer? This is
allowed by strtol(). But I think it's often a sign of a lurking bug,
because there's no way to know how much was consumed (and even if a
caller wants to assume everything is consumed, you have no way to
verify it). So it is simply an error in this interface (you'd get a
segfault).
I am tempted to say that if the end pointer is NULL the functions
could confirm that the entire buffer was consumed, as a convenience.
But that felt a bit magical and surprising.
Like git_parse_*(), there is a generic signed/unsigned helper, and then
we can add type-specific helpers on top. I've added an int helper here
to start, and we'll add more as we convert callers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.adoc to get started, then see
Documentation/giteveryday.adoc for a useful minimum set of commands, and
Documentation/git-<commandname>.adoc for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial or git help tutorial, and the
documentation of each command with man git-<commandname> or git help <commandname>.
CVS users may also want to read Documentation/gitcvs-migration.adoc
(man gitcvs-migration or git help cvs-migration if git is
installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission and Documentation/CodingGuidelines).
Those wishing to help with error message, usage and informational message
string translations (localization l10) should see po/README.md
(a po file is a Portable Object file that holds the translations).
To subscribe to the list, send an email to git+subscribe@vger.kernel.org (see https://subspace.kernel.org/subscribing.html for details). The mailing list archives are available at https://lore.kernel.org/git/, https://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks