JTC1/SC22/WG14
N1106
Document: WG14 N1106
Date: 07-Mar-05
Austin Group Review of ISO/IEC WDTR 24731
Specification for Secure C Library Functions
============================================
During the Redmond WG 14 meeting, I was asked to request review and
comment on the Secure C Library functions by the Austin Group.
This report summarizes the comments received.
1. The Austin Group in general applauds WG 14 for focusing attention
on the problems of buffer overflow.
2. It is generally felt that the name "Secure" is incorrect,
and should be changed. Possible alternates include "Enhanced
Library Interfaces" or "Robust Library Interfaces".
3. The term "diagnosed undefined behavior" is confusing, and
inappropriate. Some other term should be sought.
4. The approach of simply providing a length argument to a number
of string functions may cause as many problems as it solves. It
provides no assurance that the programmer provide a sensible
value, and may end up obfuscating otherwise clear and safe
usage of the original function. Functions that allocate memory
(using malloc()) would provide safer, clearer, and more robust
interfaces. E.g. strdup() instead of strcpy().
5. The asctime_s() and related functions do not support localization,
and should be dropped in favor of using strftime().
6. Namespace pollution issues ... the behavior should not be
implementation defined if __STDC_WANT_SECURE_LIB is undefined. See
P4 para3. The undefined case should be the same as defined as 0
In general, many of the group found this proposed TR to be controversial,
and the Austin Group has no strong showing of support for it.
=======================================================================
Background
----------
The Austin Group initially looked at the document in late December,
2004, and discussed the document during their Face to Face plenary
meeting in January 2005.
At that meeting, the following observations were made:
* Responses to Draft TR from WG14: "Security" Library Extensions
WG 14 have submitted a draft of this TR for CD Registration, and have
asked AG for review and comment.
The document is titled "Specification for Secure C Library Functions"
but does not cover anything related to security. It provides bound
checked string functions and other interfaces intended to reduce
programmer error. The AG respectfully suggest that the title should
be something more like "Enhanced Library Interfaces", or similar.
Security can be divided into six basic requirements, or tenets, that
help ensure data confidentiality, integrity, and availability. The
six security tenets are:
* Identification. This deals with user names and how users identify
themselves to the system.
* Authentication. This deals with passwords, smart cards,
biometrics, and so on. Authentication is how users prove to
the system that they are who they claim to be.
* Access control (also called authorization). This deals with
access and the privileges granted to users so that they may
perform certain functions on the system.
* Confidentiality. This deals with encryption. Confidentiality
mechanisms ensure that only authorized people can see data
stored on or traveling across the network.
* Integrity. This deals with checksums and digital
signatures. Integrity mechanisms ensure that data is not garbled,
lost, or changed when traveling across the network.
* Nonrepudiation. This is a means of providing proof of data
transmission or receipt so that the occurrence of a transaction
cannot later be denied.
This TR deals only with integrity, and only a little of that.
Namespace pollution issues ... should not be implementation defined
if __STDC_WANT_SECURE_LIB is undefined. See P4 para3. The undefined
case should be the same as defined as 0.
We should continue to review this document offline. Comments
to austin-group-l. Nick Stoughton will deliver the comments
to the next meeting of WG14. Mailing deadline (for WG14)
is 7 March 2005. Our deadline should be one or two days
before that. ACTION AI-2005-01-11: All to review WG14
Security TR and report to austin-group-l by 2005-3-1.
========================================================================
The following comments have been extracted from the messages
posted in response to that action item. These responses have
been slightly edited for consistency and to remove some of the
off-topic side discussions! The entire messages can be found at
http://www.opengroup.org/austin/mailarchives/ag/index.html
At this stage, these are "raw" comments ... they represent the opinions
of some of the approximately 550-600 participants, and have not been
officially adopted as an offical opinion of the Austin Group itself.
-----------------------------------------------
From: Curtis Smith
This proposal seems as though it took a bit of effort to develop.
I especially like the behaviour of strncpy_s in that it ensures a
nul terminator (when possible) and does not require the writing of
extra nuls at the end of the string.
However, I don't think [this TR] would be especially useful, and I
don't envision it gaining wide acceptance due to the following:
awkward names with _s suffix
corresponding traditional functions would apparently still be
available, so whatever "security" the functions purport to provide
can be circumvented
anyone interested in using the new functions would probably already
be writing code to check buffer sizes
people that want to manipulate strings are probably better off using
C++ unless they're concerned with speed in time-critical loops in
which they don't want to be doing superfluous checks anyway.
I think the term "secure" is misleading.
Perpetuation of other poor interfaces (e.g., anything to do with
time_t) is not a good idea.
These changes really don't offer any functionality not already
available.
If you did want to proceed with the proposal, I think it is missing
some functionality that really does aid in robustness.
Background: Whenever I am trying to write robust code, especially for
libraries that 3rd parties might call, I like to do some rudimentary
argument checking for invalid pointers. For many specific operating
environments, this is possible, e.g., in Windows, the functions
IsBadReadPtr, IsBadWritePtr, and IsBadStringPtr implement this
desired functionality (although they use logic inverted from what
I suggest below). In operating systems which do not provide such
functions, equivalents can usually be written using machine-specific
instructions to analyse the accessibilty of certain parts of memory.
Even in environments where no such cpu instructions exist, a stub set
of functions can be written to ensure that NULL pointers aren't being
used. It can be argued that these functions are not robust enough
because they don't handle the case where code in parallel suddenly
changes the readability or writability of a particular section of
memory; nevertheless, their use does indeed provide value
[[ edited out suggestion for additional robustness checking functions,
available on request ]]
-----------------------------------------------
From: Marshall Wiley
As an application (rather than OS) developer, these seem like a good
start. However, this leaves one problem that I haven't seen addressed
anywhere - the last of a supported way to determine how long a va_arg
list is, and thus prevent overflow when attempting to process one. The
Open VMS HP C library offers a non-portable va_count() routine which
returns the number of quadwords (Alpha) in a va_list, which is better
than nothing, but there doesn't seem to be anything in the POSIX or C
standards to address this problem correctly. A related issue is the
apparent lack of a supported way to create a va_list, but I can't
claim that is (directly) a security problem.
As an API developer, I often need to take an incoming va_list, break
out some of the arguments, and pass the remainder to a function
such as printf (vprintf). Since I don't know how many arguments to
expect, what types they may be, nor how long the arg list actually
is, it's virtually impossible to write a supported function that
I can guarantee won't run off the end of the list. Today I have to
attempt to determine how each OS implements va_list and try to work
around this restriction. But since the behavior isn't specified, that
implementation can change tomorrow, potentially breaking my code,
or worse, bypassing my attempts to provide some degree of security.
[[ This led to a long, off topic, discussion on the merits of this
proposal, which can be largely summarized as "this will either require
hardware support or substantial compiler changes {Nick Mclaren has
details}". Most thought it was not a good idea.]]
-----------------------------------------------
>From Paul Eggert:
Let's vote against that proposal. It's controversial, arguably
leads to buggier software, and does not reflect the consensus of
the community.
Linus Torvalds had this to say about a similar, earlier proposal:
The above code is slow, ugly, non-straightforward, unportable and
no more secure than the original code was.
In short, it is just stupid code.
But hey, if you want to advocate stupid code in public, that's
your prerogative. But please don't be proud of it.
--
<http://lists.nas.nasa.gov/archives/ext/linux-security-audit/2002/01/msg00073.html>
The GNU C Library maintainers have rejected similar proposals, for
similar reasons. For example, Ulrich Drepper wrote:
This is horribly inefficient BSD crap. Using these functions only
leads to other errors. Correct string handling means that you
always know how long your strings are and therefore you can you
memcpy (instead of strcpy).
-- <http://sources.redhat.com/ml/libc-alpha/2000-08/msg00053.html>
Given this history of controversy, I am a bit surprised to see this
proposal appear before a standards body.
I am not aware of any study demonstrating that the approach embodied
by the proposed API leads to safer or more-secure software. On the
contrary, when I very-briefly attempted to investigate the matter,
I found that it made C code harder to read and to maintain, and
even buggier; its use did not catch any bugs in the code I surveyed
and its use apparently introduced at least one bug. I surveyed
OpenSSH_3.0.2p1, the then-current version. For some details please
see:
http://sources.redhat.com/ml/libc-alpha/2002-01/msg00096.html
http://sources.redhat.com/ml/libc-alpha/2002-01/msg00159.html
My understanding is that the proposed functions are mainly useful in
environments where developers must take a lot of existing code, much of
it of poor quality, and try to limit the scope of some of its faults
without necessarily having to understand it thoroughly. The goal is
mainly to prevent certain low-level failures, not to improve the code
quality or functionality. This is a fairly specialized need, and does
not form sufficient justification for introducing these primitives
into a widely used standard. Those who have such a need can easily
write the functions themselves, and include their implementations as
part of their application.
A new API like this should reflect consensus. We should not
standardize on a new API that is so obviously controversial.
[[Clive Feather points out "that this is a Technical Report. That is,
it is *NOT* a proposal to change the Standard; it's a document to
give some consensus to efforts in this area.
I have a list of detailed comments on the document, but I'll just
repeat my first one here: it should not be called "Secure functions"
or anything like that; the correct name should be something along
the lines of "Functions with Bounds Checking and/or Re-entrancy".
I don't see the harm, once my issues are fixed, in issuing this as a
TR - it gives a point of focus. Either the concepts will be taken up,
or people will adjust them to meet real-world concerns, or they'll die.
I would take a very different view, however, to bringing them into
the C Standard at this point."]]
-----------------------------------------------
>From Glenn Seeds
While I may not agree with the details of what has been proposed,
I applaud the recognition that something needs to be done, and I
believe we should support the effort by making constructive comments.
I certainly don't agree with a point of view that says "good
programmers don't need this stuff". I do agree that any "improvements"
must satisfy some key criteria: - Code must not be made less readable
by their use. - It should be possible to implement them so that
they perform as well as equivalently robust hand-written code in
the application.
1) I was confused by the term "secure", which implied something about
security (i.e. access control). To avoid confusing others, could we
not use the term "robust" instead?
2) The current [POSIX] standard defines 2 methods for obtaining the
text corresponding to an error code: strerror, and strerror_r
strerror is not thread safe. strerror_r was introduced for this
purpose.
Unfortunately, strerror_r has a serious usability problem. It requires
the caller to supply a fixed buffer for the result, but there is no
way to determine how big this buffer needs to be. The only way to
make this work in general is to supply an initial buffer, check for
overflow, re-allocate, and try again. Repeat until success, or you've
used all available memory. Minimal code looks something like this:
[program fragment edited to work!]
size_t buflen;
char *buf;
buflen = 100;
while (0 != (buf = malloc(buflen))) {
if (0 == (strerror_r(errno, buf, buflen)) )
break;
free(buf); buflen++;
}
A possible solution is to add a strerrorlen(errno) to get the required
buffer size. The resulting code would be:
size_t buflen; char *buf; buflen = 1+strerrorlen(errno); buf =
malloc(buflen); strerror_r(errno, buf, buflen);
[Additional comments] strerror_s() as defined in the proposal does
*not* solve the usability problems identified above. Strerrorlen() is
easy to implement efficiently, and does not impair the readabaility
of the application. While the malloc() approach used above does
have a performance impact, applications are not forced to do that. If
simple truncation is acceptable for the application, then the existing
strerror_r() is sufficient.
-----------------------------------------------
>From Bruce Korb
Every new interface introduces new complexity, so throwing in new stuff
to standardize needs to be widely regarded as useful. Paul [Eggert]
did a reasonably careful code analysis of OpenSSH and did not find that
it had any appreciable benefit. "appreciable benefit" to me would mean
improved conciseness with equivalent or better verification. I do
not see that this interface provides that, either. It is intended
to do so, but when you get around to actually coding something up
with it, it just fails to provide anything appreciable. So, perhaps
instead of starting with the premise, "I think this will help guide
programmers into being more careful", instead start with examining
the misusages of string copies and providing interfaces that can
more helpful. Both of which are outside of POSIX, of course. But,
just as a couple of examples:
char* asprintf( const char*, ...) -- allocating sprintf function
char* strjoin( const char*, ...) -- multi-string "strdup()"
-----------------------------------------------
>From Garret Wollman
I think in particular the draft's attempts to save some of the
fundamentally broken interfaces in C by restricting them to a magic
maximum buffer size is very much the wrong way to go.
General comments:
* __STDC_WANT_SECURE_LIB__ is a horrible botch. [[but its the same
technique that POSIX uses ...]]
* tmpnam_s() (and functions like it) cannot be used securely; it
should not be included.
* The whole concept of "diagnosed undefined behavior" as invoked
(e.g.) in section 5 does not bear contemplation. [[Clive Feather:
The *name* of the concept is foul, but what it's trying to do isn't
as bad as all that.]]
* The fopen_s() etc. functions are not improvements over the Standard
functions.
* I don't believe the invention of `rsize_t' and `errno_t' improves
anything. In general, new `foo_t' types are almost always a mistake,
since C does not have incomplete typedefs.
* The improvements to the scanf() family seem reasonable, but I'm
dubious as to the value of improving scanf() since it is nearly always
the wrong function to use when parsing input.
* gets_s() is a pointless interface; fgets() does the useful part of
what gets_s() is described as doing. By contrast, an interface like
fgetln() would be a meaningful improvement.
* getenv_s() does not offer anything over strdup(getenv(...)) except
unnecessary complexity. [[strdup is not a C function, its a POSIX
one]]
* OpenBSD's strlcpy() would be an improvement over the strcpy_s()
interface described here, and has the benefit of actually being used
in a real implementation, Linux developers notwithstanding.
* The 4.4BSD function strsep() is an improvement over the Standard
strtok() and strtok_r() interfaces; this draft's strtok_s() is not.
* I would agree with the comments others have made regarding
strerror_s(). As defined, the whole strerror() family is a horrible
interface. In retrospect, it would have been much better to simply
standardize "%m" as a format specifier for printf().
* Extending the current time interfaces at this time is inadvisable.
They are almost all broken with respect to timezones and localization;
this needs to be on a separate standardization track.
-----------------------------------------------
>From Ulrich Drepper
The proposed safe(r) ISO C library fails to address to issue completely.
The problem with the existing interfaces is that the programmer has to
put in a lot of additional effort to make sure the program behaves
correctly. In many situations a much simpler code, with all kinds of
error checking removed, works equally well and therefore is left out.
This is the core of the problem. Code is rather written like
char *p = malloc (3 * NAME_LEN);
strcpy (p, name1);
strcat (p, name2);
strcat (p, name3);
for some fixed (but arbitrarily chosen NAME_LEN) than
char *p = malloc (strlen (name1) + strlen (name2) + strlen (name3) + 1);
if (p != NULL)
{
strcpy (p, name1);
strcat (p, name2);
strcat (p, name3);
}
Code like the former can be found in almost all software projects. The
second code is just so much longer and, if something needs to change, it
has to be changed in two places. Programmers are lazy, though, so this
is a big deterrent.
Proposing to make the life of a programmer even harder is not going to
help. But this is exactly what is proposed. Requiring the programming
to write
size_t len = 3 * NAME_LEN;
char *p = NULL;
again:
char *p2 = realloc (p, len);
if (p2 == NULL)
abort ();
p = p2;
if (strcpy_s (p, len, name1) != 0
|| strcat_s (p, len, name2) != 0
|| strcat_s (p, len name3) != 0)
{
len += 2;
goto again;
}
will make the situation worst, if it changes anything. Programmers
won't put up with the amount of new code needed. The resulting code is
probably more complicated than the second example code which is
perfectly safe as well. The program in both cases needs to keep track
of the length of the string, so why not do it right from the beginning?
If programmers are asked about what they want it is automatic, implicit
memory allocation. I know that this is contrary to the design decision
of ISO C where malloc et.al. are only used explicitly (and not
implicitly by some interface). That's OK, but it does not justify
inventing (and this all is pure invention) useless interfaces.
It is clear that those who wish a string call should choose appropriate
programming languages which either have it build in or allow to
implement it easily using OO methods. C does not fall into this
category. But if the no-implicit-malloc rule is ignored (as POSIX
does), there are many possibilities. The GNU C library has many
functions which provide implicit memory management:
int asprintf (char **s, const char *fmt, ...)
int vasprintf (char **s, const char *fmt, va_list ap)
These functions work similar to sprintf() but actually allocate the
memory needed for the result with an adequately sized call to malloc().
For more complicated situations this interface is useful:
FILE *open_memstream (char **bufp, size_t *sizep)
The function returns a stream which can be used with any of the stream
operations. The output is accumulated in a memory buffer the runtime
allocates. Upon fclose() to variables references in the
open_memstream() call are filled with the final string. This interface
allows to create arbitrarily complex strings. Some people suggested
more interfaces like strjoin(), which would just append an arbitrary
number of strings, but this is something which can be easily handled
with asprintf() and open_memstream().
In a similar fashion, stdio functions should be extended. glibc
contains getline()/getdelim() to replace fgets() etc. These functions
do their own memory handling and there will never be a need again to
arbitrary limit line length (and the resulting errors in the code
handling too long lines).
scanf() with %s etc is another case. Passing the length means the
programmer has to come with the error handling (and the correct length
value to pass to the function). This is ignored as well. The
alternative is to have scanf() allocate the memory. glibc uses a format
flag ('a' in our case, yes it conflicts with C99). I.e.,
char *p;
scanf(f, "%as", &p)
will cause p to be a freshly allocated buffer.
These are the kind of extensions which, without a OO framework in the
language, make the programmer's life easier and therefore will actually
be used. None of the proposed interfaces can say that. They all
require more work to be done or are just plain silly.
-----------------------------------------------
>From Konrad Schwarz
Diagnosed undefined behavior should be flagged by raise(3)'ing an
appropriate signal.
(I think there is a pretty clear trend towards memory protection -- not
necessarily paging support -- in embedded processor designs.)
This is in addition to the other criticims already posted.