Submitter: Gwyn (US)
Submission Date: 2006-10-17
Source: Douglas A. Gwyn <gwyn@arl.army.mil>
Version: 1.1
Date: 2007-09-06
Subject: gets is generally unsafe
Summary
The gets function's notorious vulnerability to buffer overrun ought to be addressed.
Rationale
The gets function draws much criticism
due to its vulnerability to buffer overrun, which is inherent
in its legacy interface specification. Its very presence in the
Standard has been taken by many as evidence of WG14's
ineptitude or lack of concern for software reliability, despite
arguments to the contrary. The Committee may be more favorably
regarded within the programming community if it takes
reasonable steps to address this issue.
The recent publication of TR 24731, which specifies an alternative
function gets_s that could be used instead of gets, does not satisfy
the critics who claim that the continuing existence of the gets
specification in the C standard amounts to an endorsement of its
unsafe use in new programs.
Discussion
Consider this representative usage of gets:

    #include <stdio.h>
    static char line[BUFSIZ]; /* BUFSIZ is bigger than any normal text line */
    extern void process(char *);
    int main(void) {
        while (gets(line))
            process(line); /* may invoke puts(line), etc. */
        return 0;
    }

This shows how convenient the gets interface is. The well-known
problem with this interface occurs
when the standard input stream contains a text line longer than
the allocated size of the buffer; because gets has no way to know that size, it
blindly continues to store data beyond the end of the array,
with potentially devastating impact on program operation. The
infamous 1988 Morris Internet worm was merely the first of many
attacks that exploit this behavior to breach security in
network applications.
What might be done to improve the specification for gets so that
the safety of this exceptionally convenient interface can be
assured? (I do not recommend removing it altogether!) It seems
evident that the only feasible change would be to impose a limit on
the amount of data transferred. Requiring the programmer to establish
the limit through some additional interface would sacrifice the
convenience. The alternative is to impose some constant limit,
in which case the remaining question is what would be a
suitable constant. That can be answered by examining existing
uses of gets to determine typical buffer sizes. It appears that two
usage patterns predominate: Using the BUFSIZ macro which happens to be
conveniently at hand as a consequence of #include <stdio.h>, or using
some assumed text-line length such as 80.
Restricting the amount of data transferred to only 80
characters may be too severe for many applications, and in any
case it would necessitate the introduction of a new limit macro
such as LINE_MAX to provide a convenient way for programmers to
declare suitable buffer arrays.
Therefore I recommend instead that the existing usage of BUFSIZ be
legitimatized, as follows.
Suggested Technical Corrigendum
Add the following sentence to the Description in subclause 7.19.7.7 (The gets function),
between the two existing sentences:
At most BUFSIZ-1 characters are copied to the array; excessive characters are discarded.
(The portion after the semicolon isn't strictly necessary, but
it adds clarity.)
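The proposed wording can be illustrated with a minimal sketch. The helper name bounded_gets and its FILE * parameter are assumptions made here for illustration (gets itself always reads standard input, and no such function exists in the Standard); the sketch only models the rule that at most BUFSIZ-1 characters are stored and the rest of the line is discarded.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of any standard library.
 * Reads one line from `in` into `buf`, which must have room for at
 * least BUFSIZ characters. At most BUFSIZ-1 characters are stored;
 * any further characters up to the new-line are read and discarded.
 * The new-line itself is not stored, as with gets.
 */
char *bounded_gets(char *buf, FILE *in)
{
    size_t n = 0;
    int c;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (n < (size_t)BUFSIZ - 1)
            buf[n++] = (char)c;   /* store while room remains */
        /* otherwise the excess character is silently dropped */
    }

    /* End of file before any character was read, or a read error. */
    if (c == EOF && (n == 0 || ferror(in)))
        return NULL;

    buf[n] = '\0';
    return buf;
}
```

With a buffer declared as `static char line[BUFSIZ];`, a loop such as `while (bounded_gets(line, stdin)) process(line);` would mirror the usage shown in the Discussion without being able to write past the end of the array.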
Impact
The proposed change to the gets specification would have the effect
of preventing buffer overruns in many existing applications. Overly
long input lines would be silently truncated (which is better than the
alternative of treating them as multiple lines).
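For contrast, the "multiple lines" alternative can be seen with fgets, which is one standard function that behaves this way (the text above does not name a specific function, so that choice is an assumption). The helper name count_fgets_pieces is hypothetical; it simply counts how many pieces a read loop reports when the buffer is shorter than the input line.

```c
#include <stdio.h>

/*
 * Hypothetical helper for illustration only.
 * Counts how many successful fgets calls it takes to consume the
 * stream `in` using a buffer of `size` characters. When a line is
 * longer than size-1 characters, fgets returns it in several
 * pieces, so a naive caller sees one long line as multiple lines.
 */
size_t count_fgets_pieces(FILE *in, char *buf, int size)
{
    size_t pieces = 0;

    while (fgets(buf, size, in) != NULL)
        pieces++;

    return pieces;
}
```

For instance, a single ten-character line read through a five-character buffer is reported as three pieces, whereas the truncation rule proposed here would report one (shortened) line.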
Existing applications using small buffers would not be automatically rescued by this change; however, there would be a simple source-code fix (change the buffer size). New applications would obtain safe behavior by using the known limit for buffer allocation, exactly as in the above example.
Adoption of such a change would demonstrate the committee's willingness to improve specifications compatibly with the existing standard, without resorting to unnecessarily drastic measures.
Committee Discussion (for history only)
The Committee thinks that the programming community would be better
served by flagging the gets() function as deprecated.
Technical Corrigendum
Add to subclause 7.26.9:
The gets function is obsolescent, and is deprecated.
Add forward reference in 7.19.7.7 to 7.26.9