head     1.1;
branch   1.1.1;
access   ;
symbols  start:1.1.1.1 project:1.1.1;
locks    ; strict;
comment  @# @;


1.1
date     2009.07.09.02.51.48;  author yo2dh;  state Exp;
branches 1.1.1.1;
next     ;

1.1.1.1
date     2009.07.09.02.51.48;  author yo2dh;  state Exp;
branches ;
next     ;


desc
@@



1.1
log
@Initial revision
@
text
@The following are random scribbled notes on idiosyncrasies of BOON.
They may or may not be helpful.


It's not uncommon to receive parsing errors (partially due to inadequacies
in the C parser), and if so, the error messages will probably not be very
informative.  If you run into this, first check: Were you able to compile
the package (using gcc and make)?  If the answer is "No", then the problem
is unlikely to be in my parser.  If the answer is "Yes", then it's probably
a bug in my code.

One known pitfall: be careful to use a version of cpp that splices a line
ending in a backslash together with the line that follows it.  Try:
       $ cat > foo
       foo\
       bar
       $ cpp foo
       foobar
       $
If the two lines aren't combined, the parser in my tool can get confused.
gcc's cpp should be ok.  This affects examples/{route.c,fingerd.c}.

Another known pitfall: there are known incompatibilities between the
header files on Solaris and Linux, so some Linux programs cannot be
compiled on Solaris.  This affects examples/route.c.

Another limitation of the parser I used: It can't handle some constructs.
For instance, __asm__ calls are problematic.  I suggest simply removing
them from the source code.  One way to do this is to use "-D'__asm__(x)='"
as an extra argument to cpp.
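
To see what that -D option does, here is a minimal sketch that puts the same
definition directly in the source file instead of on the cpp command line.
The function add_one() is made up for illustration.

```c
#include <assert.h>

/* Same effect as passing -D'__asm__(x)=' to cpp: every call-style
   __asm__(...) with a single argument expands to nothing. */
#define __asm__(x)

/* Hypothetical function, for illustration only. */
static int add_one(int n)
{
    assert(n >= 0);
    __asm__("nop");   /* after preprocessing, only the ';' is left */
    return n + 1;
}
```

Note that this macro only matches the one-argument call form.  A use such as
'__asm__ volatile (...)', or an __asm__ whose operand list contains commas,
is not removed by it and would need to be edited out by hand.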


The tool doesn't keep track of all the possible ways that a function such
as resolve() could be called.  Instead, it just "merges" all of those
calls together.  Yup, this introduces imprecision; it's not a feature,
it is certainly a limitation (and a known one), but it is there because
you need to make approximations like these to get a static analysis
which is fast enough.

Here's an example that might illustrate what I mean by "merging":
   #include <stdio.h>

   void f(char *p) {
     printf("%s\n", p);
   }
   int main(void) {
     f("foo");
     f("abracadabra");
     return 0;
   }
What will the analysis report?  Well, it knows that "p" refers to some
string.  It will say that the length of this string will be somewhere
in the range 4..12.  (The length of "foo" is 4 bytes, counting the
'\0' terminator, and the length of "abracadabra" is 12 bytes, if I'm
not mistaken.)  This answer is correct but conservative: it is not as
descriptive as it could be, because all of the calls to f have been
merged together and we've lost some information about the length of "p".
For instance, you can never discover that "length(p)" is always an even
integer with this 'range analysis' technique.
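
The merging step can be sketched as taking the smallest range that covers
every value seen.  This is only an illustration of the idea, not BOON's
actual code; the type and function names are made up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type: an inclusive range of possible byte counts. */
struct range { size_t lo, hi; };

/* A string constant occupies exactly strlen + 1 bytes ('\0' included). */
static struct range range_of_literal(const char *s)
{
    size_t n = strlen(s) + 1;
    struct range r = { n, n };
    return r;
}

/* Merge two ranges into the smallest single range covering both. */
static struct range range_join(struct range a, struct range b)
{
    assert(a.lo <= a.hi && b.lo <= b.hi);
    struct range r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}
```

Joining range_of_literal("foo") with range_of_literal("abracadabra") gives
4..12.  The result also admits lengths such as 5 or 7 that never occur,
which is exactly the information lost by merging.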

Here's a slight twist on that example, which might illustrate why it is
problematic for buffer overrun analysis.
   #include <string.h>

   char *f(char *p) {
     return p;
   }
   int main(void) {
     char *q, *r;
     char smallbuf[4], largebuf[64];
     q = f("foo");
     strcpy(smallbuf, q);
     r = f("abracadabra");
     strcpy(largebuf, r);
     return 0;
   }
What will BOON report?  Well, it will say that length(p) = 4..12,
and consequently length(f's return value) = 4..12.  Then it will look at
the assignment "q = f(...)", and conclude that length(q) could be anything
that length(f's return value) could be, so the tool will say length(q) = 4..12.
(Same goes for length(r).)  But now look at the "strcpy(smallbuf, q)"
operation.  The analysis knows that this copies the string referred to by
q into the buffer referred to by smallbuf, and so after the strcpy,
length(smallbuf) could be anything that length(q) could be.  So the tool
will conclude that length(smallbuf) = 4..12.  But the tool also knows that
alloc(smallbuf) = 4..4, i.e., exactly 4 bytes have been allocated.  This
means that, at least as far as the tool can see, we have a potential buffer
overrun: smallbuf might contain a string that could be as large as 12
bytes long (the tool believes), but it won't have more than 4 bytes allocated
for it, so this would be an overrun.  So the tool will give you an output
complaining about a possible buffer overrun here.

Yet we know -- looking at this by hand -- that there is no buffer overrun
in the second example above.  Only the string "foo" ever gets copied into
smallbuf, so there's no danger.  What happened?  What happened is that the
tool merged together information about the length of the arguments to f(),
and this introduced imprecision.  So, as a consequence, we get lots of
false alarms where the analysis was more imprecise than we would like it
to be.  Sigh.  That's why we still need a human in the loop to distinguish
false alarms from real bugs!
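
The check described above can be sketched as a comparison of two ranges.
This assumes the rule is "warn if the largest possible length exceeds the
smallest possible allocation", which is how the text reads; the names are
made up and this is not BOON's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: an inclusive range of possible byte counts. */
struct range { size_t lo, hi; };

/* Returns nonzero if, judging by the ranges alone, the string might
   not fit in the buffer. */
static int may_overrun(struct range len, struct range alloc)
{
    assert(len.lo <= len.hi && alloc.lo <= alloc.hi);
    return len.hi > alloc.lo;
}
```

With the merged length 4..12, may_overrun() fires for smallbuf (alloc 4..4)
but not for largebuf (alloc 64..64).  Had the two calls to f() been kept
apart, length(q) would be 4..4 and smallbuf would pass as well.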


We can see an instance of this "merging" imprecision in examples/fingerd.c.
In one place, fatal() is called with a 5-byte string -- 'fatal("pipe")' --
and this is treated just as though the program contained the assignment
'msg = "pipe";'.  Since "pipe" refers to a constant byte array that is 5
bytes long (counting the '\0' terminator) and holds a 5-byte string, we
know that length("pipe") = alloc("pipe") = 5.  Due to the (fake) assignment
'msg = "pipe";', the tool concludes that 5 is a possible value for
length(msg), and similarly 5 is a possible value for alloc(msg).

Then the tool encounters a second call to fatal(), this time with a
7-byte string: 'fatal("fdopen")'.  We get the fake assignment 'msg = "fdopen"',
and so the tool concludes that, whatever the possible range of length(msg)
is, it must include the value 7.  Similarly, 7 is in alloc(msg).

Combining these two observations, the tool concludes that length(msg) must
contain at least the range 5..7 (a conservative, but hopefully safe,
assumption).  Since there are no other assignments to 'msg', this is the
best choice we can have, and so the tool says length(msg) = 5..7.
Similarly, the tool concludes alloc(msg) = 5..7.

Now after all that reasoning, the tool checks to see whether there might
be a buffer overrun in the buffer referred to by the variable 'msg'.
At this point, the tool cannot rule out the possibility of a buffer overrun,
since from the information length(msg) = 5..7, alloc(msg) = 5..7, it appears
possible that 'msg' could refer to a 5-byte buffer containing a 7-byte string
(which would indeed be a buffer overrun if it happened).  (Of course,
by looking at the program you and I can tell that this bad scenario can't
happen in real life, but the tool doesn't know that.)  So the tool must
flag this as a possible buffer overrun.

That would be the end of the story, except that on very large programs we'd
get a lot of false alarms from exactly this type of program.  So, I added
an extra heuristic to try to help you filter out this class of false alarm.
It's not possible to identify them with any certainty, but we can make
an educated guess.  In particular, if we see the situation where
length(msg) = X..Y and alloc(msg) = X..Y, then -- although we can't be
sure -- we can guess that this might be a false alarm analogous to the one
explained above, and so I label this case as 'Slight chance of a buffer
overrun...' rather than as 'Possible buffer overrun'.  This heuristic is
of course not reliable -- it might take a real buffer overrun and label
it 'slight chance...' -- but personally I think it might help reduce the
effort required to check all the output of the tool.
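
A rough sketch of how such a labelling rule could look, assuming the
heuristic is exactly "the length range and the allocation range are
identical".  The enum and function names are made up; BOON's real
implementation may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: an inclusive range of possible byte counts. */
struct range { size_t lo, hi; };

enum verdict { NO_WARNING, SLIGHT_CHANCE, POSSIBLE_OVERRUN };

static enum verdict classify(struct range len, struct range alloc)
{
    assert(len.lo <= len.hi && alloc.lo <= alloc.hi);
    /* Even the longest string fits in the smallest buffer. */
    if (len.hi <= alloc.lo)
        return NO_WARNING;
    /* length = X..Y and alloc = X..Y: probably an artifact of merging. */
    if (len.lo == alloc.lo && len.hi == alloc.hi)
        return SLIGHT_CHANCE;
    return POSSIBLE_OVERRUN;
}
```

For the fingerd.c case, length(msg) = 5..7 and alloc(msg) = 5..7 yields
SLIGHT_CHANCE, while the earlier smallbuf case (4..12 against 4..4) still
yields POSSIBLE_OVERRUN.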


In general, BOON seems to be more effective on programs with statically
sized buffers, and less effective (with more false alarms) on programs
that dynamically allocate buffers using malloc() with buffer sizes determined
at runtime.  This is most unfortunate, as the latter programming style
often can be more secure against buffer overruns, but that's how life is.


As you'll probably rediscover for yourself, complaints about a possible
buffer overrun in an argument to a function like strcmp() tend to indicate
one of two cases: (1) a false alarm; or (2) a buffer that was overflowed
elsewhere, which the tool keeps complaining about multiple times.

It's easy to see how (1) happens; it's a result of the "merging" that
I mentioned earlier.

An example of (2) might look like this:
   #include <stdio.h>
   #include <string.h>

   void f(char *p) {
     printf("%s\n", p);
   }
   int main(void) {
     char buf[8];
     strcpy(buf, "Hello, world!"); /* This is a buffer overrun */
     f(buf);
     return 0;
   }
If you try the tool on this example, you'll see that the tool correctly
complains that there's a buffer overrun in buf[] (due to the strcpy());
but then it also complains that there's a buffer overrun in p[].  Why?
Well, p also refers to a buffer that has only 8 bytes allocated for it
but will contain more than 8 bytes.  The tool doesn't know that these
two warnings are just two instances of the same buffer overrun, so it
warns about both instances.  You'll probably run into this a fair amount.


Here is an example of a vulnerability detected in real code, and how to
interpret the tool's output.  The tool says:
  Possibly a buffer overflow in `name@@resolve()':
    128..128 bytes allocated, 1..+Infinity bytes used.
    <- siz(name@@resolve()) <- siz(target@@rt_del())
    <- len(name@@resolve()) <- len((unnamed field h_name))
This shows you that the tool knows exactly 128 bytes have been allocated
for the buffer "name" in resolve(), and it claims that the length of the
string that is stored in that buffer could be arbitrarily long (up to
"+Infinity" bytes, it says), so there's a potential buffer overrun here.
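
If you want to post-process reports like this one, the two ranges can be
pulled out with a few lines of C.  This is a sketch that assumes other
reports use the same "LO..HI" shape as the one quoted above; the names are
made up, and LONG_MAX is just a stand-in chosen here for "+Infinity".

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: hi == LONG_MAX stands for "+Infinity". */
struct range { long lo, hi; };

/* Parse text beginning with "LO..HI", where HI may be "+Infinity".
   Returns 1 on success, 0 if the text does not have that shape. */
static int parse_range(const char *s, struct range *out)
{
    char *end;
    long lo, hi;
    const char *hi_text;

    assert(s != NULL && out != NULL);
    lo = strtol(s, &end, 10);
    if (end == s || strncmp(end, "..", 2) != 0)
        return 0;
    hi_text = end + 2;
    if (strncmp(hi_text, "+Infinity", 9) == 0) {
        hi = LONG_MAX;
    } else {
        hi = strtol(hi_text, &end, 10);
        if (end == hi_text)
            return 0;
    }
    out->lo = lo;
    out->hi = hi;
    return 1;
}
```

Parsing "128..128 bytes allocated" and "1..+Infinity bytes used" gives an
allocation of 128..128 and a usage whose upper end is unbounded; since the
largest possible usage exceeds the smallest allocation, the tool warns.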

In this case, it's even able to help you figure out where these numbers
came from.  Why is the length up to "+Infinity" bytes?  Well, the source
of this is "(unnamed field h_name)", which in this case corresponds to
"hp->h_name", part of the result returned by gethostbyname().  Also, the
number of bytes allocated came from a place where resolve() was called in
the rt_del() function with "target" as its first parameter.  This type of
information about why the tool thinks there might be a buffer overrun is
extremely useful in tracking down the source of the bug and figuring out
whether or not this is likely to be a false alarm.

In general, you often won't be lucky enough to get this much information
about why the tool thinks there's an overrun, but when you do get the
information, I've found that it is often useful.

Also, please be warned that BOON is by no means guaranteed to find all
bugs, or even all buffer overruns, in your source code.  Just because
BOON gives no warnings doesn't mean the software is secure!
@


1.1.1.1
log
@CVS TEST
@
text
@@
