I came across three interesting articles this past week:
1) The DailyWTF breaks the story about the State of Oklahoma's Department of Corrections sex offenders list being an open proxy for their Oracle database. SELECT * FROM FAIL. As Les says in a comment on Bruce Schneier's site, this is a screwup that can't be blamed on a single person. Sure, one guy wrote bad code and then applied a half-assed fix, but there's no way the original passed through a QA process or was reviewed by anyone (if your QA team isn't looking out for SQL injection/XSS opportunities, they should be).
What I think is more revealing is the organization's response to the initial report. Inferring from the story, word of this data leak would have had to come down the chain of command from the top, and I would hope they were taking it seriously. All of these people -- and their bosses -- were aware of the problem, and the patch they pushed out is still so lame? Did they ever discuss the fundamental underlying problem? Or do you suppose somebody just checked that the original URL no longer worked?
2) Matasano Chargen has an accessible account of an epic hack against a bug in Adobe's Flash plugin. If I follow it right, Mark Dowd figured out how to use an unchecked malloc failure to get the byte-code verifier to ignore machine code embedded in a Flash file at one point, yet have it run nonetheless. In his writeup, Thomas Ptacek weaves in Terminator references; the actual exploit (subverting the verifier to ignore semantically significant content) puts me more in mind of A Fire Upon the Deep.
3) The Six Dumbest Ideas in Computer Security: The ideas here are simple, but well worth reading. A recurring theme is how easy it is to approach a problem backwards, and how that sets you up to be on the losing side (no matter how hard you work). Fun game: which of the six ideas would have prevented the above failures?
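For the "unchecked malloc failure" in item 2, here is a minimal sketch of the kind of check that was missing. This is not code from the Flash plugin; write_slot and its parameters are hypothetical names made up for illustration. The point is only that if the allocation result is never tested, a later indexed write goes to NULL plus an offset instead of into a real buffer.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not code from the Flash plugin: allocate `count`
 * ints, store `value` at `index`, and report it back through `out`.
 * Returns 0 on success and -1 on any failure.
 */
int write_slot(size_t count, size_t index, int value, int *out)
{
    int *slots;

    /* Reject sizes whose byte count would overflow size_t. */
    if (count == 0 || count > SIZE_MAX / sizeof *slots)
        return -1;

    slots = malloc(count * sizeof *slots);
    if (slots == NULL)
        return -1;  /* without this check, slots[index] is NULL plus an offset */

    /* Reject indexes that fall outside the buffer we actually got. */
    if (index >= count) {
        free(slots);
        return -1;
    }

    slots[index] = value;
    *out = slots[index];
    free(slots);
    return 0;
}
```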
Monday, April 21, 2008
Friday, April 18, 2008
When Interface Dictates Implementation
This whole topic is probably kind of obvious, but I recently ran into a great example of how a seemingly minor difference in an interface can have a large impact on possible implementations. I was evaluating two libraries for inclusion in a project at work. Both provided a C API that we would use, and the two interfaces were only slightly different. This difference would prove to be significant.
There are only a handful of ways to create and manipulate "objects" with a C library. The first library, "Toolkit S", had an interface like this:
int CreateObject(...); /* returns a handle or a reserved value to indicate failure */
int Foo(int handle, ...);
int Bar(int handle, ...);
int DestroyObject(int handle);
That is, all required allocations are done internally, and clients interact via a numeric id. This is similar to how _open works. The second library, "Toolkit L", had an interface like:
typedef struct tag_Object{ ... } Object;
int CreateObject(Object* obj); /* returns an error code, initializes struct */
int Foo(Object* obj, ...);
int Bar(Object* obj, ...);
int DestroyObject(Object* obj);
In this case, the memory for the object is allocated by the caller and that struct is passed in to every call (like the CriticalSection functions). In this particular library the Object struct contains pointers to memory that the library may allocate itself, but that isn't true of this style of interface in general.
There are pros and cons to each design; I'm not going to declare one superior to the other in all cases. However, what I realized (this is kind of obvious if you've thought about it) is that the first library must have a similar struct defined internally, along with an internal data structure that maps int handles to it. That data structure also has to have some kind of thread-safe synchronization around it, which causes an unavoidable performance penalty in a multi-threaded environment that the other library avoids completely. No matter how access is controlled, there are use cases where this design requires a performance hit (above and beyond the extra level of indirection). The benchmarks showed what I knew they would: Toolkit L made much better use of my test machine's second core.
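To make the difference concrete, here is a rough sketch of what the two styles might look like on the inside. It is a guess, not either toolkit's real code: the S_ and L_ prefixes, the counter field, the fixed-size table, and the single global mutex are all assumptions, and POSIX mutexes stand in for whatever locking the real library uses.

```c
#include <pthread.h>

#define MAX_OBJECTS 64
#define INVALID_HANDLE (-1)

/* Hypothetical object state; the real structs are not shown in the post. */
typedef struct tag_Object {
    int in_use;
    int counter;
} Object;

/* Handle style: the library owns a shared table, so every call must lock it. */
static Object g_table[MAX_OBJECTS];
static pthread_mutex_t g_table_lock = PTHREAD_MUTEX_INITIALIZER;

int S_CreateObject(void)
{
    int handle = INVALID_HANDLE;
    pthread_mutex_lock(&g_table_lock);
    for (int i = 0; i < MAX_OBJECTS; ++i) {
        if (!g_table[i].in_use) {
            g_table[i].in_use = 1;
            g_table[i].counter = 0;
            handle = i;
            break;
        }
    }
    pthread_mutex_unlock(&g_table_lock);
    return handle;
}

int S_Foo(int handle)
{
    int result = -1;
    if (handle < 0 || handle >= MAX_OBJECTS)
        return -1;
    pthread_mutex_lock(&g_table_lock);   /* contended by every thread */
    if (g_table[handle].in_use)
        result = ++g_table[handle].counter;
    pthread_mutex_unlock(&g_table_lock);
    return result;
}

int S_DestroyObject(int handle)
{
    int status = -1;
    if (handle < 0 || handle >= MAX_OBJECTS)
        return -1;
    pthread_mutex_lock(&g_table_lock);
    if (g_table[handle].in_use) {
        g_table[handle].in_use = 0;
        status = 0;
    }
    pthread_mutex_unlock(&g_table_lock);
    return status;
}

/* Struct style: the caller owns the memory, so there is no shared state to lock. */
int L_CreateObject(Object *obj)
{
    if (obj == NULL)
        return -1;
    obj->in_use = 1;
    obj->counter = 0;
    return 0;
}

int L_Foo(Object *obj)
{
    if (obj == NULL || !obj->in_use)
        return -1;
    return ++obj->counter;
}

int L_DestroyObject(Object *obj)
{
    if (obj == NULL || !obj->in_use)
        return -1;
    obj->in_use = 0;
    return 0;
}
```

In the struct style, two threads working on two different objects never touch the same memory. In the handle style they both pass through g_table_lock, even when their handles are unrelated. A finer-grained scheme could shrink that cost, but the shared table still has to be protected somehow.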
They've been talking about the "multicore revolution" for years, and we're finally getting to the point where odds are that your code is running on a multi-core machine. You can't always take advantage of it - especially as a library author - but you have to do what you can to not get in the way of those who can take advantage.
Monday, April 7, 2008
Final Four Probabilities
For the first time in history, the teams in the Final Four this year are all number 1 seeds. I've been hearing about this for a while, and began to wonder... Should my reaction be "it's about time" or "wow, that's remarkable"? So I went after the numbers to try to figure it out.
The first thing you realize is that the NCAA likes to change formats pretty regularly. The current format (dating back to 2001) starts with 65 teams; the previous 16 seasons featured a 64-team tournament with the same overall structure. These (1985 - 2007) are the tournaments whose results I used. Here's what I found:
Seed | Region Wins | Share of Region Wins |
1 | 38 | 0.41 |
2 | 21 | 0.23 |
3 | 12 | 0.13 |
4 | 9 | 0.10 |
5 | 4 | 0.04 |
6 | 3 | 0.03 |
8 | 3 | 0.03 |
11 | 2 | 0.02 |
So, if we take history as a guide, we would put the probability of all four #1 seeds winning their regional brackets at just under 3% [.41 * .41 * .41 * .41], so it should happen about once every 34 years, on average. This is the 24th tournament with this overall structure, so it happened a little sooner than we would have expected.
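The arithmetic behind that estimate can be sketched in a few lines of C. It assumes the four regions are independent and that a #1 seed wins each one with the historical rate from the table (38 of the 92 region titles, 92 being 23 tournaments times 4 regions). The function names are just illustrative.

```c
/* Region wins by #1 seeds, 1985-2007, from the table above. */
#define TOP_SEED_REGION_WINS 38
#define TOTAL_REGION_WINS    92   /* 23 tournaments x 4 regions */

/* Probability that all four regions are won by their #1 seed,
 * treating the regions as independent. */
double all_four_top_seeds(void)
{
    double p = (double)TOP_SEED_REGION_WINS / TOTAL_REGION_WINS;
    return p * p * p * p;
}

/* Average number of tournaments between occurrences of an event. */
double expected_years(double probability)
{
    return 1.0 / probability;
}
```

With the unrounded 38/92 this comes to about 0.029, or roughly once every 34 years. Rounding to .41 before multiplying gives about 0.028 instead, which is why the bracketed product looks slightly smaller.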
An interesting question is, "what is the most likely distribution of seeds in a Final Four?" Obviously, a 1 seed is the favorite to win its regional group, but other outcomes are more likely than all of them winning. Three 1 seeds and a 2 seed would be twice as likely (about 6.4%, once every 15.5 years); it happened in 1993. Close behind that, we would expect two 1 seeds, a 2 seed and a 3 seed about every 16.4 years; this was the case in 1991 and 2001.
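Mixed outcomes like these need one more ingredient: the number of ways the seeds can be arranged across the four regions. A minimal sketch, again assuming independent regions with the historical win rates from the table; composition_probability and its parameters are illustrative names, not anything from the original analysis.

```c
#include <stddef.h>

#define REGIONS 4
#define TOTAL_REGION_WINS 92.0   /* 23 tournaments x 4 regions */

static double factorial(int n)
{
    double result = 1.0;
    for (int i = 2; i <= n; ++i)
        result *= i;
    return result;
}

/*
 * Multinomial probability of a Final Four with a given seed make-up.
 *   region_wins[i]  historical region wins for the i-th seed considered
 *   counts[i]       how many of the four slots that seed fills
 * Returns -1.0 if the counts are negative or do not add up to REGIONS.
 */
double composition_probability(const int region_wins[], const int counts[],
                               size_t n)
{
    int slots = 0;
    double probability = factorial(REGIONS);  /* orderings of four regions */

    for (size_t i = 0; i < n; ++i) {
        double p = region_wins[i] / TOTAL_REGION_WINS;
        if (counts[i] < 0)
            return -1.0;
        slots += counts[i];
        probability /= factorial(counts[i]);  /* identical seeds are interchangeable */
        for (int k = 0; k < counts[i]; ++k)
            probability *= p;
    }
    return slots == REGIONS ? probability : -1.0;
}
```

Passing wins of {38, 21, 12} for seeds 1 to 3 with counts {3, 1, 0} gives about 0.064 (three 1 seeds and a 2 seed), and counts {2, 1, 1} gives about 0.061 (two 1 seeds, a 2 seed and a 3 seed).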
Two years ago, no number one seeds made the Final Four (this was not quite a first - it happened in 1980 during a 48-team tournament). This is actually a relatively likely situation - at about 12% [.59 * .59 * .59 * .59], you would expect it once every 8 or so years.
To close out, let's look at outcomes that we would expect more frequently, and see how they compare with history:
Final Four Seeds | Probability | Expected Time (Years) | Occurrences | Observed Rate |
3 #1's | 0.1654 | 6.04 | 3 | 0.13 |
2 #1's | 0.3527 | 2.84 | 10 | 0.43 |
1 #1 | 0.3341 | 2.99 | 9 | 0.39 |
Note that the above outcomes are mutually exclusive (but not exhaustive - I mentioned the other cases above). In particular, I'm surprised by the frequency with which only one 1 seed makes it to the Final Four. I wonder what kind of odds I can get on that next year...
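The Probability column above is a binomial distribution over the four regions. Here is a small sketch of that calculation, assuming each region is independently won by its #1 seed with probability 38/92; the function names are made up for illustration.

```c
#define REGIONS 4

/* Historical share of region titles won by #1 seeds (38 of 92). */
static const double P_TOP_SEED = 38.0 / 92.0;

/* Number of ways to choose k regions out of n. */
static double binomial_coefficient(int n, int k)
{
    double c = 1.0;
    for (int i = 1; i <= k; ++i)
        c = c * (n - k + i) / i;
    return c;
}

/* Probability that exactly k of the four regions are won by a #1 seed. */
double exactly_k_top_seeds(int k)
{
    double probability;

    if (k < 0 || k > REGIONS)
        return 0.0;

    probability = binomial_coefficient(REGIONS, k);
    for (int i = 0; i < k; ++i)
        probability *= P_TOP_SEED;          /* regions won by a #1 seed */
    for (int i = k; i < REGIONS; ++i)
        probability *= 1.0 - P_TOP_SEED;    /* regions won by anyone else */
    return probability;
}
```

Called with k = 3, 2 and 1, this gives roughly 0.1654, 0.3527 and 0.3341, matching the table, and the five values for k = 0 through 4 add up to 1.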