Monday, September 10, 2007

Multithreaded testing

Karl Seguin wants Better multithreading support needed in .NET. Yeah, I miss Java's thread-safe collections and fancy synchronization primitives too. I sometimes feel like my working vocabulary is being stunted for lack of a proper built-in CyclicBarrier class. I'm somewhat tired of reimplementing these kinds of things.

But I'd like to focus on the last thing Karl said:

At the top of my wish list though is support for multithreading testing. Multithreaded applications might be hard to write, but they are a nightmare to test. It's easy enough to run multiple threads over your code, but I haven't yet seen (which doesn't mean it doesn't exist), something that lets you transparently test specific race and deadlock conditions. Was a waiting thread properly signaled? Was a deadlock properly resolved? So far my approach has been simply to put huge loads on the application and hope problems surface.

Developing and testing concurrently accessed shared objects is hard! That's why CSP was invented. Just the same, the difficulties can't be swept under the rug.


In these situations we really want to be looking for race hazards and deadlocks. We want to find possible interleaved execution paths that cause the contract of the shared object to be violated. It would be helpful to have a semi-automated tool to search for these cases. (Particularly if it also knew about asynchronous interruptions like ThreadAbortedException.) However, to guide the analysis we probably need a way to express the desired non-deterministic behavior as part of the shared object's contract. I know this kind of work has been done many times over using theorem provers over process calculi. I've yet to see one applied to ordinary code outside of a research environment.


But there's more to it than that. Besides correctness we are often also interested in performance. After all, since we decided to optimize throughput or bandwidth through increased parallelism we should really get some measure of success. Was the added complexity worth it? How do we know our fancy concurrently accessed shared object won't introduce an insurmountable bottleneck despite our efforts? It is hard to create a realistic load profile for synthetic benchmarks. Operations involving lots of I/O with external systems are especially nasty.

Ultimately I think the hardest part of performance tuning is the interpretation of the results. Moreover, performance optimization is a dynamic feedback process. As the system evolves previous observations become invalid. The best tuned algorithm at one time can suddenly lose its edge as the underlying platform evolves.


I generally avoid premature optimization during development. Quite often what happens is that the performance of the simplest possible design is in fact quite acceptable. It's much easier to verify the correctness of a simpler implementation!

As for optimization, instead of using a raw performance metric like CPU utilization, I think in terms of scalability metrics. "Will this process handle the current 100 transactions per second? Will it need to be scrapped to handle the anticipated 1000 transactions per second?" Reframing the problem helps me avoid wasting effort on useless micro-optimizations. It keeps me focused on choosing a good algorithm first. Once that's done I'll start looking at cheap low-level improvements if the cost/benefit ratio is good enough.

Writing it down

As if deciding what to test and how wasn't hard enough, actually implementing a multi-threaded test is a real bear. You need to set up threads, catch unhandled exceptions, introduce synchronization barriers, handle non-deterministic events, tear it down, and somehow yield diagnostic output regarding the sequence of actions that actually took place so you can figure out what went wrong. Phew! I dread it every time.

Fortunately it doesn't have to be that hard. A couple of months ago I wrote down an idea for an MbUnit extension for "agent-driven testing" to help with all of this bookkeeping. The idea is think of each concurrent process that participates in a test as an agent. A test can spawn any number of agents of various kinds that each run in their own thread. If an agent terminates abnormally, the entire test including its agents is aborted with a failure result. Therefore agents can write to the execution log and evaluate assertions concurrently. Moreover, agents can access the shared state of the fixture. They can also communicate and synchronize via signals, condition variables and barriers.

What distinguishes this idea from other great multi-threaded test helpers out there is that it will leverage more of the test framework and consequently will be much more expressive. I'll say more about this later...

P.S. I made up the term "agent-driven testing". I'm not quite satisfied with the name. Surely someone else has come up with a better one.

P.P.S. I wish we could easily do run-time instrumentation of code in .Net. It'd be exceedingly cool to build something like ConTest. I have been thinking about using Mono.Cecil to do some interesting things at runtime though.

No comments: