Sunday, July 15, 2007

Scalable Test Selection

A few weeks ago I gave a presentation at the Google Scalability Conference on the idea of Scalable Test Selection (video can be found here). Now that the video is up, I thought I'd share with you all this idea developed within my team. I must preface this by saying that this was not my idea -- the credit belongs to Amit and Fiona.

As Cem Kaner has said before me, there is never enough time to do all the testing. Time, in essence, is your scarce resource that must be allocated intelligently. There are many test strategies one can choose from when testing a product, however the question must be raised: how do you know whether you're testing the right things, given the amount of time you have to test? This is an impossible question to answer, however there are situations where the question is pertinent. For instance: when a change is introduced to your product at the last minute before release, and you have to decide what to test, how do you choose? There are probably some intelligent guesses you can make on what to test based on what has changed, but how can you know that this change hasn't broken a distant dependency?

This is the situation where we believe Scalable Test Selection can help. Given a source code change, what test cases are associated with that source code? Essentially, how can we link test cases to source code using test artifacts?

We have identified (and presented on) three ways to associate test cases with source code:
  • Requirements: If source code is checked in to satisfy requirements, and test cases are checked in to satisfy requirements, then a connection can be made.
  • Defects: A test case fails, which then is associated with a new defect. When source code is checked in to fix that defect, an association is made between the code and the test case
  • Build Correlation: For a single build, you can associate a set of source code changes with a set of test case failures. Now iterate that over successive builds, and you have a large set of source code and test case associations
With all this data that you can use to associate source code to test cases, when future source code changes are checked in, a tool can be written that can find all the test cases that are associated with that source code. In the case that you're in a time-crunched situation, you can have another source that suggests what test cases you should run in your limited amount of time.

What we're doing is applying simple data-mining techniques to the test data. There is much more to the idea that I'm not talking about (prioritization of test cases, implementation ideas, etc), however I hope you get the jist. I fully recommend you watch the video if this topic interests you, and feel free to email me if you want the slides :).

No comments: