Wednesday, July 8, 2009

Data Flow and Static Analysis: Why, When, How

In this interview, Adam Kolawa—Parasoft CEO and co-founder—discusses why, when, and how to apply three different types of static source code analysis: static code analysis, data flow static analysis, and code metrics analysis. Read on to learn how static analysis can help your team ensure that code meets uniform expectations around security, reliability, performance, and maintainability—and how to get started as painlessly as possible.

What do you mean by “static analysis”?


I mean statically analyzing code to monitor whether it meets uniform expectations around security, reliability, performance, and maintainability. Done properly, static analysis provides a foundation for producing solid code by exposing structural errors and preventing entire classes of errors. At Parasoft, we’ve found that the most effective static analysis encompasses static code analysis, data flow static analysis, and code metrics analysis.

Let’s take a closer look at those three breeds of static analysis. First off, static code analysis. What is it and why is it valuable?


By static code analysis, I mean scanning the source code and checking whether it has patterns known to cause defects or impede reuse and agility. This involves monitoring compliance with coding standard rules: rules for preventing improper language usage, satisfying industry standards (MISRA, JSF, Ellemtel, etc.), and enforcing internal coding guidelines.
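To make the idea of a pattern-based rule concrete, here is a minimal Java sketch. It is an illustration only: the class name `EmptyCatchRule` and the rule itself (flagging empty catch blocks) are assumptions chosen for the example, not rules taken from MISRA, JSF, or any Parasoft product, and real tools inspect a parsed syntax tree rather than matching text line by line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, deliberately naive rule checker used only for illustration.
final class EmptyCatchRule {
    // Matches a catch clause whose body is empty on the same line,
    // for example: catch (IOException e) {}
    private static final Pattern EMPTY_CATCH =
        Pattern.compile("catch\\s*\\([^)]*\\)\\s*\\{\\s*\\}");

    // Returns the 1-based numbers of the lines that violate the rule.
    static List<Integer> findViolations(List<String> sourceLines) {
        List<Integer> violations = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            if (EMPTY_CATCH.matcher(sourceLines.get(i)).find()) {
                violations.add(i + 1);
            }
        }
        return violations;
    }
}
```

The point of the sketch is that the check never runs the program under analysis: it only looks at the shape of the source, which is why this kind of rule can be applied to code as soon as it is written.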

If you nip these issues in the bud by finding and fixing dangerous code as it is introduced, you significantly reduce the amount of testing and debugging required later on—when the difficulty and cost of dealing with each defect increases by over an order of magnitude.

Many categories of defects can be prevented in this manner, including defects related to memory leaks, resource leaks, and security vulnerabilities. In fact, simply using static code analysis to enforce proper input validation can prevent approximately 70% of the security problems cited by OWASP, the industry-leading security community.
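As a rough illustration of what "proper input validation" can look like in code, the following Java sketch checks a value against an allow-list pattern before it is used. The class name `UsernameValidator` and the policy (3 to 20 letters, digits, or underscores) are assumptions made up for this example; a coding standard rule would typically require that every external input pass through a check of this kind.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the allowed character set and length are assumed policy.
final class UsernameValidator {
    // Allow-list: 3 to 20 characters, each a letter, digit, or underscore.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9_]{3,20}");

    // Returns true only when the whole input matches the allow-list.
    static boolean isValid(String input) {
        return input != null && ALLOWED.matcher(input).matches();
    }
}
```

An allow-list is used here rather than a list of forbidden characters because it rejects anything unexpected by default, including inputs such as `x' OR '1'='1`.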

What’s data flow static analysis and why is it valuable?


Data flow static analysis statically simulates application execution paths, which may cross multiple units, components, and files. It’s like testing without actually executing the code. It can automatically detect potential runtime errors such as resource leaks, NullPointerExceptions, SQL injections, and other security vulnerabilities. This enables early and effortless detection of critical runtime errors that might otherwise take weeks to find.
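The kind of defect described above can be sketched in a few lines of Java. The class `CustomerDirectory` and its methods are hypothetical names invented for this example. The null value originates in one method and is dereferenced in another, so no single line looks wrong in isolation; the problem only appears when the path between the two methods is followed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class used only to illustrate a null value flowing between methods.
final class CustomerDirectory {
    private final Map<String, String> emailsById = new HashMap<>();

    void register(String id, String email) {
        emailsById.put(id, email);
    }

    // Source of the null: Map.get returns null when the id is unknown.
    String findEmail(String id) {
        return emailsById.get(id);
    }

    // Defect: the result is dereferenced without a check, so the path
    // "unknown id -> findEmail returns null -> substring" throws NullPointerException.
    String emailDomainUnsafe(String id) {
        String email = findEmail(id);
        return email.substring(email.indexOf('@') + 1);
    }

    // One possible fix: handle the null path explicitly.
    String emailDomainSafe(String id) {
        String email = findEmail(id);
        if (email == null) {
            return "";
        }
        return email.substring(email.indexOf('@') + 1);
    }
}
```

A data flow analyzer would aim to report the path through `emailDomainUnsafe` without running the program, whereas a test would only expose it if someone happened to pass an unregistered id.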

While static code analysis is an error-prevention practice, data flow static analysis is an error-detection practice. Like all error-detection practices, it’s not 100% accurate, and you can’t expect it to uncover every bug lurking in your application.

The main difference between static code analysis and data flow static analysis is that with pattern-based static code analysis, you can absolutely guarantee that certain classes of defects will not occur as long as you find and fix the coding constructs known to cause these defects. With data flow static analysis, you are identifying defects that could actually occur when real application paths are exercised—not just dangerous coding constructs. But you have to realize that you will inevitably overlook some bugs, and might have a higher ratio of false positives than you encounter with static code analysis.

If data flow static analysis can’t find all the bugs, how do you automatically detect the remaining bugs?


***

To read more, download Parasoft's complete "Static Analysis Best Practices" paper as a PDF.

