A vs. I Graphs

This is a quick analysis of A vs. I graphs for some of the software used in Dependency Finder. I took these graphs from Design Principles and Design Patterns by Robert C. Martin of Object Mentor, Inc. The idea is to plot the degree of abstractness of a package against how hard it is to change that package.

The X-axis charts instability, a measure of a package's coupling. A low value means that many other packages use the given package. Changing it can be hard because all of those packages may have to change too. A high value means that few packages, if any, refer to it. Changing it is easy because the changes have nowhere to trickle to.

The Y-axis charts abstractness, the ratio of interfaces and abstract classes to the total number of classes in a package. A low value means that there is very little abstraction: the classes are mostly concrete, and the slightest modification in behavior requires modifying them. A high value means that the package is mostly interfaces, which define protocol rather than behavior and are much less subject to change.

Typically, you want to do two things. First, you want to group interfaces in abstract packages that control access to a component; they are the ones that everybody refers to, and therefore A=1 and I=0. Then, you want each component to have implementation packages hidden behind the interface package, which nobody should refer to, yielding A=0 and I=1.

In real life, packages are somewhere in between, hopefully on a straight line between the two extremes. Robert C. Martin calls this line the main sequence, in reference to astronomy, and D' is the normalized distance between a given package and the main sequence. It is easily computed as D' = A + I - 1, which ranges from -1 to 1 and is 0 for a package on the main sequence. A value near -1 means that the package is concrete (and subject to change) but hard to change because of the number of external references to it. A value near 1 means that the package is highly abstract but not much used.
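As a quick check of the formula, here is a minimal Python sketch that evaluates D' = A + I - 1 at the four corners of the graph. The function name and the example packages are hypothetical and chosen only for illustration; they are not measurements from any of the codebases discussed here.

```python
def normalized_distance(abstractness: float, instability: float) -> float:
    """Return D' = A + I - 1 for one package."""
    if not (0.0 <= abstractness <= 1.0 and 0.0 <= instability <= 1.0):
        raise ValueError("A and I must both lie in [0, 1]")
    return abstractness + instability - 1.0


# Hypothetical corner cases (A, I), not data from a real codebase.
examples = {
    "ideal interface package": (1.0, 0.0),
    "ideal implementation package": (0.0, 1.0),
    "concrete and heavily used": (0.0, 0.0),
    "abstract and unused": (1.0, 1.0),
}

for name, (a, i) in examples.items():
    print(f"{name}: D' = {normalized_distance(a, i):+.1f}")
```

The two ideal packages sit on the main sequence, while the other two corners give the extreme values of D'.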

I collected the values for D' using OOMetrics with the Martin configuration. I then used a special XSL stylesheet to extract the values from the XML output and fed them to MS Excel. I used a simple Perl script to compute the value distributions. It rounded each value of D' to the nearest 0.1 increment and then counted how many points fell on each of the 21 target values, from -1.0 to 1.0. I normalized the counts by dividing each one by the total number of packages in a given piece of software, so that the distributions can be compared regardless of the size of the software under inspection.

For example, these commands produced the CSV and XML output for each codebase:

OOMetrics -configuration etc\MartinConfig.xml -csv -groups -out depfind lib\DependencyFinder.jar
OOMetrics -configuration etc\MartinConfig.xml -xml -groups -out depfind lib\DependencyFinder.jar

OOMetrics -configuration etc\MartinConfig.xml -csv -groups -out oro     lib\jakarta-oro.jar
OOMetrics -configuration etc\MartinConfig.xml -xml -groups -out oro     lib\jakarta-oro.jar

OOMetrics -configuration etc\MartinConfig.xml -csv -groups -out log4j   lib\log4j.jar
OOMetrics -configuration etc\MartinConfig.xml -xml -groups -out log4j   lib\log4j.jar

OOMetrics -configuration etc\MartinConfig.xml -csv -groups -out xerces  lib\xmlParserAPIs.jar lib\xercesImpl.jar
OOMetrics -configuration etc\MartinConfig.xml -xml -groups -out xerces  lib\xmlParserAPIs.jar lib\xercesImpl.jar

OOMetrics -configuration etc\MartinConfig.xml -csv -groups -out xalan   lib\xml-apis.jar      lib\xalan.jar
OOMetrics -configuration etc\MartinConfig.xml -xml -groups -out xalan   lib\xml-apis.jar      lib\xalan.jar
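The distribution step described earlier (round each D' to the nearest 0.1, count the points on each of the 21 targets, divide by the number of packages) could be sketched in Python as follows. This is not the original Perl script, which is not shown here; the function name is hypothetical, and the handling of exact ties (Python's `round` sends them to the nearest even tenth) is an assumption.

```python
from collections import Counter


def distribution(values):
    """Map each of the 21 targets (-1.0 .. 1.0) to the share of values rounded to it."""
    if not values:
        raise ValueError("need at least one value of D'")
    if any(v < -1.0 or v > 1.0 for v in values):
        raise ValueError("D' must lie in [-1, 1]")
    # Count in integer tenths (-10 .. 10) so that float keys never drift.
    # Note: round() resolves exact ties to the nearest even integer.
    counts = Counter(round(v * 10) for v in values)
    total = len(values)
    # Normalize so that codebases of different sizes can be compared.
    return {tenth / 10: counts[tenth] / total for tenth in range(-10, 11)}


# Hypothetical D' values, for illustration only.
for target, share in distribution([0.0, 0.04, -0.5, 1.0]).items():
    if share:
        print(f"{target:+.1f}: {share:.2f}")
```

Because every value lands on exactly one target, the 21 shares always sum to 1.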

The graphs below on the left show the A vs. I plots. The center graphs show the values of D' in sorted order. The graphs on the right show the distribution of values of D' across the range -1 to 1.

The goal here is to keep the packages close to the main sequence, with values of D' as close to zero as possible. In the center graphs, this means keeping the extreme values close to the central axis. On the distribution graph, it translates into a large spike at 0 and short wings to each side. Jakarta-ORO and Log4J are good examples of this.

[Table of graphs, one row per codebase (Dependency Finder, Jakarta-ORO, Log4J, Xerces, Xalan), with three columns: A vs. I graph, values of D', and distribution of D'.]

Finally, here is a summary view of all five codebases. The thin line shows the full range of D' for that codebase. The large rectangle is centered on the statistical mean and spreads each way by one standard deviation. If the values were normally distributed, about 68% of them would fall within the rectangle. The column below each range shows how many packages are in each codebase.
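The numbers behind such a summary view could be computed along these lines. This is a minimal sketch under stated assumptions: the function name and sample values are hypothetical, and it uses the population standard deviation (`statistics.pstdev`), since the text does not say whether the population or sample formula was used.

```python
from statistics import mean, pstdev


def summarize(values):
    """Return the range, mean, and one-standard-deviation band for values of D'."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    low, high = m - s, m + s
    inside = sum(1 for v in values if low <= v <= high)
    return {
        "min": min(values),    # ends of the thin line
        "max": max(values),
        "mean": m,             # center of the rectangle
        "low": low,            # edges of the rectangle
        "high": high,
        "share_within": inside / len(values),
        "packages": len(values),
    }


# Hypothetical D' values, for illustration only.
print(summarize([-0.5, 0.0, 0.5]))
```

The `share_within` field reports how many values actually fall inside the rectangle, which can be compared against the share expected for a normal distribution.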

Final lesson: I guess Dependency Finder is not faring too badly. But for something so small, it could still have done better. :-)