Duplicate code detection lets you find code that was produced by copy-and-paste programming. Duplicate code typically leads to higher maintenance cost because bugs need to be fixed in every copy, more code needs to be tested, etc.
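There are many trade-offs when writing a duplicate code detection tool. Some of the conflicting goals are speed, low memory usage, avoiding false alarms, support for multiple languages, and support for fuzzy matches.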
The check provided here, StrictDuplicateCode, is fast enough to check very large code bases in acceptable time (minutes). It consumes very little memory, and false alarms are impossible. While it supports multiple languages, it does not support fuzzy matches (that is why it is called Strict).
Note that there are brilliant commercial implementations of duplicate code detection tools. One that is particularly noteworthy is Simian from RedHill Consulting, Inc. Simian has managed to find a very good balance of the above trade-offs. It is superior to the checks in this package in many respects. Simian is reasonably priced (free for noncommercial projects) and includes a Checkstyle plugin.
The following table summarizes the characteristics of the available Checkstyle plugins for duplicate code detection:
Name | Speed | Memory Usage | False Alarms | Supported languages | Fuzzy matches |
---|---|---|---|---|---|
StrictDuplicateCode | High | Very Low | Impossible | any language | No |
Simian | Very high | Low | Possible but very unlikely | many languages, including Java and C/C++/C# | Limited support |
We encourage all users of Checkstyle to evaluate Simian as an alternative to the Checks we offer in our distribution.
Performs a line-by-line comparison of all code lines and reports duplicate code if a sequence of lines differs only in indentation. All import statements in Java code are ignored; every other line, including Javadoc and whitespace lines between methods, is considered (which is why the check is called strict).
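The comparison described above can be illustrated with a small Java sketch. It is a naive pairwise comparison written for readability under stated assumptions, not the code of the check itself, which may well use a different and faster technique. The class and method names (`StrictDuplicateSketch`, `normalize`, `isIgnored`, `findDuplicates`) are hypothetical, and comparing only two files held in memory is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Checkstyle.
final class StrictDuplicateSketch {

    // Drop leading whitespace so that lines differing only in indentation compare equal.
    static String normalize(String line) {
        return line.replaceFirst("^\\s+", "");
    }

    // Assumption: an ignored Java import line never counts as part of a duplicate.
    static boolean isIgnored(String normalized) {
        return normalized.startsWith("import ");
    }

    static boolean sameLine(String[] a, int i, String[] b, int j) {
        String x = normalize(a[i]);
        return !isIgnored(x) && x.equals(normalize(b[j]));
    }

    // Returns {startA, startB, length} (0-based) for each maximal run
    // of at least 'min' equal lines between the two files.
    static List<int[]> findDuplicates(String[] a, String[] b, int min) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b.length; j++) {
                // Skip positions that lie inside a run that started earlier.
                if (i > 0 && j > 0 && sameLine(a, i - 1, b, j - 1)) {
                    continue;
                }
                int len = 0;
                while (i + len < a.length && j + len < b.length
                        && sameLine(a, i + len, b, j + len)) {
                    len++;
                }
                if (len >= min) {
                    result.add(new int[] {i, j, len});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] a = {
            "import java.util.List;",
            "int sum(int x, int y) {",
            "    int s = x + y;",
            "    return s;",
            "}"
        };
        String[] b = {
            "import java.util.List;",
            "class B {",
            "        int sum(int x, int y) {",
            "            int s = x + y;",
            "            return s;",
            "        }",
            "}"
        };
        for (int[] d : findDuplicates(a, b, 3)) {
            System.out.println(d[2] + " duplicate lines at a[" + d[0] + "] and b[" + d[1] + "]");
        }
    }
}
```

With a threshold of 3, the sketch reports one block of four lines: the `sum` method, which appears in both inputs at different indentation levels. The shared import line is not counted, and the lone closing brace at the end of the second input is too short a run to be reported.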
name | description | type | default value |
---|---|---|---|
min | how many lines must be equal to be considered a duplicate | int | 12 |
charset | name of the file charset | String | System property "file.encoding" |
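As a rough illustration of the `charset` default in the table above, the following Java sketch resolves a configured charset name and falls back to the `file.encoding` system property when none is given. The class `CharsetPropertySketch` and its methods are hypothetical, and the final fallback to `Charset.defaultCharset()` is an added assumption for the case where the property is unset; the check's own code may differ.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical illustration only; not part of Checkstyle.
final class CharsetPropertySketch {

    // Use the configured name if present, else the "file.encoding" system property.
    static Charset resolveCharset(String configured) {
        String name = configured != null ? configured : System.getProperty("file.encoding");
        // Assumption: fall back to the JVM default if the property is not set.
        return name != null ? Charset.forName(name) : Charset.defaultCharset();
    }

    // Read all lines of a file using the resolved charset.
    static List<String> readLines(Path file, String configuredCharset) throws IOException {
        return Files.readAllLines(file, resolveCharset(configuredCharset));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-sketch", ".java");
        try {
            Files.write(tmp, "// Größe\nint x;\n".getBytes(StandardCharsets.UTF_8));
            // Reading with the matching charset keeps the non-ASCII comment intact.
            for (String line : readLines(tmp, "UTF-8")) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Reading with the wrong charset can change how non-ASCII lines decode, which in turn can affect whether two lines compare as equal, so it is worth setting the property explicitly when the sources do not use the platform encoding.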
To configure the check:
    <module name="StrictDuplicateCode"/>
To configure the check so that it allows larger equivalent blocks:
    <module name="StrictDuplicateCode">
        <property name="min" value="15"/>
    </module>
To configure the check so that it handles files with the UTF-8 charset:
    <module name="StrictDuplicateCode">
        <property name="charset" value="UTF-8"/>
    </module>
com.puppycrawl.tools.checkstyle.checks.duplicates