checkstyle – Writing Javadoc Checks

Content

Content
What is Javadoc comment
Overview
Difference between Java Grammar and Javadoc comments Grammar
Tools to see Javadoc tree structure
Token types
HTML Code In Javadoc Comments
Checkstyle SDK GUI
Access Java AST from Javadoc Check
Integrating new Javadoc Check

What is Javadoc comment

Javadoc comment is multiline comment that starts with * character and placed under class definition, interface definition, enum definition, method definition or field definition. The comment should be written in XHTML to be correctly processed by Checkstyle. This means that every HTML tag should have matching closed HTML tag or it is self-closed one (singlton tag). The only exceptions are <p>, <li>, <tr>, <td>, <th>, <body>, <colgroup>, <dd>, <dt>, <head>, <html>, <option>, <tbody>, <thead>, <tfoot> and Checkstyle won't show error about missing closing tag, however, it leads to broken XHTML structure and, therefore, incorrect Abstract Syntax Tree of the Javadoc comment anyway. See examples at "HTML Code In Javadoc Comments" chapter.

Overview

To start implementing your own Check create new class and extend AbstractJavadocCheck. It has two abstract methods:

getDefaultJavadocTokens() - return array of token types that your new Check requires to process (see "Token Types" section)
visitJavadocToken(DetailNode) - it's the place you should put tree nodes proccessing. The argument is Javadoc tree node of type you described before in getDefaultJavadocTokens() method.

Javadoc parser requires XHTML to be used in Javadoc comments, i.e. if there is some open tag(for example <div>) then there have to be its close tag </div>. This means that if Javadoc comment has incorrect XHTML structure then Javadoc Parser will fail processing the comment, therefore, your new Check can't get its parse tree and process anything from this Javadoc comment. For more details and examples go to "HTML code in Javadoc comments" section.

Difference between Java Grammar and Javadoc comments Grammar

Java grammar parses java file due to language specifications. So, there are singleline comments and multiline/block comments in it. Java compiler doesn't know about Javadoc because it is just a multiline comment. To parse multiline comment as a Javadoc comment, checkstyle has second grammar - Javadoc grammar. So, it's supposed to proccess block comments and parse them to Abstract Syntax Tree. The problem is that Java grammar is old one and uses ANTLR v2, while Javadoc grammar uses ANTLR v4. Because of that, these two grammars and their trees are not compatible. Java AST consists of DetailAST objects, while Javadoc AST consists of DetailNode objects.

Tools to see Javadoc tree structure

Checkstyle can print Abstract Syntax Tree including Javadoc trees. You need to run checkstyle jar file with -J argument, providing java file.

For example, here is java file:

/**
 * My <b>class</b>.
 * @see AbstractClass
 */
public class MyClass {

}

Command:

java -jar checkstyle-6.18-all.jar -J MyClass.java

Output:

CLASS_DEF -> CLASS_DEF [5:0]
|--MODIFIERS -> MODIFIERS [5:0]
|   |--JAVADOC -> \r\n * My <b>class</b>.\r\n * @see AbstractClass\r\n <EOF> [1:0]
|   |   |--NEWLINE -> \r\n [1:0]
|   |   |--LEADING_ASTERISK ->  * [2:0]
|   |   |--TEXT ->  My  [2:2]
|   |   |   |--WS ->   [2:2]
|   |   |   |--CHAR -> M [2:3]
|   |   |   |--CHAR -> y [2:4]
|   |   |   `--WS ->   [2:5]
|   |   |--HTML_ELEMENT -> <b>class</b> [2:6]
|   |   |   `--HTML_TAG -> <b>class</b> [2:6]
|   |   |       |--HTML_ELEMENT_OPEN -> <b> [2:6]
|   |   |       |   |--OPEN -> < [2:6]
|   |   |       |   |--HTML_TAG_NAME -> b [2:7]
|   |   |       |   `--CLOSE -> > [2:8]
|   |   |       |--TEXT -> class [2:9]
|   |   |       |   |--CHAR -> c [2:9]
|   |   |       |   |--CHAR -> l [2:10]
|   |   |       |   |--CHAR -> a [2:11]
|   |   |       |   |--CHAR -> s [2:12]
|   |   |       |   `--CHAR -> s [2:13]
|   |   |       `--HTML_ELEMENT_CLOSE -> </b> [2:14]
|   |   |           |--OPEN -> < [2:14]
|   |   |           |--SLASH -> / [2:15]
|   |   |           |--HTML_TAG_NAME -> b [2:16]
|   |   |           `--CLOSE -> > [2:17]
|   |   |--TEXT -> . [2:18]
|   |   |   `--CHAR -> . [2:18]
|   |   |--NEWLINE -> \r\n [2:19]
|   |   |--LEADING_ASTERISK ->  * [3:0]
|   |   |--WS ->   [3:2]
|   |   |--JAVADOC_TAG -> @see AbstractClass\r\n  [3:3]
|   |   |   |--SEE_LITERAL -> @see [3:3]
|   |   |   |--WS ->   [3:7]
|   |   |   |--REFERENCE -> AbstractClass [3:8]
|   |   |   |   `--CLASS -> AbstractClass [3:8]
|   |   |   |--NEWLINE -> \r\n [3:21]
|   |   |   `--WS ->   [4:0]
|   |   `--EOF -> <EOF> [4:1]
|   `--LITERAL_PUBLIC -> public [5:0]
|--LITERAL_CLASS -> class [5:7]
|--IDENT -> MyClass [5:13]
`--OBJBLOCK -> OBJBLOCK [5:21]
    |--LCURLY -> { [5:21]
    `--RCURLY -> } [7:0]

As you see very small java file transforms to a huge Abstract Syntax Tree, because that is the most detailed tree including all components of the java file: classes, methods, comments, etc. But in most cases while developing Javadoc Check you need only parse tree of the exact Javadoc comment. To do that just copy Javadoc comment to separate file and remove /** at the begining and */ at the end. After that, run checkstyle with -j argument.

File:

 * My <b>class</b>.
 * @see AbstractClass

Command:

java -jar checkstyle-6.18-SNAPSHOT-all.jar -j MyJavadocComment.javadoc

Output:

JAVADOC ->  * My <b>class</b>.\r\n * @see AbstractClass<EOF> [0:0]
|--LEADING_ASTERISK ->  * [0:0]
|--TEXT ->  My  [0:2]
|   |--WS ->   [0:2]
|   |--CHAR -> M [0:3]
|   |--CHAR -> y [0:4]
|   `--WS ->   [0:5]
|--HTML_ELEMENT -> <b>class</b> [0:6]
|   `--HTML_TAG -> <b>class</b> [0:6]
|       |--HTML_ELEMENT_OPEN -> <b> [0:6]
|       |   |--OPEN -> < [0:6]
|       |   |--HTML_TAG_NAME -> b [0:7]
|       |   `--CLOSE -> > [0:8]
|       |--TEXT -> class [0:9]
|       |   |--CHAR -> c [0:9]
|       |   |--CHAR -> l [0:10]
|       |   |--CHAR -> a [0:11]
|       |   |--CHAR -> s [0:12]
|       |   `--CHAR -> s [0:13]
|       `--HTML_ELEMENT_CLOSE -> </b> [0:14]
|           |--OPEN -> < [0:14]
|           |--SLASH -> / [0:15]
|           |--HTML_TAG_NAME -> b [0:16]
|           `--CLOSE -> > [0:17]
|--TEXT -> . [0:18]
|   `--CHAR -> . [0:18]
|--NEWLINE -> \r\n [0:19]
|--LEADING_ASTERISK ->  * [1:0]
|--WS ->   [1:2]
|--JAVADOC_TAG -> @see AbstractClass [1:3]
|   |--SEE_LITERAL -> @see [1:3]
|   |--WS ->   [1:7]
|   `--REFERENCE -> AbstractClass [1:8]
|       `--CLASS -> AbstractClass [1:8]
`--EOF -> <EOF> [1:21]

Token types

HTML Code In Javadoc Comments

Examples: 1) Unclosed paragraph HTML tag. As you see in the tree, content of the paragraph tag is not nested to this tag. That is because HTML tags are not closed by pair tag </p>, and Checkstyle requires XHTML to correctly parse Javadoc comments.

<p> First
<p> Second

JAVADOC -> <p> First\r\n<p> Second<EOF> [0:0]
|--HTML_ELEMENT -> <p> [0:0]
|   `--P_TAG_OPEN -> <p> [0:0]
|       |--OPEN -> < [0:0]
|       |--P_HTML_TAG_NAME -> p [0:1]
|       `--CLOSE -> > [0:2]
|--TEXT ->  First [0:3]
|   |--WS ->   [0:3]
|   |--CHAR -> F [0:4]
|   |--CHAR -> i [0:5]
|   |--CHAR -> r [0:6]
|   |--CHAR -> s [0:7]
|   `--CHAR -> t [0:8]
|--NEWLINE -> \r\n [0:9]
|--HTML_ELEMENT -> <p> [1:0]
|   `--P_TAG_OPEN -> <p> [1:0]
|       |--OPEN -> < [1:0]
|       |--P_HTML_TAG_NAME -> p [1:1]
|       `--CLOSE -> > [1:2]
|--TEXT ->  Second [1:3]
|   |--WS ->   [1:3]
|   |--CHAR -> S [1:4]
|   |--CHAR -> e [1:5]
|   |--CHAR -> c [1:6]
|   |--CHAR -> o [1:7]
|   |--CHAR -> n [1:8]
|   `--CHAR -> d [1:9]
`--EOF -> <EOF> [1:10]

2) Here is correct version with open and closed HTML tags.

<p> First </p>
<p> Second </p>

JAVADOC -> <p> First </p>\r\n<p> Second </p><EOF> [0:0]
|--HTML_ELEMENT -> <p> First </p> [0:0]
|   `--PARAGRAPH -> <p> First </p> [0:0]
|       |--P_TAG_OPEN -> <p> [0:0]
|       |   |--OPEN -> < [0:0]
|       |   |--P_HTML_TAG_NAME -> p [0:1]
|       |   `--CLOSE -> > [0:2]
|       |--TEXT ->  First  [0:3]
|       |   |--WS ->   [0:3]
|       |   |--CHAR -> F [0:4]
|       |   |--CHAR -> i [0:5]
|       |   |--CHAR -> r [0:6]
|       |   |--CHAR -> s [0:7]
|       |   |--CHAR -> t [0:8]
|       |   `--WS ->   [0:9]
|       `--P_TAG_CLOSE -> </p> [0:10]
|           |--OPEN -> < [0:10]
|           |--SLASH -> / [0:11]
|           |--P_HTML_TAG_NAME -> p [0:12]
|           `--CLOSE -> > [0:13]
|--NEWLINE -> \r\n [0:14]
|--HTML_ELEMENT -> <p> Second </p> [1:0]
|   `--PARAGRAPH -> <p> Second </p> [1:0]
|       |--P_TAG_OPEN -> <p> [1:0]
|       |   |--OPEN -> < [1:0]
|       |   |--P_HTML_TAG_NAME -> p [1:1]
|       |   `--CLOSE -> > [1:2]
|       |--TEXT ->  Second  [1:3]
|       |   |--WS ->   [1:3]
|       |   |--CHAR -> S [1:4]
|       |   |--CHAR -> e [1:5]
|       |   |--CHAR -> c [1:6]
|       |   |--CHAR -> o [1:7]
|       |   |--CHAR -> n [1:8]
|       |   |--CHAR -> d [1:9]
|       |   `--WS ->   [1:10]
|       `--P_TAG_CLOSE -> </p> [1:11]
|           |--OPEN -> < [1:11]
|           |--SLASH -> / [1:12]
|           |--P_HTML_TAG_NAME -> p [1:13]
|           `--CLOSE -> > [1:14]
`--EOF -> <EOF> [1:15]

About

Documentation

Developers

Project Documentation