Documentation

The Quick Reference Guide gives a quick overview of options and TeXcount instructions, whereas the full TeXcount Documentation gives more details and a thorough explanation of how to use and customise TeXcount. There is also Technical documentation intended for anyone who wishes to dive into the Perl code of TeXcount.

You may change the behaviour of TeXcount by giving command line options when you run TeXcount, or by adding TeXcount instructions within the documents to specify which parts should be counted and add handling rules for macros and groups.

Command line options

The syntax for running TeXcount is
texcount.pl [options] file(s)
where options start with a dash and the files are all valid TeX or LaTeX files.
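
As a minimal sketch of the syntax above, the following Python helper assembles an argument list with the options first and the files last. The helper name build_texcount_command and the file name paper.tex are illustrative assumptions, not part of TeXcount; the helper only builds the command and does not run it.

```python
import shlex


def build_texcount_command(options, files, program="texcount.pl"):
    """Return an argument list of the form: program [options] file(s).

    Hypothetical helper for illustration; it only assembles the command.
    """
    for opt in options:
        # Per the syntax above, every option starts with a dash.
        if not opt.startswith("-"):
            raise ValueError(f"option must start with a dash: {opt!r}")
    return [program, *options, *files]


if __name__ == "__main__":
    cmd = build_texcount_command(["-brief", "-total"], ["paper.tex"])
    # Show the command as it could be typed in a shell.
    print(shlex.join(cmd))
```

Assuming texcount.pl is on the search path, the returned list could be handed to subprocess.run to execute the command.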

-v
Verbose (same as -v3).
-v0
No details (default).
-v1
Prints counted text, marks formulae.
-v2
Also prints ignored text.
-v3
Also includes comments and options.
-v4
Same as -v3 -showstate.
-v=..., -v[[0-4]]...
Allows detailed control of which elements are included in the verbose output. The provided value is a list of styles or style categories separated by + or - to indicate whether they should be added to or removed from the list of included styles. Style categories start with a capital letter and include Words, Macros, Options; the individual styles are in all lower case and include word, hword, option, ignore.
-showstate
Show internal states (with verbose).
-brief
Only prints a one line summary of the counts for each file.
-q, -quiet
Quiet mode, does not print error messages. Use is discouraged, but it may be useful when piping the output into another application.
-strict
Prints a warning for begin-end groups for which no specific rule is defined.
-total
Only give total sum, no per file sums.
-1
Same as specifying -brief and -total, and ensures there will only be one line of output. If used with -sum, the output will only be the total number.
-0
Same as -1, i.e. -brief and -total, but does not put a line break at the end. This may be useful when the one line output is to be used by another application, e.g. Emacs, for which the line break would otherwise need to be stripped away.
-template="..."
Specify an output template which is used to generate the summary output for each file and for the total count. Codes of the form {label} are used to include values, where label is one of the counter keywords (word, headerword, etc., or a counter number 0 to 7), an arithmetic expression combining multiple counter keywords, SUM, ERROR or TITLE. Conditional inclusion is done using {label?text?label} or {label?if non-zero|if zero?label}. If the count contains at least two subcounts, use {SUB?template?SUB} with a separate template for the subcounts, or {SUB?prefix|template|suffix?SUB}.
-sub[=...], -subcount[=...]
Generate subcounts. Valid option values are none, part, chapter, section and subsection (default), indicating at which level subcounts are generated. (On by default.)
-nosub
Do not generate subcounts.
-sum[=n,n,...]
Produces total sum, default being all words and formulae, but customisable to any weighted sum of the seven counts (list of weights for text words, header words, caption words, headers, floats, inlined formulae, displayed formulae).
-nosum
Do not generate total sum. (Default choice.)
-col
Use ANSI colour codes in verbose output. This requires ANSI colour support, which is available on Linux but may not be available under Windows. On by default on non-Windows systems.
-nc, -nocol
No colours (colours require ANSI). Default under Windows.
-nosep, -noseparator
No separating character/string added after each word in the verbose output (default).
-sep=, -separator=
Separating character or string to be added after each word in the verbose output.
-relaxed
Relaxes the rules for matching words and macro options.
-restricted
Restricts the rules for matching words and macro options.
-
Read LaTeX code from STDIN.
-inc
Parse included files (as separate files).
-merge
Merge included files into document (in place).
-noinc
Do not parse or merge in included files (default).
-incbib
Include bibliography in count, include bbl file if needed.
-nobib
Do not include bibliography in count (default).
-incpackage=
Include rules for a given package.
-dir[=...]
Specify the working directory which will serve as root for all include files. The default (-dir=.) is to use the current directory, i.e. the one from which TeXcount is executed: the path can be absolute or relative to the current directory. Use -dir without a value to use the path of the main LaTeX document as the working directory.
-auxdir[=...]
Specify the directory of the auxiliary files, e.g. the bibliography (.bbl) file. The default setting (-auxdir only) indicates that auxiliary files are in the working directory (from the -dir or -dir= option). If -auxdir= is used to provide a path and -dir= is used to specify the working directory, the path to the auxiliary directory is taken to be relative to the current folder (from which TeXcount is executed); if used with -dir, the working directory is taken from the path of the parsed file, and the auxiliary directory is taken to be relative to this (unless an absolute path is provided).
-enc=, -encoding=
Specify encoding to use in input (and text output).
-utf8, -unicode
Use UTF-8 (Unicode) encoding. Same as -encoding=utf8.
-alpha=, -alphabets=
List of Unicode character groups (or digit, alphabetic) permitted as letters. Names are separated by , or +. If list starts with +, the alphabets will be added to those already included. The default is Digit+alphabetic.
-logo=, -logograms=
List of Unicode character groups interpreted as whole word characters, e.g. Han for Chinese characters. Names are separated by , or +. If list starts with +, the alphabets will be added to those already included. By default, this is set to include Ideographic, Katakana, Hiragana, Thai and Lao.
-ch, -chinese, -zhongwen
Turn on Chinese mode in which Chinese characters are counted. I recommend using UTF-8, although TeXcount will also test other encodings (GB2312, Big5, Hz) if UTF-8 fails, and other encodings may be specified by -encoding=.
-jp, -japanese
Turn on Japanese mode in which Japanese characters (Kanji and Kana) are counted. I recommend using UTF-8, although TeXcount will also test other encodings (e.g. EUC-JP) if UTF-8 fails, and other encodings may be specified by -encoding=.
-kr, -korean
Turn on Korean mode in which Korean characters (Hangul and Han) are counted. I recommend using UTF-8, although TeXcount will also test other encodings (e.g. EUC-KR) if UTF-8 fails, and other encodings may be specified by -encoding=.
-kr-words, -korean-words
Korean mode in which Hangul words are counted (i.e. as words separated by spaces) rather than characters. Han characters are still counted as characters. See also -korean.
-chinese-only, ..., -korean-words-only
As options -chinese, ..., -korean-words, but also excludes other alphabets (e.g. letter-based words) and logographic characters.
-count-all, -all
Allow all alphabets, digits, and logograms as letters, whether counting words or characters. This is the default setting.
-char, -letter
Count letters instead of words. This count does not include spaces.
-char-only, -letter-only
Count letters instead of words, but exclude logograms (like Chinese characters). This count does not include spaces.
-all-nonspace-char, -all-nonspace-characters
Counts characters instead of words, including letters, logograms, and punctuation, but not including spaces.
-out=
Send output to file. Takes file name as value.
-out-stderr
Send output to STDERR instead of STDOUT.
-html
Output in HTML format.
-htmlcore
Only HTML body contents.
-htmlfile=
File containing a template HTML document with a placeholder included somewhere to indicate the location where the TeXcount output from the parsing should be inserted.
-tex
Encode TeX special characters for output into TeX code.
-css=
Reference to CSS to be included in the HTML output instead of including the style definition directly in the output.
-cssfile=, -css=file:
File containing style definitions to be included into the HTML output instead of the default styles.
-freq[=#]
Count individual word frequencies. Optionally, give minimal frequency required to be included in output.
-stat
Produce statistics on language usage, i.e. based on the alphabets and logograms included.
-macrostat, -macrofreq
Produce statistics on package, environment and macro usage.
-codes
Display an overview of the colour codes. Can be used as a separate option to only display the colour codes, or together with files to parse.
-nocodes
Do not display overview of colour codes.
-topcodes, -bottomcodes
Determines if overview of colour codes should be placed at the top or at the bottom (default).
-opt=, -optionfile=
Reads options (command line parameters) from a specified text file. Should use one option per line. May also include TC options in the same format as specified in LaTeX documents, but prefixed by % rather than %TC:. Blank lines and lines starting with # are ignored; lines starting with \ are considered to be continuations of the previous line.
-split, -nosplit
The -split option, which is on by default, speeds up handling of large files by splitting the file into paragraphs. To turn it off, use the -nosplit option.
-showver, -nover
Include version number in output with -showver; use -nover not to show it (default).
-h, -?, --help, /?
Help.
-h=, -?=, --help=, /?=
Help on a particular macro or group name: gives the parsing rule for that macro or group if defined. If the macro or environment is package specific, use -h=package:name; replace package with class%name if it is specific to a document class.
-help-options, -h-opt
Lists all TeXcount options and help on them.
--help-option=, -h-opt=
Lists all TeXcount options containing the provided string: e.g. -h-opt=inc lists all options containing inc, while -h-opt=-v lists all options starting with v.
-help-style, -h-style
Lists all styles and style categories, i.e. those permitted in -v={styles-list}.
-help-style=, -h-style=
Gives description of style or style category.
-help-all-rules
List all rules (except some hardcoded ones), including package specific rules.
-ver, --version
Print version number.
-lic, --license
License information.
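
The weighted sum described under -sum can be sketched in Python as follows. This is an illustration of the arithmetic only, not TeXcount's own code: the default weights are read off from the description ("all words and formulae"), the sample counts are made up, and requiring exactly seven weights is a simplification of this sketch.

```python
# Order of the seven counts as listed for the -sum option.
COUNT_NAMES = (
    "text words", "header words", "caption words",
    "headers", "floats", "inlined formulae", "displayed formulae",
)

# Assumed default weights: all words and formulae count once,
# while headers and floats are left out of the sum.
DEFAULT_WEIGHTS = (1, 1, 1, 0, 0, 1, 1)


def weighted_sum(counts, weights=DEFAULT_WEIGHTS):
    """Combine the seven counts into one total using per-count weights."""
    if len(counts) != len(COUNT_NAMES) or len(weights) != len(COUNT_NAMES):
        raise ValueError("expected seven counts and seven weights")
    return sum(c * w for c, w in zip(counts, weights))


if __name__ == "__main__":
    # Made-up counts in the order given by COUNT_NAMES.
    counts = (1200, 35, 80, 12, 4, 25, 6)
    print(weighted_sum(counts))                         # words and formulae
    print(weighted_sum(counts, (1, 1, 1, 0, 0, 0, 0)))  # words only
```

A words-only total, as in the second call, would correspond to passing the same list of weights to -sum=.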

TeXcount instructions

These are instructions that you can leave for TeXcount within the LaTeX document. The syntax for doing so is in general
%TC:instruction [name] [parameters]
where name is used by instructions that give macro/group handling rules. TeX/LaTeX will ignore these as comments because of the % at the start.
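
To make the comment syntax concrete, here is a small Python sketch that picks out %TC: lines from LaTeX source and splits each into the instruction and the rest of the line. It is not how TeXcount itself parses documents; the function name find_tc_instructions, the macro \todo, and the choice to match only lines that begin with %TC: (after optional whitespace) are assumptions made for the example.

```python
import re

# Matches "%TC:instruction rest-of-line". Leading whitespace is tolerated
# here as a simplification of this sketch.
TC_LINE = re.compile(r"^\s*%TC:(\w+)\s*(.*)$")


def find_tc_instructions(latex_source):
    """Return (instruction, arguments) pairs, one per %TC: line."""
    found = []
    for line in latex_source.splitlines():
        match = TC_LINE.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


if __name__ == "__main__":
    sample = "\n".join([
        r"\section{Intro}",
        r"%TC:break Introduction",
        r"Some text.",
        r"%TC:macro \todo [ignore]",
    ])
    for instruction, arguments in find_tc_instructions(sample):
        print(instruction, "|", arguments)
```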

There are a few general TeXcount instructions:

break title
Break point which initiates a new subcount. The title is used to identify the following region in the summary output.
incbib or includebibliography
Sets bibliography inclusion, same as running TeXcount with the option -incbib.
subst macro text
This substitutes a macro with any text. The verbose output will show the substituted text: e.g. %TC:subst \test TEST will cause a following \newcommand\test{TEST} to be changed into \newcommand TEST{TEST}, which TeXcount will interpret differently. Use with care!
ignore
Indicates start of a region to be ignored. End region with %TC:endignore.
insert TeX-code
Insert TeX code for TeXcount to process.
newcounter name [description]
Define a new counter with the given name and description (optional). A corresponding parsing rule will also be added with the same name.
newtemplate and template [template-line]
Specify a template for the summary output. The first line should just declare a new template using %TC:newtemplate, while the subsequent lines use %TC:template followed by text specifying the template. The line breaks in the template specification are not of importance: to specify a line break, use \n.
usepackage [package(s)]
This will include the TeXcount rules for the specified package(s). TeXcount does this automatically when it encounters \usepackage.
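
The effect of the ignore instruction listed above can be approximated with a short Python sketch that drops every line between %TC:ignore and %TC:endignore. This only mimics which text remains visible for counting; it is not TeXcount's implementation, and the function name drop_ignored_regions is an assumption for illustration.

```python
def drop_ignored_regions(lines):
    """Return the lines outside %TC:ignore ... %TC:endignore regions."""
    kept = []
    ignoring = False
    for line in lines:
        marker = line.strip()
        if marker == "%TC:ignore":
            ignoring = True   # region starts; the marker line is dropped
            continue
        if marker == "%TC:endignore":
            ignoring = False  # region ends; the marker line is dropped
            continue
        if not ignoring:
            kept.append(line)
    return kept


if __name__ == "__main__":
    document = [
        "Counted text.",
        "%TC:ignore",
        "Text that should not be counted.",
        "%TC:endignore",
        "Counted again.",
    ]
    print(drop_ignored_regions(document))
```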

Specifying macro handling rules

The syntax for specifying macro handling rules is
%TC:instruction name parameters
where the instruction is one of the below, the name is the macro name (starting with \) or group name, and parameters depend on the instruction.

macro macroname parameter-rules
Defines the macro handling rule for the specified macro. The parameter is of the form [rule,...] where each rule is either a keyword indicating the parsing rule for a macro parameter or option:rule for an optional []-enclosed parameter. Alternatively, an integer value n indicates that the first n parameters to the macro should be ignored, equivalent to giving a list of n ignore rules.
envir envirname parameter-rules content-rule
(The previously used command, group, remains an alias for envir, but the name envir is more appropriate and therefore recommended.) This specifies the handling of environments with the given name. The parameter handling rules, applied to parameters following \begin{name}, are specified as in the macro instruction. The second parameter specifies the rule, i.e. parser state, with which the contents should be parsed.
wordtype original-rule wordtype changed-rule
TeXcount can distinguish between number, mixed, and nonum: numbers in various formats, words that include digits, and words that do not contain any digits. The wordtype specification indicates that words of a given wordtype which were originally counted using original-rule should instead be counted using changed-rule. NB: Experimental feature, may change!
macrocount macroname [count-spec.]
(An alias for macrocount is macroword; the preferred name was changed to reflect that this can count any element, not just words.) If a number is provided as the count parameter, this defines the given macro to be counted as the specified number of words; if no count is specified, it is assumed to be 1. Alternatively, a []-enclosed list of counters can be specified (using the counter keywords), causing each of them to be incremented: the counters are word/text, headerword, otherword, header, float, inlinemath, displaymath plus a number of aliases.
breakmacro macroname
Specify that the given macro should cause a break point.
floatinclude macroname parameter-rules
Specify macro handling rules used within float groups. The handling rules are specified as for macro. Most commonly, the parameter rule will be otherword/oword, specifying that words should be counted as other words.
preambleinclude macroname parameter-rules
Specifies macro handling rules to be used in the preamble: the text between \documentclass and \begin{document}. The rule is specified like the macro rules.
fileinclude macroname file-path-spec.
Specifies macros that cause files to be included when TeXcount is run with the -inc option. The parameters specify the format in which the file path is specified, and can also be used to modify the search path used within the included document.
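
The two ways of writing parameter rules for the macro instruction (a [rule,...] list, or an integer n standing for n ignore rules) can be illustrated with a small Python sketch. The helper names and the macro \todo are hypothetical, and the code only formats and expands rule specifications; it does not reproduce TeXcount's parser.

```python
def expand_parameter_rules(spec):
    """Return the per-parameter rules for a macro rule specification.

    spec is either an integer n (ignore the first n parameters) or a
    string of the form "[rule,...]", e.g. "[option:ignore,text]".
    """
    if isinstance(spec, int):
        if spec < 0:
            raise ValueError("the number of parameters cannot be negative")
        # An integer n is equivalent to a list of n ignore rules.
        return ["ignore"] * spec
    text = spec.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected an integer or a [rule,...] list")
    inner = text[1:-1].strip()
    return [rule.strip() for rule in inner.split(",")] if inner else []


def format_macro_instruction(name, spec):
    """Build a %TC:macro line with the rules written out in full."""
    rules = expand_parameter_rules(spec)
    return f"%TC:macro {name} [{','.join(rules)}]"


if __name__ == "__main__":
    print(expand_parameter_rules(2))
    print(expand_parameter_rules("[option:ignore,text]"))
    print(format_macro_instruction(r"\todo", 1))
```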

Rule types

For macros and environments, the rules with which parameters and environment contents are processed are specified by keywords, of which there may be several synonyms. Previously they were identified by numerical codes, and these are also provided. Note that there are three somewhat different types of rules. The most basic rules indicate how words are counted. Then there are transitional rules which may count an item, e.g. a header, and then transition to another rule, e.g. the rule for counting words in headers. Finally, there are some rules for ignoring text, which range from a simple ignore which parses the code without counting any words, to stronger exclude rules which may be used to prevent TeXcount from parsing certain elements.
Text (keys: text, word, wd, w; formerly code 1)
Count as text (i.e. count words).
Header text (keys: headertext, headerword, hword, hwd, hw; formerly code 2)
Count as header text.
Other text (keys: otherword, other, oword, owd, ow; formerly code 3)
Count as float/caption text.
Displaymath (keys: displaymath, dsmath, dmath, ds; formerly code 7)
Count as displayed math formulae.
Inline math (keys: inlinemath, inline, imath, eq; formerly code 6)
Count as inlined math formulae.
To header (keys: header, heading, head; formerly code 4)
Count header, then count text as headertext (transition state).
To float (keys: float, table, figure; formerly code 5)
Count float, then parse contents as isfloat (transition state).
Preamble (keys: ; formerly code -9)
Parse as preamble, i.e. ignore text but look for preambleinclude macros.
Ignore (keys: ignore; formerly code 0)
Ignore text, i.e. do not count, but will still parse the code.
Float (keys: isfloat; formerly code -1)
Float contents, ignore text but look for floatinclude macros.
Strong exclude (keys: xx; formerly code -2)
Strong ignore which ignores environments, e.g. to use in macro definitions where \begin and \end need not be balanced.
Stronger exclude (keys: xxx; formerly code -3)
Stronger ignore, handles macros as isolated tokens without handling their parameters, to use with macro definitions like \newcommand and \def.
Exclude all (keys: xall; formerly code -4)
Ignore all, including unbalanced braces (e.g. used by %TC:ignore and the verbatim environment). This rule may be used for environment contents, but not for macro or environment parameters or options since the exclusion causes { and [ to be ignored.
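
The keywords and former numerical codes listed above can be collected into a lookup table, sketched here in Python. The table is transcribed from this list; treating the first keyword of each rule as its "canonical" name, and the function name canonical_rule, are choices made for the example rather than anything TeXcount defines.

```python
# First-listed keyword -> (synonyms, former numeric code), transcribed from
# the list above. The Preamble rule (former code -9) lists no keywords and
# is therefore left out of this table.
RULES = {
    "text": (("word", "wd", "w"), 1),
    "headertext": (("headerword", "hword", "hwd", "hw"), 2),
    "otherword": (("other", "oword", "owd", "ow"), 3),
    "header": (("heading", "head"), 4),
    "float": (("table", "figure"), 5),
    "inlinemath": (("inline", "imath", "eq"), 6),
    "displaymath": (("dsmath", "dmath", "ds"), 7),
    "ignore": ((), 0),
    "isfloat": ((), -1),
    "xx": ((), -2),
    "xxx": ((), -3),
    "xall": ((), -4),
}

# Reverse lookups built once from the table.
ALIASES = {}
BY_CODE = {}
for canonical, (synonyms, code) in RULES.items():
    BY_CODE[code] = canonical
    for key in (canonical, *synonyms):
        ALIASES[key] = canonical


def canonical_rule(key_or_code):
    """Map a rule keyword or a former numeric code to the first-listed keyword."""
    if isinstance(key_or_code, int):
        return BY_CODE[key_or_code]
    return ALIASES[key_or_code]


if __name__ == "__main__":
    print(canonical_rule("hw"))
    print(canonical_rule(7))
```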

Last modified December 31, 2018.