TeXcount FAQ, utilities and tricks

Install and run

Setting up TeXcount

Unix/Linux/MacOSX

If TeXcount came along with your LaTeX installation, it may already be set up to run by using the command texcount. If you wish to upgrade it, you should preferably use the LaTeX upgrade tools. Alternatively, if you wish to upgrade to a version of TeXcount you have downloaded from this web page, you can find where TeXcount is installed by running which texcount. If, for example, it is located at /usr/bin/texcount, run ls -l /usr/bin/texcount to find out whether it is a file or a link to a file stored elsewhere. Once you know where the file is stored, you can replace it with the one you downloaded. The following example locates the installed TeXcount, checks whether it is a link, then copies the newly downloaded TeXcount (assumed to be texcount.pl in the present directory) to that location, replacing the one already there.

$ which texcount
/usr/bin/texcount
$ ls -l /usr/bin/texcount
lrwxrwxrwx 1 root root 51 2010-05-02 13:38 /usr/bin/texcount -> ../share/texmf-texlive/scripts/texcount/texcount.pl
$ cp -f texcount.pl /usr/share/texmf-texlive/scripts/texcount/texcount.pl
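
As a rough sketch, the link shown above can also be followed automatically. This assumes a readlink that supports the -f option (GNU coreutils and recent BSD/macOS versions do); the function name resolve_script is made up for the illustration and is not part of TeXcount.

```shell
# Hypothetical helper: print the real file behind a command that may be a symlink.
# Assumes a readlink that supports -f.
resolve_script() {
    location=$(command -v "$1") || return 1
    readlink -f "$location"
}

# Example use (not run here): replace the installed script with a downloaded copy.
# target=$(resolve_script texcount) && cp -f texcount.pl "$target"
```

Because the link target is resolved to an absolute path, there is no need to work out by hand where a relative link such as ../share/... ends up.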

If you don't have TeXcount available already and have just downloaded it, you may want to rename it to texcount (rather than texcount.pl) and move it to a directory that is in your PATH. For example, checking if it is available, checking the PATH, and moving and renaming TeXcount (assumed to be texcount.pl in the present directory) may be done as follows:

$ which texcount
$ echo $PATH
/home/einar/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
$ mv texcount.pl /usr/bin/texcount
$ chmod +x /usr/bin/texcount

This will move texcount.pl to /usr/bin/texcount, which is in the path, so you can run TeXcount just by typing texcount. The chmod command is used to make texcount executable.

You should now be ready to run TeXcount just by typing texcount on the command line. The format for running TeXcount is:
$ texcount [options] file(s)

Windows

Under Windows, you may have to install Perl to run TeXcount unless it is already installed: I suggest ActivePerl which is free. In order to run TeXcount from anywhere, either the texcount.pl script must be in the PATH, or there must be either a batch script texcount.bat or an executable texcount.exe in the PATH which in turn runs texcount.pl. Alternatively, you must give the full path to texcount.pl in order to run it.

The syntax for running TeXcount is then
> texcount [options] file(s)
or texcount.pl in case there is no batch script or executable.

If you have TeXcount installed through TeX Live, MiKTeX, etc., and want to upgrade or modify the TeXcount script, you will have to locate the texcount.pl file. MiKTeX places the script itself under the scripts folder, and uses an executable texcount.exe located in the miktex\bin folder to run this.

I recommend viewing the verbose output, but Windows does not support the ANSI colour codes it relies on. You may therefore want to make a batch file that calls TeXcount with the -html option and opens the result in a web browser. You may also register this batch script as one of the Open with... alternatives.

How to run TeXcount

Unix/Linux/MacOSX

If you have TeXcount installed together with LaTeX, it may already be set up to run using the command texcount. Otherwise, make sure the TeXcount script is in your path and has been made executable. You may also want to rename the script, e.g. into texcount in case you want a simpler name: if the TeXcount script is in a file named texcount.pl or TeXcount.pl, you would have to use these exact names to run TeXcount.

Assuming TeXcount is named texcount and is in your path, run TeXcount by executing the command
$ texcount [options] file(s)
to perform the count. Parsing details will be output using ANSI colour codes. If TeXcount is not in your path, you will have to give the path, e.g.
$ ~/TeXcount/texcount [options] file(s)
if the location is ~/TeXcount/. Note that if you are in the same directory as TeXcount, even if it has been made executable, you may still have to prepend ./ to run it:
$ ./texcount [options] file(s)
In case of problems, recheck that you have set up TeXcount properly.

Windows

To run TeXcount on the command line, it needs to be in your path. You may want to rename it, e.g. into texcount.pl, in case you want a fixed name irrespective of version. Then run
> texcount.pl [options] file(s)
to make the word count. If you are using verbose output, e.g. -v, ANSI colour codes do not work (they are turned off by default, or by using -nc), which unfortunately makes the parsing details less informative: use the option -html
> texcount.pl -html [options] file(s) > filename.html
to save TeXcount output as HTML code which may be opened in a web browser. Alternatively, you may define a batch file opening TeXcount output in a browser. If you have such a batch file, e.g. texcount.bat, you can add this to the Open with menu available when right-clicking on a tex file.

How TeXcount works

What does TeXcount count?

TeXcount parses TeX/LaTeX documents, complete documents or partial documents that are included, recognises macros and environments and tries to identify which parts are text and which are TeX/LaTeX code. In addition, it counts the main text, text in headers and text in captions (figure captions, footnotes, etc.) separately. Math elements, i.e. formulae, are not counted as text but counted separately as either inlined formulae or displayed formulae.

How are macros handled?

When TeXcount encounters a macro, it has a rule (or uses a default rule if none is specified) which tells how many parameters the macro takes. TeXcount assumes parameters to be of the form {...}. TeXcount will then read the appropriate number of parameters and parse each of them. The rule may simply tell TeXcount to exclude them all from the count, or it may tell TeXcount how each of them should be analysed: the basic rules are to exclude the parameter, count it as text, count it as header text or count it as caption text.

In addition, many macros allow options of the form [...] which may come before, between or after the parameters. TeXcount simply assumes that these should all be excluded from the word count. In some cases, this may give the wrong result: e.g. \item[text]. However, in most cases, this rule works fairly well.

If parameters are not enclosed in braces ({...}), TeXcount will differ from TeX/LaTeX: TeXcount may take entire words as parameters where TeX/LaTeX will use individual characters, and so the result may be quite incorrect.

For more extensive details, please read the documentation.

How are environments handled?

Environments, i.e. regions enclosed by \begin{name} ... \end{name} are handled in a manner similar to macros. The \begin part of the environment may take parameters, and these are handled by rules similar to those of other macros.

Next, an environment rule also specifies how to interpret the contents. The contents may be specified to be text, header or caption text, but they may also be an inline math formula, displayed math which is counted separately from inlined formulae, or they may simply be counted as a floating object (float for short), which typically covers things like figures or tables.

For more extensive details, please read the documentation.

What is the rule for ...?

Starting with version 2.3, TeXcount is able to give help on the individual macro and environment rules. Just run TeXcount with the option -h=name (or -help=name or -?=name) where the name is the macro or environment name. Note that the macro should start with the backslash, and this may need to be escaped: i.e. you may have to use something like -h='\cite' or -h=\\cite since the command line may otherwise interpret the backslash as a special character.
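
To see why the escaping matters on a Unix-style shell, the sketch below prints an argument exactly as a program would receive it. It does not call TeXcount at all; show_arg is a made-up helper used only for the demonstration.

```shell
# Show the argument as a program would receive it after shell processing.
show_arg() { printf '%s\n' "$1"; }

show_arg -h='\cite'   # single quotes keep the backslash:    -h=\cite
show_arg -h=\\cite    # an escaped backslash also gives:     -h=\cite
show_arg -h=\cite     # unquoted, the backslash is consumed: -h=cite
```

Only the first two forms pass the backslash on, which is what the help option needs in order to look up the macro.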

Other languages: Greek, Chinese, ...

Languages that use only the Latin letters A-Z should generally be fine. There are a few cases that might still cause problems. Words joined by - are generally counted as one word, and acronyms containing periods but no spaces are counted as one word. If letters are separated out, e.g. {C}apital, using the -relaxed option may help: this relaxes the word definition so more is allowed to go into a word. However, things like {\bf B}old will be split in two: "B" and "old".

Many languages have additional characters that are written using different types of accents (e.g. \`{o} for ò) or special characters (e.g. \ae for æ). By default, TeXcount does not recognise these, but again the -relaxed option will also allow many of these inside words.

If the document contains non-ASCII characters, e.g. ø, TeXcount depends on Perl's ability to recognise them as letters. The safest choice for this is to use Unicode UTF-8 encoding (which will be set as default from version 2.3). The alternative, Latin1, may work or it may depend on the locale to recognise the characters as letters. I have received reports that TeXcount has worked with both Greek and Hebrew. In addition, there are separate options to count Chinese or Japanese characters (requires UTF-8 encoding).

Integration

View parsing details in a web browser

If TeXcount is run with the option -html, it will output a document in HTML format, which may be opened in a web browser such as Internet Explorer, Firefox or Opera.

A short Windows script that runs TeXcount and opens the output using the default web browser is:

rem Path to the TeXcount script (change to the appropriate path)
set tc=C:\lib\_util_\texcount.pl
rem Set temporary file (%temp%=temporary file directory)
set tempdoc=%temp%\parse.html
rem Run TeXcount with -html and other desired options
%tc% -sub -inc -v -html -dir %1 > %tempdoc%
rem Open temporary HTML document
start %tempdoc%

Save this as a batch file, e.g. texcount.bat; running it with the LaTeX document as the parameter will open the results in your default web browser. You will have to change the path in set tc to where the TeXcount script is located, and you may wish to change the options used. If html documents are not set up to open in your default browser, you may have to specify the program: e.g. start firefox %tempdoc%.

Using less with colour codes (Unix/Linux)

When using less to look through the output, the option -r must be used for the ANSI colour codes to be handled correctly.

Include word count in LaTeX document

If you want the word count included in the LaTeX document, the easiest way is to write the output from TeXcount to a file and include this. If you run
texcount.pl -1 -sum document.tex > document.sum
with the desired file names, this will produce a file containing only the total word count; alternatively,
texcount.pl -1 -sum document.tex -out=document.sum
will cause TeXcount to write the result directly to the file. Then define the LaTeX macro \wordcount
\newcommand\wordcount{\input{document.sum}}
to include the count, or
\newcommand\wordcount{\input{\jobname.sum}}
to use the name of the main LaTeX document instead of document as file name. Alternatively, you may produce more extensive output using
texcount.pl document.tex -out=document.sum
and, with \usepackage{verbatim}, define
\newcommand\wordcount{\verbatiminput{\jobname.sum}}
to include the full summary. Of course, if you have several files you want included, use the -inc option.
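
The first variant above can be wrapped in a small shell function so the sum file is refreshed before each compilation. This is only a sketch: the function name update_wordcount and the TEXCOUNT variable are inventions for the example, and the pdflatex call in the comment is an assumption about how the document is compiled.

```shell
# Hypothetical wrapper: refresh NAME.sum from NAME.tex using the command
# from the text above. Set TEXCOUNT if the script has another name or
# location (e.g. TEXCOUNT=texcount.pl).
update_wordcount() {
    base=${1%.tex}
    "${TEXCOUNT:-texcount}" -1 -sum "$base.tex" > "$base.sum"
}

# Example use before compiling (not run here):
# update_wordcount document.tex && pdflatex document.tex
```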

Running TeXcount from TeX/LaTeX

It is possible to execute TeXcount or a script directly from TeX/LaTeX using \immediate\write18{...} to give instructions to be executed by the OS. The exact instructions may, of course, depend on the OS. However, an example that might work on most systems, provided TeXcount can be run using the command texcount, is:

% Run texcount on tex-file and write results to a sum-file
\immediate\write18{texcount \jobname.tex -out=\jobname.sum}
% Define macro \wordcount for including the counts
\usepackage{verbatim}
\newcommand\wordcount{\verbatiminput{\jobname.sum}}

You then place \wordcount where you want the count inserted. This example uses \verbatiminput from the verbatim package to include \jobname.sum. Note that you may have to add the option --enable-write18 or --shell-escape when you run TeX/LaTeX to permit use of \write18. For more complex examples, e.g. if you have your own output template to use with TeXcount or many options, you may find it more convenient to make a script rather than call TeXcount directly.

Integrate TeXcount with Emacs

How to integrate TeXcount with Emacs may depend slightly on the Emacs version and the TeXcount version.

I have adapted a solution by Nicholas Riley and modified it to utilise features from TeXcount version 2.3, but also provide alternatives for version 2.2 or older. Another solution, for use with AUCTeX, adds TeXcount to the menu.

The first method allows you to do a word count on LaTeX documents by pressing Ctrl-c w or running M-x texcount, after adding the following code to the .emacs file:

;;
;; TeXcount setup for TeXcount version 2.3 and later
;;
(defun texcount ()
  (interactive)
  (let*
    ( (this-file (buffer-file-name))
      (enc-str (symbol-name buffer-file-coding-system))
      (enc-opt
        (cond
          ((string-match "utf-8" enc-str) "-utf8")
          ((string-match "latin" enc-str) "-latin1")
          ("-encoding=guess")
      ) )
      (word-count
        (with-output-to-string
          (with-current-buffer standard-output
            (call-process "texcount" nil t nil "-0" enc-opt this-file)
    ) ) ) )
    (message "%s" word-count)
) )
(add-hook 'LaTeX-mode-hook (lambda () (define-key LaTeX-mode-map "\C-cw" 'texcount)))
(add-hook 'latex-mode-hook (lambda () (define-key latex-mode-map "\C-cw" 'texcount)))

You may also have to change the reference to texcount after call-process, e.g. giving the full path.

Relative to the original solution by Nicholas Riley, I added a hook for latex-mode in addition to LaTeX-mode. I also changed the option -brief into -0, which causes the output to be on one line with no line break, and added detection of file encoding.

The above method requires TeXcount 2.3 or later. I have made a version to use with TeXcount 2.2 as well as one to use with older TeXcount versions. Note that older TeXcount versions may not handle file names containing spaces properly, and for TeXcount versions prior to 2.3 the Latin-1 file encoding will be assumed.

Another solution, provided by Sidney Lin for use with AUCTeX, adds TeXcount to the Command menu. It requires the following code in the .emacs file:

;;
;; TeXcount setup for AUCTeX
;;
(require 'tex)
(add-to-list 'TeX-command-list
      (list "TeXcount" "texcount %s.tex" 'TeX-run-command nil t))

This sends the TeXcount output to a new buffer, which can thus provide more detail than the one-line output from the previous solution.

Integrate TeXcount with TeXshop

Ramón M. Figueroa-Centeno has been kind enough to provide macros integrating TeXcount with TeXshop (LaTeX editor used on the OS X platform).

Running TeXcount in a pipe

Starting with version 2.3 of TeXcount, it can read input from STDIN if you give it - as an option. This allows TeXcount to be used in a pipe.
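
A minimal sketch of such a pipe is given below. The helper name count_snippet and the TEXCOUNT variable are made up for the example; the options -1 and -sum are the ones used elsewhere on this page to get a single total.

```shell
# Hypothetical helper: count the words in a LaTeX snippet passed as a string
# by piping it to TeXcount, which reads STDIN when given "-" (version 2.3+).
count_snippet() {
    printf '%s\n' "$1" | "${TEXCOUNT:-texcount}" -1 -sum -
}

# Example use (not run here):
# count_snippet 'Some \emph{short} text.'
```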

Customising word counts

Ignore segment of LaTeX document

You may have TeXcount completely ignore a segment of the LaTeX file by using
%TC:ignore
and
%TC:endignore
to mark the start and end of the ignore region.
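
For illustration, the following sketch writes a small test document using the two instructions and runs TeXcount on it if it is installed. The file name sample-ignore.tex is arbitrary.

```shell
# Write a small sample file with an ignored region.
cat > sample-ignore.tex <<'EOF'
\documentclass{article}
\begin{document}
This sentence is counted.
%TC:ignore
This sentence is skipped by TeXcount but still typeset by LaTeX.
%TC:endignore
This sentence is counted again.
\end{document}
EOF

# Run TeXcount on it if available; -v shows the parsing details.
if command -v texcount >/dev/null 2>&1; then
    texcount -v sample-ignore.tex
fi
```

Since both instructions are LaTeX comments, they have no effect on the typeset document.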

Specify new macro handling rules

This is done by adding a TeXcount instruction in the LaTeX document of the form
%TC:macro \macroname n
indicating that the n parameters following the macro should be ignored, or
%TC:macro \macroname [state1,state2,...,stateN]
indicating that N parameters are parsed using the parsing states 1 to N. The main parsing options are:

  • text: count text
  • header: count header, and text as header text
  • float: count float/caption and content as other text
  • other: count text as other text
  • ignore: ignore text (will still parse but not include in counts)
  • xx: stronger ignore (ignores \begin and \end)
  • xxx: even stronger ignore (does not interpret macros)
  • xall: ignore everything including braces, so can only be used for environment contents

Previously, the parsing options were specified by numerical codes, which was less intuitive.

Note that the macro should be specified including the backslash.
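
As a sketch of both forms, the test document written below defines two macros and gives TeXcount a rule for each. The macros \todo and \term are hypothetical and exist only for this example; the state names are taken from the list above.

```shell
# Write a sample file with one rule of each form.
cat > sample-macro.tex <<'EOF'
\documentclass{article}
% Hypothetical macros, used only for this illustration.
\newcommand\todo[1]{\textbf{TODO: #1}}
\newcommand\term[2]{\emph{#1} (#2)}
\begin{document}
%TC:macro \todo 1
%TC:macro \term [text,ignore]
A \term{word count}{see the glossary} is needed. \todo{check this}
\end{document}
EOF

# Run TeXcount on it if available to inspect how the parameters are parsed.
if command -v texcount >/dev/null 2>&1; then
    texcount -v sample-macro.tex
fi
```

With these rules, the single parameter of \todo should be ignored, while the first parameter of \term is counted as text and the second is ignored.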

Specify handling rule for begin—end groups

The syntax for handling begin—end groups is similar to that of macros:
%TC:envir name n state
if the n parameters following \begin{name} should be ignored, or
%TC:envir name [state1,state2,...,stateN] state
if the N parameters following \begin{name} should be parsed using parsing states 1 to N. In both cases, the contents between \begin{name} and \end{name} should be parsed using the state given as the last argument.
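
A short sketch of the first form follows. The sidenote environment is hypothetical and defined only for this example; the rule says that its one parameter is ignored and that its contents are parsed with the state other from the list of parsing options.

```shell
# Write a sample file with an environment rule.
cat > sample-envir.tex <<'EOF'
\documentclass{article}
% Hypothetical environment taking one argument, used only for illustration.
\newenvironment{sidenote}[1]{\par\textbf{#1}\par}{\par}
\begin{document}
%TC:envir sidenote 1 other
Main text here.
\begin{sidenote}{Label}
Words here are counted as other text.
\end{sidenote}
\end{document}
EOF

# Run TeXcount on it if available to check the result.
if command -v texcount >/dev/null 2>&1; then
    texcount -v sample-envir.tex
fi
```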

Can I count option text, e.g. \item[text]?

Prior to version 3.0, the answer was no as TeXcount would ignore the contents of macro options [...] whenever they come after macros or macro parameters. However, starting with version 3.0, this is possible, and the rule for \item has been updated.

TeXcount and bibliographies

Up until version 2.3 of TeXcount, there was no option to include bibliographies in the word count. From version 2.3, the option -incbib may be specified, or the TC-command %TC:incbib included in the LaTeX document to include the bibliography in the count. TeXcount will then process the thebibliography environment if included in the document, or read the bibliography bbl-file if the bibliography is included through the \bibliography command.

You should take extra care to check how TeXcount parses the bibliography. The format of the raw bib-code may vary, and TeXcount cannot necessarily be expected to handle them very well. You might want to try the -relaxed option since this should recognise that {W}ord is one word rather than two. However, TeXcount will still count every number (page numbers, volumes, etc.) as words.

Note that the TeXcount rule for including the bibliography also includes the assumption that the bibliography has a header consisting of one word: this word is added to the count.

Count characters instead of words

You can perform a letter count by specifying the option -char (or -letter). If you are writing Chinese or Japanese, there are the options -chinese and -japanese: these require that the file is saved using Unicode UTF-8 character encoding. Note that the letter count does not include spaces. If you also need to count the spaces, the simplest solution is to assume that the number of spaces roughly equals the number of words and then add the word count to the letter count.
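
The estimate is plain addition, as in the sketch below. The two numbers are invented example figures; substitute the counts TeXcount reports for your own document.

```shell
# Example figures only; substitute the counts reported by TeXcount.
words=1250      # from a normal word count
letters=6840    # from a run with -char
# Roughly one space per word, so add the two counts.
chars_with_spaces=$((letters + words))
echo "Approximate characters including spaces: $chars_with_spaces"
```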

Can I count the abstract separately?

There is no direct option or instruction for producing a separate count for the abstract, but it is possible to count it as a section by putting a breakpoint before it, and possibly also after it to make sure no text after the abstract gets included. Example:

 %TC:break Abstract
 \begin{abstract}
 Abstract goes here.
 \end{abstract}
 %TC:break _main_

The first breakpoint initiates a new count with the label Abstract, and the one after sets a new breakpoint with title _main_: if no text is included before the next natural breakpoint comes, e.g. a section header, this will not be reported in the summary count. The summary output option must be one that produces counts per section, although that should be the default.

Can I count footnotes, etc. separately?

Starting with version 3 of TeXcount, you can add new counters and define macro handling rules for e.g. footnotes to use this. This is explained in greater detail in the documentation.

Problems

TeXcount does not handle macro/environment correctly

In many cases, the reason for this will be that TeXcount does not have a rule for handling the macro/environment. From version 2.3 of TeXcount, you can check the rule of any macro/environment.

If it is a non-standard macro/environment, you would have to add the rule yourself. Look up the TC-commands in the documentation for help on how to do this from within the document.

If you think the macro/environment is a standard one that TeXcount should support, e.g. if it is part of a standard distribution of TeX/LaTeX, or you think TeXcount has an incorrect handling rule, please send a notification. However, be aware that some macros cannot easily be handled, e.g. if they do not match the format TeXcount handling rules use for macros and environments.

The TeXcount output is garbled with unreadable characters

If the output from TeXcount contains lots of unreadable characters, these are most likely the ANSI colour and style codes: that would mean that the window you view the output in does not support ANSI colour and style codes. The option -nc (or -nocol) turns off the ANSI colour codes. Instead, you can use HTML output.
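
If output containing the codes has already been saved to a file, they can be removed afterwards. The filter below is a generic sed sketch and not a TeXcount feature; the function name strip_ansi is made up, and it only handles the common colour and style sequences that end in m.

```shell
# Hypothetical filter: delete ANSI colour/style sequences (ESC [ ... m).
# This post-processing step is an assumption, not a TeXcount feature.
strip_ansi() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}

# Example use (not run here):
# strip_ansi < saved-output.txt > plain-output.txt
```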

TeXcount splits up words containing special letters

Many languages contain letters other than A-Z, and these may pose a problem for two different reasons. If Perl/TeXcount does not recognise that these characters are letters, they will be handled as special characters like punctuation and cause words to be split. Worse, TeXcount will by default assume that files are encoded using an ISO 8859 encoding (locale dependent), and will not properly handle non-ASCII characters from another encoding such as UTF-8 (Unicode).

The default rule for what TeXcount considers a word is rather strict. In some languages, this may cause problems as some special characters may not be identified as letters. There are two things you can try (only one available for the online web service). By using the option -relaxed, the word identification rule is relaxed a bit, in particular to accept some character modifiers. By changing the Perl locale defined at the start of the script, other characters may be identified as letters.

The world is gradually shifting to Unicode, mostly storing text using the UTF-8 encoding since this is compatible with ASCII. Unicode contains more characters than most of us are ever going to need: Latin and regional special letters, Greek, Cyrillic, Chinese, Japanese, Korean, ... In time, I suppose both Perl and TeXcount will switch to UTF-8 as default, but as of now you have to specify -utf8 to specify that your LaTeX file is saved in this format. This is done automatically if you specify -chinese or -japanese.
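
If you are unsure how a file is encoded, a rough check is to see whether it is valid UTF-8. The sketch below assumes the iconv tool is installed; the function name encoding_option is made up, and treating every file that is not valid UTF-8 as Latin-1 is a guess that will not suit all documents.

```shell
# Hypothetical helper: print -utf8 if the file is valid UTF-8, else -latin1.
# Assumes iconv is installed; the Latin-1 fallback is only a guess.
encoding_option() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        printf '%s\n' -utf8
    else
        printf '%s\n' -latin1
    fi
}

# Example use (not run here):
# texcount "$(encoding_option document.tex)" document.tex
```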

TeXcount has problems identifying macro options

TeXcount assumes that anything after a macro enclosed in [] is a macro option: it does not know which macros actually take such options. In order to reduce the risk of misinterpreting text enclosed in [] as macro options, TeXcount is restrictive about what the contents of a macro option may be. If TeXcount counts some macro options as text, you may try using the -relaxed option which will relax the restriction somewhat.

Error: Can’t locate locale.pm

The TeXcount script uses the locale to identify letters. TeXcount, depending on the version you have, may contain one or more of the lines

use POSIX qw(locale_h);
use locale;
setlocale(LC_CTYPE,"no_NO");

Comment out all of these (add # in front) and see if that helps. Starting with version 2.3, use POSIX and setlocale have both been removed, but use locale has been retained so that the regular expressions recognise letters other than A-Z as specified in your locale.

Without use locale, letters other than A-Z may not be identified as letters and instead treated as special characters. One exception is if you use -utf8 (default with -chinese and -japanese), in which case it should still work.
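
Commenting out the lines can be done in any text editor. As a sketch, the sed call below does the same thing on a copy of the script, leaving the original untouched; the function name comment_out_locale is made up for the example.

```shell
# Hypothetical helper: write a copy of the script ($2) in which the three
# locale-related lines from the original ($1) are commented out.
comment_out_locale() {
    sed -e 's/^[[:space:]]*use POSIX qw(locale_h);/#&/' \
        -e 's/^[[:space:]]*use locale;/#&/' \
        -e 's/^[[:space:]]*setlocale(LC_CTYPE,/#&/' "$1" > "$2"
}

# Example use (not run here):
# comment_out_locale texcount.pl texcount-nolocale.pl
```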

Other

"I have a nice utility to use with TeXcount..."

If you have a script, utility, application, etc. to use with TeXcount or that uses TeXcount, and you'd like to share this with others, please drop me a line and I can quote it or add a reference to it if you have it posted on a web page.

If you have a downloadable script or program on your own web page, you are free to include TeXcount (script and documentation) together with it if you think that is more convenient for the user.

I have made a separate page for supporting scripts, utilities and applications for things that do not go into the TeXcount package.

Large LaTeX documents

TeXcount reads the entire document into memory, so this is of course a limiting factor, but it is hardly likely to be a limitation unless you wish to run it on a device with very little memory.

The more practical limitation is the time it takes to process a large document. Up until version 2.3, TeXcount had not been made to handle large documents well: it essentially handled the document as one big string, which made it very slow. With version 2.3, this has been substantially improved, although there may still be cases where TeXcount takes its time.

On older TeXcount versions, i.e. before version 2.3, processing would start to slow down when files exceeded 100 kB of text. From version 2.3, however, this should be less of a problem: a document of around 1 MB should run in a few seconds, whereas documents substantially larger may take a while.

Is using the web service secure?

If you are very worried about keeping your document secret, you might prefer to download the script rather than upload the document to the web service. The communication is not encrypted, so in theory it would be possible to intercept it. Hence, if the document contains sensitive or classified information, you should not use the web service.

The web service itself does not save the document (although I suppose it may be temporarily stored by the web server). It does however log all calls, both to the web service and to download files, in order to produce statistics on use. The log stores time of access, IP address, which file or TeXcount version was called, and the size of the file processed by the web service. The main reason for storing the IP address is to be able to distinguish between calls by different users versus repeated calls by the same user when the usage statistics are produced.

If you want to contribute to TeXcount...

If you find bugs, identify problems, or come across common macros for which TeXcount has no handling rule or an inappropriate one, please email me. That will help improve TeXcount.

Donations are a common means to support free programs like TeXcount. However, I'm fairly well off myself, and there are lots of causes in much greater need. I have picked Doctors without Borders for myself, and would recommend anyone with money to spare to find an organisation to which to donate. If you pick an organisation and donate to them regularly, rather than wait to be approached by them (called, stopped on the street, ...), this will help them spend more of their collected money on actual aid work, and less on marketing, call-centres, public awareness, etc.


Last modified February 01, 2017.