
Charles Steinkuehler's LEAF/LRP Website


 

webalizer.1




NAME

       webalizer - A web server log file analysis tool.


SYNOPSIS

       webalizer [ option ... ] [ log-file ]

       webazolver [ option ... ] [ log-file ]


DESCRIPTION

       The  Webalizer  is  a web server log file analysis program
       which produces usage statistics in HTML format for viewing
       with  a browser.  The results are presented in both colum­
       nar and graphical format,  which  facilitates  interpreta­
       tion.   Yearly, monthly, daily and hourly usage statistics
       are presented, along with the ability to display usage  by
       site,  URL,  referrer,  user  agent  (browser),  username,
       search  strings,  entry/exit  pages,   and  country  (some
       information may not be available if not present in the log
       file being processed).

       The Webalizer supports CLF (common log format) log files,
       as well as Combined log formats as defined by NCSA and
       others, and variations of these which it attempts to
       handle intelligently.  In addition, the Webalizer supports
       squid proxy logs and wu-ftpd xferlog formatted log files,
       the latter allowing analysis of ftp servers.  Logs may
       also be compressed with gzip.  If a compressed log file is
       detected, it will be automatically uncompressed as it is
       read.  Compressed logs must have the standard gzip
       extension of .gz.
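
       The following Python sketch illustrates this input
       handling, assuming the only test for compression is the
       .gz extension and that a missing name or '-' means STDIN
       (see RUNNING THE WEBALIZER).  The helper open_log is
       hypothetical; The Webalizer itself is written in C and
       does not provide it.

```python
import gzip
import sys
from typing import Optional, TextIO


def open_log(path: Optional[str]) -> TextIO:
    """Return a text stream for a log file (hypothetical helper).

    - No name or "-": read from standard input.
    - Name ending in ".gz": decompress on the fly while reading.
    - Anything else: open as a plain text file.
    """
    if path is None or path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        # "rt" yields decoded text lines, decompressed as they are read.
        # latin-1 maps every byte, so odd bytes in logs never raise errors.
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")
```

       Each line can then be read with a plain loop, for example
       "for line in open_log('access_log.gz'): ...".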

       webazolver is normally just a symbolic link to the
       webalizer.  When run as webazolver, only DNS cache file
       creation/updates are performed, and the program will exit
       once complete.  All normal options and configuration
       directives are available, however many will not be used.
       In addition, a DNS cache file must be specified.  If the
       number of DNS child processes to use is not specified,
       webazolver will default to 5.

       This documentation applies to The Webalizer Version 2.00.


RUNNING THE WEBALIZER

       The Webalizer was designed to be run from a  Unix  command
       line  prompt or as a crond(8) job. Once executed, the gen­
       eral flow of the program is:

       o       A default configuration file is searched for.  A
               file named webalizer.conf is looked for in the
               current directory and, if found, its configuration
               data is parsed.  If the file is not present in the
               current directory, the file /etc/webalizer.conf is
               searched for and, if found, is used instead.

       o       Any command line arguments given  to  the  program
               are parsed.  This may include the specification of
               a configuration file, which is  processed  at  the
               time it is encountered.

       o       If a log file was specified, it is opened and made
               ready for processing.  If no log file  was  given,
               STDIN  is used for input.  If the log filename '-'
               is specified, STDIN will be forced.

       o       If an output directory was specified, the program
               does a chdir(2) to that directory in preparation
               for generating output.  If no output directory was
               given, the current directory is used.

       o       If  a  non-zero  number  of DNS Children processes
               were specified, they  will  be  started,  and  the
               specified  log file will be processed, creating or
               updating the specified DNS cache file.

       o       If no hostname was given, the program attempts  to
               get the hostname using a uname(2) system call.  If
               that fails, localhost is used.

       o       A history file is searched for in the current
               directory (output directory) and read if found.
               This file keeps totals for previous months, which
               are used in the main index.html document.  Note:
               The file location can now be specified with the
               HistoryName configuration option.

       o       If  incremental  processing  was specified, a data
               file is searched for and loaded if found, contain­
               ing  the  'internal  state' data of the program at
               the end of a previous run.  Note: The  file  loca­
               tion can now be specified with the IncrementalName
               configuration option.

       o       Main processing begins on the log file.  If the
               log spans multiple months, a separate HTML
               document is created for each month.

       o       After main processing, the main index.html page is
               created, which has totals by month and links to
               each month's HTML document.

       o       A  new  history  file  is  saved  to  disk,  which
               includes  totals generated by The Webalizer during
               the current run.

       o       If incremental processing was  specified,  a  data
               file is written that contains the 'internal state'
               data at the end of this run.
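
       The configuration file lookup in the first step above can
       be sketched in Python as follows.  The helper name
       find_default_config is hypothetical, and the sketch only
       checks that each candidate exists as a regular file; it
       does not parse it.

```python
import os
from typing import Iterable, Optional

# Locations checked, in order, for the default configuration file.
# The relative name is resolved against the current directory.
DEFAULT_CONFIG_CANDIDATES = ("webalizer.conf", "/etc/webalizer.conf")


def find_default_config(
        candidates: Iterable[str] = DEFAULT_CONFIG_CANDIDATES) -> Optional[str]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```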


INCREMENTAL PROCESSING

       Version 1.2x of The Webalizer adds incremental  run  capa­
       bility.   Simply  put,  this  allows  processing large log
       files by breaking them up into smaller  pieces,  and  pro­
       cessing  these  pieces  instead.   What this means in real
       terms is that you can now rotate your log files  as  often
       as  you  want,  and still be able to produce monthly usage
       statistics without the loss of any detail.  Basically, The
       Webalizer  saves  and restores all internal data in a file
       named  webalizer.current.   This  allows  the  program  to
       'start  where  it  left  off'  so to speak, and allows the
       preservation of detail from one run to the next.  The data
       file is placed in the current output directory, and is a
       plain ASCII text file that can be viewed with any standard
       text editor.  Its location and name may be changed using
       the IncrementalName configuration keyword.

       Some special precautions need to be taken when  using  the
       incremental  run  capability of The Webalizer.  Configura­
       tion options should not be changed between runs,  as  that
       could  cause  corruption of the internal data stored.  For
       example, changing the MangleAgents level will  cause  dif­
       ferent  representations  of user agents to be stored, pro­
       ducing invalid results in the user agents section  of  the
       report.   If  you need to change configuration options, do
       it at the end of the month after normal processing of  the
       previous  month  and  before processing the current month.
       You may also want to delete the webalizer.current file.

       The Webalizer also attempts to prevent data duplication by
       keeping track of the timestamp of the last record
       processed.  This timestamp is then compared to current
       records being processed, and any records that were logged
       prior to that timestamp are ignored.  In theory, this
       should allow you to re-process logs that have already been
       processed, or process logs that contain a mix of processed
       and not yet processed records, without duplicating
       statistics.  The only time this may break is if you have
       duplicate timestamps in two separate log files: any
       records in the second log file that have the same
       timestamp as the last record processed from the previous
       log file will be discarded as if they had already been
       processed.  There are many ways to prevent this, however;
       for example, stopping the web server before rotating logs
       will avoid this situation.  This setup also requires that
       you always process logs in chronological order, otherwise
       data loss will occur as a result of the timestamp
       comparison.
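
       A minimal Python sketch of the duplicate check described
       above, assuming records are compared by timestamp alone
       and that anything at or before the saved timestamp counts
       as already processed (which is what discards
       same-timestamp records at the start of a newly rotated
       log).  The record layout and the function name are
       hypothetical.

```python
from datetime import datetime
from typing import Iterable, Iterator, Optional, Tuple

# Simplified record: (timestamp, raw log line).  Hypothetical layout.
Record = Tuple[datetime, str]


def skip_already_processed(records: Iterable[Record],
                           last_saved: Optional[datetime]) -> Iterator[Record]:
    """Yield only records newer than the timestamp saved by the last run.

    last_saved is None when there is no incremental data yet, in which
    case every record is yielded.
    """
    for ts, line in records:
        if last_saved is not None and ts <= last_saved:
            continue  # at or before the saved timestamp: assume already seen
        yield ts, line
```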


REVERSE DNS LOOKUPS

       The  Webalizer  supports reverse DNS lookups through a DNS
       cache file that is either created/updated at run-time,  or
       has  been  previously created, either by a previous run of
       the webalizer, or  by  running  the  stand-alone  version,
       webazolver.   In  order  to perform reverse DNS lookups, a
       DNSCache filename must be specified.   In  order  to  cre­
       ate/update  the  cache  file  at run-time, the DNSChildren
       number must be non-zero.  The DNSChildren value  specifies
       the  number  of  children processes to fork, each of which
       will perform reverse DNS lookups in order to create/update
       the  DNS  cache  file.   See the file DNS.README for addi­
       tional information.
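
       The sketch below shows the idea of resolving uncached
       addresses with several workers in parallel.  It is a loose
       Python analogue, not the program's implementation: it uses
       threads instead of forked child processes, keeps the cache
       in a dictionary rather than in the DNS cache file, and the
       function names are hypothetical.  Failed lookups simply
       keep the numeric address.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def _reverse_lookup(addr: str) -> str:
    """Return the hostname for addr, or addr itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(addr)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return addr


def update_dns_cache(addresses: Iterable[str],
                     cache: Dict[str, str],
                     children: int = 5,
                     resolve: Callable[[str], str] = _reverse_lookup
                     ) -> Dict[str, str]:
    """Resolve addresses missing from cache using `children` workers."""
    if children <= 0:
        return cache  # zero workers: use the existing cache, no updates
    pending = sorted({a for a in addresses if a not in cache})
    with ThreadPoolExecutor(max_workers=children) as pool:
        for addr, name in zip(pending, pool.map(resolve, pending)):
            cache[addr] = name
    return cache
```

       Passing a custom resolve function makes the sketch
       testable without network access.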


COMMAND LINE OPTIONS

       The  Webalizer  supports  many   different   configuration
       options  that  will  alter the way the program behaves and
       generates output.  Most of these can be specified  on  the
       command  line,  while some can only be specified in a con­
       figuration file.  The  command  line  options  are  listed
       below,  with references to the corresponding configuration
       file keywords.

       General Options

       -h      Display all available  command  line  options  and
               exit program.

       -v -V   Display program version and exit program.

       -d      Debug.   Display  debugging information for errors
               and warnings.

       -i      IgnoreHist.  Ignore history.   USE  WITH  CAUTION.
               This will cause The Webalizer to ignore any previ­
               ous monthly history file only.   Incremental  data
               (if present) is still processed.

       -p      Incremental.  Preserve internal data between runs.

       -q      Quiet.  Suppress informational messages.  Does not
               suppress warnings or errors.

       -Q      ReallyQuiet.  Suppress all messages, including
               warnings and errors.

       -T      TimeMe.  Force display of  timing  information  at
               end of processing.

       -c file Use configuration file file.

       -n name Hostname.  Use the hostname name.

       -o dir  OutputDir.  Use output directory dir.

       -t name ReportTitle.  Use name for report title.

       -F ( clf | ftp | squid )
               LogType.  Specify log type to be processed.  Value
               can be clf, ftp or squid.  If not specified,
               defaults to CLF format.  FTP logs must be in
               standard wu-ftpd xferlog format.

       -f      FoldSeqErr.  Fold out of sequence log records back
               into analysis, by treating them as if they had the
               same date/time as the last good record.  Normally,
               out of sequence log records are simply ignored.

       -Y      CountryGraph.  Suppress country graph.

       -G      HourlyGraph.  Suppress hourly graph.

       -x name HTMLExtension.  Defines HTML file extension to
               use.  If not specified, defaults to html.  Do not
               include the leading period.

       -H      HourlyStats.  Suppress hourly statistics.

       -L      GraphLegend.  Suppress color coded graph legends.

       -l num  GraphLines.   Specify  number of background lines.
               Default is 2.   Use  zero  ('0')  to  disable  the
               lines.

       -P name PageType.  Specify file extensions that are
               considered pages (sometimes referred to as
               pageviews).

       -m num  VisitTimeout.   Specify  the Visit timeout period.
               Must be given in HHMMSS  format.   Default  is  30
               minutes (3000).

       -I name IndexAlias.  Use the filename name as an
               additional alias for index.*.

       -M num  MangleAgents.  Mangle user agent  names  according
               to the mangle level specified by num.  Mangle lev­
               els are:

               5   Browser name and major version.

               4   Browser name, major and minor version.

               3   Browser name, major version, minor version  to
                   two decimal places.

               2   Browser  name,  major  and  minor versions and
                   sub-version.

               1   Browser name, version and machine type if pos­
                   sible.

               0   All information (left unchanged).

       -g num  GroupDomains. Automatically group sites by domain.
               The grouping level specified by num can be thought
               of  as  'the  number  of  dots'  to display in the
               grouping.  The default value  of  0  disables  any
               domain grouping.

       -D name DNSCache.  Use the DNS cache file name.

       -N num  DNSChildren.  Use num DNS child processes to
               perform DNS lookups, either creating or updating
               the DNS cache file.  Specify zero (0) to disable
               cache file creation/updates.  If given, a DNS
               cache filename must be specified.
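
       One reading of the GroupDomains 'number of dots' rule
       (-g), sketched in Python: keep the last num+1 labels of
       the hostname.  The treatment of numeric addresses (not
       grouped) is an assumption of this sketch, and domain_group
       is a hypothetical name.

```python
from typing import Optional


def domain_group(host: str, level: int) -> Optional[str]:
    """Return the domain group for host, keeping `level` dots, or None.

    level 0 (the default) disables grouping.  Numeric addresses are
    not grouped in this sketch.
    """
    if level <= 0:
        return None
    labels = host.lower().rstrip(".").split(".")
    if all(label.isdigit() for label in labels):
        return None  # looks like an unresolved dotted-quad address
    if len(labels) <= level + 1:
        return ".".join(labels)  # already short enough: whole name
    return ".".join(labels[-(level + 1):])
```

       With num set to 1, www.sales.example.com and
       ftp.example.com would both be counted under example.com.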

       Hide Options

       -a name HideAgent.  Hide user agents matching name.

       -r name HideReferrer.  Hide referrer matching name.

       -s name HideSite.  Hide site matching name.

       -X name HideAllSites.   Hide  all  individual  sites (only
               display groups).

       -u name HideURL.  Hide URL matching name.

       Table size options

       -A num  TopAgents.  Display the top num user agents table.

       -R num  TopReferrers.    Display  the  top  num  referrers
               table.

       -S num  TopSites.  Display the top num sites table.

       -U num  TopURLs.  Display the top num URLs table.

       -C num  TopCountries.   Display  the  top  num   countries
               table.

       -e num  TopEntry.   Display the top num entry pages table.

       -E num  TopExit.  Display the top num exit pages table.


CONFIGURATION FILES

       Configuration files are standard ascii(7) text files that
       may be created or edited using any standard editor.  Blank
       lines and lines that begin with a pound sign ('#') are
       ignored.  Any other lines are considered to be
       configuration lines, and have the form "Keyword Value",
       where 'Keyword' is one of the currently available
       configuration keywords defined below, and 'Value' is the
       value to assign to that particular option.  Any text
       found after the keyword up to the end of the line is
       considered the keyword's value, so you should not include
       anything else on the line (such as a trailing comment)
       after the actual value.  The file sample.conf provided
       with the distribution contains lots of useful
       documentation and examples as well.
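
       A minimal Python sketch of the line format described
       above, assuming keyword and value are separated by
       whitespace and that a '#' preceded only by whitespace
       still marks a comment.  It keeps entries in file order so
       that repeatable keywords (for example HideSite) are not
       collapsed.  parse_config is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def parse_config(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse "Keyword Value" lines into (keyword, value) pairs, in order.

    Blank lines and '#' comment lines are skipped.  Everything after
    the keyword is the value, so a trailing comment becomes part of it.
    """
    entries: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split once on any run of whitespace
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries
```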

       General Configuration Keywords

       LogFile name
               Use log file named name.  If none specified, STDIN
               will be used.

       LogType name
               Specify log file  type  as  name.  Values  can  be
               either  web,  squid or ftp, with the default being
               web.

       OutputDir dir
               Create output in the directory dir.  If none spec­
               ified, the current directory will be used.

       HistoryName name
               Filename to use for history file.  Relative to
               output directory unless an absolute name is given
               (ie: starts with '/').  Defaults to
               'webalizer.hist' in the standard output directory.

       ReportTitle name
               Use the title string name for the report title.
               If none specified, use the default of (in English)
               "Usage Statistics for ".

       Hostname name
               Set the hostname for the report as name.  If  none
               specified,  an  attempt will be made to gather the
               hostname via a  uname(2)  system  call.   If  that
               fails, localhost will be used.

       UseHTTPS ( yes | no )
               Use https:// on links to URLs, instead of the
               default http://, in the 'Top URLs' table.

       Quiet ( yes | no )
               Suppress informational messages.  Warning and Error
               messages will not be suppressed.

       ReallyQuiet ( yes | no )
               Suppress all messages, including Warning and Error
               messages.

       Debug ( yes | no )
               Print extra debugging information on Warnings  and
               Errors.

       TimeMe ( yes | no )
               Force timing information at end of processing.

       GMTTime ( yes | no )
               Use  GMT  (UTC) time instead of local timezone for
               reports.

       IgnoreHist ( yes | no )
               Ignore previous monthly history  file.   USE  WITH
               CAUTION.   Does  not prevent Incremental file pro­
               cessing.

       FoldSeqErr ( yes | no )
               Fold out of sequence log records back into  analy­
               sis  by  treating  them  as  if  they had the same
               date/time as the last good record.  Normally,  out
               of sequence log records are ignored.

       CountryGraph ( yes | no )
               Display Country Usage Graph in output report.

       HourlyGraph ( yes | no )
               Display Hourly Graph in output report.

       HourlyStats ( yes | no )
               Display Hourly Statistics in output report.

       PageType name
               Define  the file extensions to consider as a page.
               If a file is found to have the same  extension  as
               name,  it  will  be  counted  as a page (sometimes
               called a pageview).

       GraphLegend ( yes | no )
               Allows  the  color  coded  graph  legends  to   be
               enabled/disabled.

       GraphLines num
               Specify  the  number of background reference lines
               displayed on  the  graphs  produced.   Disable  by
               using zero ('0'), default is 2.

       VisitTimeout num
               Specifies  the visit timeout value.  Default is 30
               minutes.  A visit is determined by looking at  the
               difference  in  time  between the current and last
               request from a specific site.  If  the  difference
               is  greater  or  equal  to  the timeout value, the
               request is counted as a new visit.

       IndexAlias name
               Use name as an additional alias for index.*.

       MangleAgents num
               Mangle user agent names based on mangle level num.
               See  the  -M command line switch for mangle levels
               and  their  meaning.   The  default  is  0,  which
               doesn't mangle user agents at all.

       SearchEngine name variable
               Allows  the  specification  of  search engines and
               their query strings.  The  name  is  the  name  to
               match  against  the  referrer  string  for a given
               search engine.  The variable is the  cgi  variable
               that  the search engine uses for queries.  See the
               sample.conf file for  example  usage  with  common
               search engines.

       Incremental ( yes | no )
               Enable Incremental mode processing.

       IncrementalName name
               Filename to use for incremental data.  Relative to
               output directory unless an absolute name is given
               (ie: starts with '/').  Defaults to
               'webalizer.current' in the standard output directory.

       DNSCache name
               Filename to use for the DNS  cache.   Relative  to
               output  directory unless an absolute name is given
               (ie: starts with '/').

       DNSChildren num
               Number of children DNS processes to run  in  order
               to create/update the DNS cache file.  Specify zero
               (0) to disable.
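
       The VisitTimeout rule can be sketched in Python as below.
       The sketch assumes requests arrive in time order as (site,
       seconds) pairs and that a site's first request also starts
       a visit; count_visits is a hypothetical name, and the 1800
       second default mirrors the documented 30 minutes.

```python
from typing import Dict, Iterable, Tuple


def count_visits(requests: Iterable[Tuple[str, int]],
                 timeout: int = 30 * 60) -> Dict[str, int]:
    """Count visits per site from time-ordered (site, unix_time) pairs.

    A request starts a new visit when it is the site's first request,
    or when the gap since that site's previous request is >= timeout.
    """
    last_seen: Dict[str, int] = {}
    visits: Dict[str, int] = {}
    for site, t in requests:
        prev = last_seen.get(site)
        if prev is None or t - prev >= timeout:
            visits[site] = visits.get(site, 0) + 1
        last_seen[site] = t  # the gap is measured from the latest request
    return visits
```

       Because the gap is measured from the previous request, a
       site that keeps requesting pages every few minutes stays in
       one visit for as long as it remains active.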

       Top Table Keywords

       TopAgents num
               Display the top num User Agents table. Use zero to
               disable.

       AllAgents ( yes | no )
               Create a separate HTML page with All User Agents.

       TopReferrers num
               Display  the  top num Referrers table. Use zero to
               disable.

       AllReferrers ( yes | no )
               Create a separate HTML page with All Referrers.

       TopSites num
               Display the top num Sites table. Use zero to  dis­
               able.

       TopKSites num
               Display  the  top num Sites (by KByte) table.  Use
               zero to disable.

       AllSites ( yes | no )
               Create a separate HTML page with All Sites.

       TopURLs num
               Display the top num URLs table. Use zero  to  dis­
               able.

       TopKURLs num
               Display  the  top  num URLs (by KByte) table.  Use
               zero to disable.

       AllURLs ( yes | no )
               Create a separate HTML page with All URLs.

       TopCountries num
               Display the top num Countries in  the  table.  Use
               zero to disable.

       TopEntry num
               Display the top num Entry Pages in the table.  Use
               zero to disable.

       TopExit num
               Display the top num Exit Pages in the table.   Use
               zero to disable.

       TopSearch num
               Display  the  top num Search Strings in the table.
               Use zero to disable.

       AllSearchStr ( yes | no )
               Create a separate HTML page with All Search Strings.

       TopUsers num
               Display  the  top num Usernames in the table.  Use
               zero to disable.  Usernames are only available  if
               using http based authentication.

       AllUsers ( yes | no )
               Create a separate HTML page with All Usernames.

       Hide/Ignore/Group/Include Keywords

       HideAgent name
               Hide User Agents that match name.

       HideReferrer name
               Hide Referrers that match name.

       HideSite name
               Hide Sites that match name.

       HideAllSites ( yes | no )
               Hide  all  individual  sites.   This  causes  only
               grouped sites to be displayed.

       HideURL name
               Hide URLs that match name.

       HideUser name
               Hide Usernames that match name.

       IgnoreAgent name
               Ignore User Agents that match name.

       IgnoreReferrer name
               Ignore Referrers that match name.

       IgnoreSite name
               Ignore Sites that match name.

       IgnoreURL name
               Ignore URLs that match name.

       IgnoreUser name
               Ignore Usernames that match name.

       GroupAgent name [Label]
               Group User Agents that match name.  Display  Label
               in 'Top Agent' table if given (instead of name).

       GroupReferrer name [Label]
               Group Referrers that match name.  Display Label in
               'Top Referrer' table if given (instead of name).

       GroupSite name [Label]
               Group Sites that match  name.   Display  Label  in
               'Top Site' table if given (instead of name).

       GroupDomains num
               Automatically  group  sites  by domain.  The value
               num specifies the level of grouping,  and  can  be
               thought  of  as  the  'number  of dots' to be dis­
               played.  The default value of  0  disables  domain
               grouping.

       GroupURL name [Label]
               Group URLs that match name.  Display Label in
               'Top URL' table if given (instead of name).

       GroupUser name [Label]
               Group Usernames that match name.  Display Label in
               'Top  Usernames' table if given (instead of name).

       IncludeSite name
               Force inclusion of sites that match  name.   Takes
               precedence over Ignore* keywords.

       IncludeURL name
               Force inclusion of URLs that match name.  Takes
               precedence over Ignore* keywords.

       IncludeReferrer name
               Force inclusion  of  Referrers  that  match  name.
               Takes precedence over Ignore* keywords.

       IncludeAgent name
               Force  inclusion  of  User Agents that match name.
               Takes precedence over Ignore* keywords.

       IncludeUser name
               Force inclusion  of  Usernames  that  match  name.
               Takes precedence over Ignore* keywords.
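
       The precedence of the Include* keywords over the Ignore*
       keywords can be sketched in Python as follows.  Matching is
       simplified to a plain substring test, which is an
       assumption of this sketch; the program's real pattern
       rules may differ.  The function names are hypothetical.

```python
from typing import Sequence


def matches(value: str, patterns: Sequence[str]) -> bool:
    """True if any pattern occurs in value (plain substring: assumed)."""
    return any(p in value for p in patterns)


def keep_record(value: str, ignore: Sequence[str],
                include: Sequence[str]) -> bool:
    """Check Include patterns first; a match keeps the record outright."""
    if matches(value, include):
        return True
    return not matches(value, ignore)
```

       Checking the include list first is what lets a narrow
       IncludeURL rescue one file from a broad IgnoreURL.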

       HTML Generation Keywords

       HTMLExtension text
               Defines  the  HTML file extension to use.  Default
               is html.  Do not include the leading period!

       HTMLPre text
               Insert text at the very beginning of the generated
               HTML file.  Defaults to a standard HTML 3.2
               DOCTYPE record.

       HTMLHead text
               Insert text within the <HEAD></HEAD> block of  the
               HTML file.

       HTMLBody text
               Insert text in HTML page, starting with the <BODY>
               tag.  If used, the first line must be a <BODY ...>
               tag.  Multiple lines may be specified.

       HTMLPost text
               Insert text at top (before the horizontal rule) of
               HTML pages.  Multiple lines may be specified.

       HTMLTail text
               Insert text at bottom of the HTML page.  The  text
               is  top and right aligned within a table column at
               the end of the report.

       HTMLEnd text
               Insert text at the very end of the HTML page.   If
               not specified, the default is to insert the ending
               </BODY> and </HTML> tags.  If used, you must  sup­
               ply these tags yourself.

       Dump Object Keywords

       The Webalizer allows you to export processed data to other
       programs by using tab delimited  text  files.   The  Dump*
       commands specify which files are to be written, and where.

       DumpPath name
               Save dump files in directory name.  If not
               specified, the default output directory will be
               used.  Do not specify a trailing slash ('/').

       DumpExtension name
               Use name as the filename extension for dump files.
               If not given, the default of tab will be used.

       DumpHeader ( yes | no )
               Print  a  column header as the first record of the
               file.

       DumpSites ( yes | no )
               Dump the sites data to a tab delimited file.

       DumpURLs ( yes | no )
               Dump the url data to a tab delimited file.

       DumpReferrers ( yes | no )
               Dump the referrer data to a tab delimited file.
               This data is only available if using a log that
               contains referrer information (ie: a combined
               format web log).

       DumpAgents ( yes | no )
               Dump  the user agent data to a tab delimited file.
               This data is only available if using  a  log  that
               contains  user  agent  information (ie: a combined
               format web log).

       DumpUsers ( yes | no )
               Dump the username data to a  tab  delimited  file.
               This  data  is  only available if processing a wu-
               ftpd xferlog or  a  web  log  that  contains  http
               authentication information.

       DumpSearchStr ( yes | no )
               Dump  the  search  string  data to a tab delimited
               file.  This data is only available if processing a
               web log that contains referrer information and had
               search string information present.
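
       A Python sketch of writing one dump file, assuming the
       xxxxx_YYYYMM.<extension> naming shown under FILES.  The
       prefix, column names and function name are hypothetical;
       the actual columns written by each Dump* keyword are not
       specified here.

```python
import csv
import os
from typing import Iterable, Sequence


def write_dump(rows: Iterable[Sequence[object]], header: Sequence[str],
               path: str, prefix: str, yyyymm: str,
               extension: str = "tab", with_header: bool = False) -> str:
    """Write rows as a tab delimited file and return its filename."""
    # path is expected without a trailing slash; os.path.join adds one.
    filename = os.path.join(path, f"{prefix}_{yyyymm}.{extension}")
    with open(filename, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        if with_header:  # corresponds to "DumpHeader yes"
            writer.writerow(header)
        writer.writerows(rows)
    return filename
```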


FILES

       webalizer.conf      Default   configuration   file.     Is
                           searched  for in the current directory
                           and if not found, in the /etc/  direc­
                           tory.

       webalizer.hist      Monthly  history  file for previous 12
                           months.  (can be changed)

       webalizer.current   Current state data  file  (Incremental
                           processing).  (can be changed)

       xxxxx_YYYYMM.html   Various monthly HTML output files pro­
                           duced. (extension can be changed)

       xxxxx_YYYYMM.png    Various monthly image  files  used  in
                           the reports.

       xxxxx_YYYYMM.tab    Monthly   tab  delimited  text  files.
                           (extension can be changed)


BUGS

       Report bugs to brad@mrunix.net.


COPYRIGHT

       Copyright (C) 1997-2000  by  Bradford  L.  Barrett.   Dis­
       tributed  under  the GNU GPL.  See the files "COPYING" and
       "Copyright", supplied with  all  distributions  for  addi­
       tional information.


AUTHOR

       Bradford L. Barrett <brad@mrunix.net>

