Install, configure and protect Awstats for multiple nginx vhosts on Debian

There are already plenty of tutorials on the internet on how to install awstats for nginx. I didn’t find any for the configuration I wanted, so I’ll write one, for my records.

I have some custom needs, let’s suppose I have 3 domains :

  • master-domain.com
  • alpha.com
  • beta.com

And I want to have stats for the last 2 domains. The master-domain.com is used as the master domain of the server, with awstats available at awstats.master-domain.com, instead of having alpha.com/awstats and beta.com/awstats. The idea is to group all the server scripts/tools (phpmyadmin, zabbix, etc …) under master-domain.com.

We also want to password protect the stats, but with different credentials for each vhost.

These steps have been tested on Debian Squeeze, on a Kimsufi.

Install Awstats

apt-get install awstats

On Debian Squeeze, awstats installs things in 3 places :

  • /etc/awstats : contains the conf files for each of your awstats installations
  • /usr/share/awstats : contains all tools and libraries used by awstats
  • /usr/share/doc/awstats : docs, tools for building the static html pages, icons and other static files used by html

Formatting Nginx log

Nginx by default outputs logs that awstats can already read, as long as you use the combined format. If you set your access log like this :

access_log /path/to/log.log;

Then you’re good. The combined format is implicit. It’s equivalent to

access_log /path/to/log.log combined;

Optional step

Using the default format is fine, but you can log one more field that could be pertinent : the http_x_forwarded_for.

It’s used to capture the client IP address when the client connects through a proxy or load balancer.

For that, we define another log format, named main, in /etc/nginx/nginx.conf. In the http scope, add :

log_format main     '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

It’s the same as the combined format, plus the $http_x_forwarded_for bit at the end. To use this format, add main at the end of your access_log directive.

access_log /path/to/log.log main;

As this last field is not used by awstats, we should tell it to ignore it. In /etc/awstats/awstats.conf.local, add :

LogFormat = "%host - %host_r %time1 %methodurl %code %bytesd %refererquot %uaquot %otherquot"

This file should be empty by default. It holds the settings shared by all your awstats configs.

The LogFormat line teaches awstats the meaning of each field when parsing the log. The last token (%otherquot) means “that quoted string here does not mean anything”.
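
Before pointing awstats at the log, it can help to check that nginx really writes the extra quoted field at the end of each line. Here is a small sketch with a made-up log line (the IP addresses and URLs are placeholders, not real data) that pulls the fields out with awk.

```shell
# Sample access-log line in the "main" format (made-up values).
line='203.0.113.7 - - [10/Oct/2012:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "http://example.org/" "Mozilla/5.0" "198.51.100.4"'

# Splitting on double quotes puts the quoted fields at even positions:
# $2 = request, $4 = referer, $6 = user agent, $8 = X-Forwarded-For.
request=$(printf '%s\n' "$line" | awk -F'"' '{ print $2 }')
forwarded_for=$(printf '%s\n' "$line" | awk -F'"' '{ print $8 }')

echo "request:         $request"
echo "x-forwarded-for: $forwarded_for"
```

On a real server you would replace the sample line with something like the output of tail -n 1 on your own access log.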

Creating a configuration file for each vhost

Awstats is picky about the configuration files : you should have one config file per vhost, they should be named following the convention awstats.domain.tld.conf, and they should be placed inside the /etc/awstats/ directory.

So, for the vhost alpha.com and beta.com, you should create these two files :

  1. awstats.alpha.com.conf
  2. awstats.beta.com.conf

The official method

There is already a model configuration file inside the /etc/awstats/ directory : awstats.conf. The documentation says to clone that file when creating your own config files, with

cp /etc/awstats/awstats.conf /etc/awstats/awstats.alpha.com.conf
cp /etc/awstats/awstats.conf /etc/awstats/awstats.beta.com.conf

Then you just edit these files to your needs… A method I’m not fond of. If you take a look at awstats.conf, you’ll see that it’s a very complete conf, with plenty of comments and all the available settings, all of that for just * suspense music * … 1500 lines.

I’m personally not interested in having multiple conf files of 1500 lines each, with each file differing by just 4 lines.

The DRY method

If you ls the /etc/awstats folder, you’ll see that there are 2 files by default :

  • awstats.conf
  • awstats.conf.local

awstats.conf is the main conf file, origin of all the other conf files. Awstats also falls back to this file if no other config file exists.

awstats.conf.local is an empty file. It’s the parent of all the other config files. If you have some rules that are shared among all your configs, you put them here.

What I do is copy all the contents of awstats.conf into awstats.conf.local, and just put the important rules inside each vhost config, so they’re easier to read, and shorter.

What to put in the conf files

Let’s create the conf files for alpha.com.

vi /etc/awstats/awstats.alpha.com.conf

We start with an empty file, and insert the following lines

# Path to your nginx vhost log file
LogFile="/var/log/nginx/access.alpha.com.log"

# Domain of your vhost
SiteDomain="www.alpha.com"

# Directory where to store the awstats data
DirData="/var/lib/awstats/"

# Other aliases, basically other domains/subdomains that point to the same site as the domain above
HostAliases="www.alpha.com"

By default, awstats stores all its data inside /var/lib/awstats/. You could change that to another directory, or have a subdirectory for each vhost, like /var/lib/awstats/alpha.com/.

But even if you use the default setting, you have to set it in each config, as it cannot be inherited from awstats.conf.local.

You’re free to add more settings if some of your vhosts require additional customization.
Repeat the same steps for each vhost.
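
Since the files only differ by the domain name, the repetition can be scripted. The following is a minimal sketch, not something shipped with awstats : make_awstats_conf and CONF_DIR are names I made up for the example, and CONF_DIR defaults to a scratch directory so you can try it without touching /etc/awstats.

```shell
# Hypothetical helper: write one minimal awstats config per domain.
# CONF_DIR defaults to a scratch directory so the sketch can be tried safely;
# on the real server it would be /etc/awstats.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

make_awstats_conf() {
    domain="$1"
    cat > "$CONF_DIR/awstats.$domain.conf" <<EOF
# Path to the nginx vhost log file
LogFile="/var/log/nginx/access.$domain.log"

# Domain of the vhost
SiteDomain="www.$domain"

# Directory where awstats stores its data
DirData="/var/lib/awstats/"

# Other names that point to the same site
HostAliases="www.$domain"
EOF
}

for domain in alpha.com beta.com; do
    make_awstats_conf "$domain"
done

ls "$CONF_DIR"
```

Run it with CONF_DIR=/etc/awstats once you’re happy with the output. If the shared settings don’t seem to be picked up by a vhost config, adding an Include "/etc/awstats/awstats.conf.local" line to each generated file is one thing to try.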

Tune the global settings

Edit awstats.conf.local :

  • Disable DNS lookups : DNSLookup = 0

  • Remove the LogFile, SiteDomain, DirData and HostAliases directives, as they’re useless outside their context.

  • Set LogFormat to combined (if you didn’t use the optional step in formatting the nginx log) : LogFormat = 1

  • You could also enable some plugins, like GeoIP (requires additional steps, besides uncommenting the line).

Computing data

Awstats is now configured for each vhost. We will now tell it to read the log files, and generate the stats from them. It’s a boring operation that should be done regularly (e.g., once a day, every 6 hours, etc…) depending on your needs. The longer you wait, the bigger the log file grows, and the more time it takes to process. It’ll depend on your website traffic.

To compute the data, a perl script is available in /usr/share/doc/awstats/examples. The awstats_updateall.pl script will compute the stats for each available config. It’s easy, just run :

/usr/share/doc/awstats/examples/awstats_updateall.pl now -awstatsprog=/usr/lib/cgi-bin/awstats.pl

The -awstatsprog flag tells the script where to find awstats.pl, because awstats_updateall.pl is just a wrapper that executes awstats.pl for each of your configs.
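
To make the “just a wrapper” part concrete, here is roughly the same loop written in shell. It’s a sketch of the idea, not the actual code of awstats_updateall.pl : CONF_DIR, AWSTATS_PROG and DRY_RUN are names chosen for the example, and by default it only prints the commands instead of running them.

```shell
# Rough sketch of what awstats_updateall.pl does: find every
# awstats.<name>.conf and run awstats.pl once per config name.
# CONF_DIR, AWSTATS_PROG and DRY_RUN are names chosen for this example.
CONF_DIR="${CONF_DIR:-/etc/awstats}"
AWSTATS_PROG="${AWSTATS_PROG:-/usr/lib/cgi-bin/awstats.pl}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

update_all() {
    for conf in "$CONF_DIR"/awstats.*.conf; do
        [ -e "$conf" ] || continue          # no per-vhost config found
        name=$(basename "$conf" .conf)      # awstats.alpha.com
        name=${name#awstats.}               # alpha.com
        if [ "$DRY_RUN" = 1 ]; then
            echo "$AWSTATS_PROG -config=$name -update"
        else
            "$AWSTATS_PROG" -config="$name" -update
        fi
    done
}

update_all
```

The same awstats.pl -config=alpha.com -update command is handy on its own when you only want to refresh one vhost.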

The obvious way to run this script regularly is a cron job. The drawback is that nginx logs are rotated by logrotate : every X days, the log file is archived (and renamed), and a new log file is created. Awstats only reads the current file, so anything logged since the last computation is lost once the file is renamed. If you use a cron job to compute the stats

  • Just before the log rotation, you’ll lose all data between the computation and the rotation, as the file is renamed and not accessible by awstats anymore
  • After the rotation, you’ll also lose all data logged between the previous computation and the rotation, for the same reason.
  • At the rotation, you’ll experience some weird things.

Solution #1

We could prevent the data loss by telling awstats to always parse 2 log files : the regular one, and the last archived log.

Logrotate always renames the files using the convention filename.1, filename.2. At each rotation, all the numbers are incremented, and filename becomes filename.1. A new filename is created, so the newest archive is always filename.1.

In the awstats config for your vhost, edit the LogFile setting

LogFile="/usr/share/awstats/tools/logresolvemerge.pl /path/to/log/access.domain.tld.log /path/to/log/access.domain.tld.log.1 |"

logresolvemerge.pl will combine the 2 log files into one.

You’ll never lose data because of the rotation, since you’ll parse the rotated file too.
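
If you have several vhosts, editing each LogFile line by hand gets tedious. Here is a small sketch that rewrites the line for one config. set_merged_logfile is a helper name I made up, and it assumes the rotated file is kept uncompressed (which is what delaycompress does for filename.1).

```shell
# Hypothetical helper: rewrite the LogFile line of one awstats config so it
# reads the live log and the newest rotated log through logresolvemerge.pl.
set_merged_logfile() {
    conf="$1"   # e.g. /etc/awstats/awstats.alpha.com.conf
    log="$2"    # e.g. /var/log/nginx/access.alpha.com.log
    merge="/usr/share/awstats/tools/logresolvemerge.pl"
    tmp=$(mktemp)
    # Replace any existing LogFile= line; other lines pass through untouched.
    awk -v new="LogFile=\"$merge $log $log.1 |\"" \
        '/^LogFile[ \t]*=/ { print new; next } { print }' "$conf" > "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage (on the real server, as root):
#   set_merged_logfile /etc/awstats/awstats.alpha.com.conf /var/log/nginx/access.alpha.com.log
```

The file is rewritten in place through a temporary copy, so its owner and permissions stay the same.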

Solution #2

Execute the computation just before the rotation, using the logrotate prerotate hook. This is especially useful if your computation interval equals the rotation interval (e.g., you rotate every day at midnight, and you also compute every day at midnight).

Edit the logrotate config for nginx :

vi /etc/logrotate.d/nginx

I like to rotate logs every day, to keep them lighter. By default, nginx logs are rotated weekly.

/var/log/nginx/*.log {
    # rotate daily
    daily
    missingok
    # keep 52 rotated files (52 days with daily rotation)
    rotate 52
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
            # Trigger awstats computation
            /usr/share/doc/awstats/examples/awstats_updateall.pl now -awstatsprog=/usr/lib/cgi-bin/awstats.pl
    endscript
    postrotate
            # Reload Nginx to make it read the new log file
            [ ! -f /var/run/nginx.pid ] || kill -USR1 `cat /var/run/nginx.pid`
    endscript
}

You could also trigger the computation manually by running

/usr/share/doc/awstats/examples/awstats_updateall.pl now -awstatsprog=/usr/lib/cgi-bin/awstats.pl

directly in the shell, if you don’t want to wait for the log rotation at midnight.

You could use a regular cronjob on a single log file if you compute more than once a day, and use the prerotate hook just for the computation near midnight.

Building the html reports

awstats_updateall.pl will compute new stats, but not build the html pages. Awstats comes with 2 options :

  • Build the static html pages yourself
  • Use cgi to build the pages dynamically

I’ll use the dynamic option, explained below. There are already plenty of articles on the internet explaining how to build static pages if that’s the way you want to go.

Exposing awstats

Now that awstats is configured and loaded with data, let’s make it viewable on the internet.

Let’s create the subdomain where awstats will live : awstats.master-domain.com, linked to /var/www/awstats.

Let’s assume that the subdomain already points to your server (creating the subdomain is not in the scope of this post), so you just have to create the nginx virtual host for awstats.master-domain.com.

How you create it is your own choice, there are multiple ways (single conf file, ‘sites-enabled’ a la apache, etc …).

A regular nginx vhost conf should look like this :

server {
    listen 80;
    server_name awstats.master-domain.com;
    root        /var/www/awstats;
}

Let’s define the error log, and disable access log

error_log /var/log/nginx/awstats.master-domain.com.error.log;
access_log off;
log_not_found off;

Alias the icon folder, so it’s viewable online, instead of copy/pasting it.

location ^~ /icon {
    alias /usr/share/awstats/icon/;
}

Finally, configure the /cgi-bin/ scripts to go through php-fastcgi

location ~ ^/cgi-bin/.*\.(cgi|pl|py|rb) {
    gzip off;
    include         fastcgi_params;
    fastcgi_pass    unix:/var/run/php5-fpm.sock;
    fastcgi_index   cgi-bin.php;
    fastcgi_param   SCRIPT_FILENAME    /etc/nginx/cgi-bin.php;
    fastcgi_param   SCRIPT_NAME        /cgi-bin/cgi-bin.php;
    fastcgi_param   X_SCRIPT_FILENAME  /usr/lib$fastcgi_script_name;
    fastcgi_param   X_SCRIPT_NAME      $fastcgi_script_name;
    fastcgi_param   REMOTE_USER        $remote_user;
}

Edit the fastcgi_pass to your own php-fpm server.

Create the /etc/nginx/cgi-bin.php file

<?php
$descriptorspec = array(
    0 => array("pipe", "r"),  // stdin is a pipe that the child will read from
    1 => array("pipe", "w"),  // stdout is a pipe that the child will write to
    2 => array("pipe", "w")   // stderr is a pipe too; it is closed unread below
);

$newenv = $_SERVER;
$newenv["SCRIPT_FILENAME"] = $_SERVER["X_SCRIPT_FILENAME"];
$newenv["SCRIPT_NAME"] = $_SERVER["X_SCRIPT_NAME"];

if (is_executable($_SERVER["X_SCRIPT_FILENAME"])) {
    $process = proc_open($_SERVER["X_SCRIPT_FILENAME"], $descriptorspec, $pipes, NULL, $newenv);
    if (is_resource($process)) {
        fclose($pipes[0]);
        $head = fgets($pipes[1]);
        // Forward CGI headers until the blank line that ends them (or until EOF)
        while ($head !== false && strcmp($head, "\n")) {
            header($head);
            $head = fgets($pipes[1]);
        }
        fpassthru($pipes[1]);
        fclose($pipes[1]);
        fclose($pipes[2]);
        $return_value = proc_close($process);
    } else {
        header("Status: 500 Internal Server Error");
        echo("Internal Server Error");
    }
} else {
    header("Status: 404 Page Not Found");
    echo("Page Not Found");
}
?>

Final vhost config :

server {
    listen 80;
    server_name awstats.master-domain.com;
    root    /var/www/awstats;

    error_log /var/log/nginx/awstats.master-domain.com.error.log;
    access_log off;
    log_not_found off;

    location ^~ /icon {
        alias /usr/share/awstats/icon/;
    }

    location ~ ^/cgi-bin/.*\.(cgi|pl|py|rb) {
        gzip off;
        include         fastcgi_params;
        fastcgi_pass    unix:/var/run/php5-fpm.sock;
        fastcgi_index   cgi-bin.php;
        fastcgi_param   SCRIPT_FILENAME    /etc/nginx/cgi-bin.php;
        fastcgi_param   SCRIPT_NAME        /cgi-bin/cgi-bin.php;
        fastcgi_param   X_SCRIPT_FILENAME  /usr/lib$fastcgi_script_name;
        fastcgi_param   X_SCRIPT_NAME      $fastcgi_script_name;
        fastcgi_param   REMOTE_USER        $remote_user;
    }
}

Beautifying the url

You can now view the stats of multiple websites from a single one : awstats.master-domain.com.

But awstats doesn’t use url rewriting for beautiful links, and you end up with long and ugly urls like :

http://awstats.master-domain.com/cgi-bin/awstats.pl?config=alpha.com  
http://awstats.master-domain.com/cgi-bin/awstats.pl?config=beta.com

We could make them easier to share, by transforming them into :

http://awstats.master-domain.com/alpha.com  
http://awstats.master-domain.com/beta.com

In the nginx vhost for awstats.master-domain.com, add :

location ~ ^/([a-z0-9-_\.]+)$ {
    return 301 $scheme://awstats.master-domain.com/cgi-bin/awstats.pl?config=$1;
}
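
To get a feel for which urls this location catches, here is a tiny shell imitation of the same pattern. It only mimics the regex (it doesn’t talk to nginx), and pretty_to_cgi is a name made up for the example.

```shell
# Mimic the nginx location above: a single path segment made of
# lowercase letters, digits, dots, dashes and underscores is redirected.
pretty_to_cgi() {
    path="$1"
    if printf '%s\n' "$path" | grep -Eq '^/[a-z0-9._-]+$'; then
        echo "/cgi-bin/awstats.pl?config=${path#/}"
    else
        echo "no redirect"
    fi
}

pretty_to_cgi /alpha.com
pretty_to_cgi /cgi-bin/awstats.pl
```

Anything with a second slash, like the /cgi-bin/ urls themselves, falls through, so the redirect cannot loop on its own target.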

Protecting the stats

Let’s now protect the stats. The idea is to have different credentials for each awstats config. The login used to view alpha.com stats should not let the user browse beta.com stats.

Let’s edit the /cgi-bin/ location block in the vhost

location ~ ^/cgi-bin/.*\.(cgi|pl|py|rb) {

    # Protect each config with a different credential
    if ($args ~ "config=([a-z0-9-_\.]+)") {
        set $domain $1;
    }

    auth_basic            "Admin";
    auth_basic_user_file  /etc/awstats/awstats.$domain.htpasswd;

    gzip off;
    include         fastcgi_params;
    fastcgi_pass    unix:/var/run/php5-fpm.sock;
    fastcgi_index   cgi-bin.php;
    fastcgi_param   SCRIPT_FILENAME    /etc/nginx/cgi-bin.php;
    fastcgi_param   SCRIPT_NAME        /cgi-bin/cgi-bin.php;
    fastcgi_param   X_SCRIPT_FILENAME  /usr/lib$fastcgi_script_name;
    fastcgi_param   X_SCRIPT_NAME      $fastcgi_script_name;
    fastcgi_param   REMOTE_USER        $remote_user;
}

This will protect each awstats config with its own credentials, stored in /etc/awstats/awstats.domain.tld.htpasswd. Authentication is based on HTTP Basic Authentication.

For the example websites alpha.com and beta.com, the logins and passwords are stored in

  • /etc/awstats/awstats.alpha.com.htpasswd
  • /etc/awstats/awstats.beta.com.htpasswd

Each file contains the credentials for the corresponding domain.
You can create these files with htpasswd (a tool shipped with apache) :

htpasswd -c /etc/awstats/awstats.alpha.com.htpasswd username

You’ll be prompted for the password next.
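
Because nginx looks up the password file from the config name, a vhost config without a matching htpasswd file will not be viewable. Here is a quick sketch to spot the missing ones. missing_htpasswd and CONF_DIR are names chosen for the example.

```shell
# Hypothetical check: list every awstats config that has no matching
# htpasswd file next to it. CONF_DIR is a name chosen for this example.
CONF_DIR="${CONF_DIR:-/etc/awstats}"

missing_htpasswd() {
    for conf in "$CONF_DIR"/awstats.*.conf; do
        [ -e "$conf" ] || continue
        pass="${conf%.conf}.htpasswd"       # awstats.alpha.com.htpasswd
        [ -f "$pass" ] || echo "missing: $pass"
    done
}

missing_htpasswd
```

Note that the -c flag of htpasswd creates the file (and overwrites an existing one), so drop it when you add a second user to the same file.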

Final Nginx Awstats vHost

server {
    listen 80;
    server_name awstats.master-domain.com;
    root    /var/www/awstats;

    error_log /var/log/nginx/awstats.master-domain.com.error.log;
    access_log off;
    log_not_found off;

    location ^~ /icon {
        alias /usr/share/awstats/icon/;
    }

    location ~ ^/([a-z0-9-_\.]+)$ {
        return 301 $scheme://awstats.master-domain.com/cgi-bin/awstats.pl?config=$1;
    }

    location ~ ^/cgi-bin/.*\.(cgi|pl|py|rb) {
        if ($args ~ "config=([a-z0-9-_\.]+)") {
            set $domain $1;
        }

        auth_basic            "Admin";
        auth_basic_user_file  /etc/awstats/awstats.$domain.htpasswd;

        gzip off;
        include         fastcgi_params;
        fastcgi_pass    unix:/var/run/php5-fpm.sock;
        fastcgi_index   cgi-bin.php;
        fastcgi_param   SCRIPT_FILENAME    /etc/nginx/cgi-bin.php;
        fastcgi_param   SCRIPT_NAME        /cgi-bin/cgi-bin.php;
        fastcgi_param   X_SCRIPT_FILENAME  /usr/lib$fastcgi_script_name;
        fastcgi_param   X_SCRIPT_NAME      $fastcgi_script_name;
        fastcgi_param   REMOTE_USER        $remote_user;
    }
}

And voila !

The alpha.com webmaster can browse its stats via awstats.master-domain.com/alpha.com, and the beta.com webmaster via awstats.master-domain.com/beta.com. They’re each protected with their own credentials, no peeking.