
ISSEL Log Companion Whitepaper

by Björn Svensson, Technical Director, ISSEL

Log files are pre-processed either to simplify later processing or to achieve something that the Web Traffic Analysis software used to analyse the data cannot handle on its own. This can include:

  • ABCE compliance
  • Load Balanced servers
  • Log Cleansing
  • Adding and transforming information
  • Transforming log files
  • Automation
  • Bug fixing
  • Customisation


ABCE Compliance
ABC Electronic (ABCE) audits web site traffic. This is particularly important for advertisers and investors, so that traffic to different sites can be compared correctly. To achieve this, ABCE applies standards for what can be counted and how it is counted. More information about these rules and about web site auditing can be found at ABC Electronic's web site: http://www.abce.org.uk.

Some of the rules that ABCE requires are not handled by most common Web Traffic analysis tools. For example, ABCE does not allow traffic generated by robots indexing the site to be counted for auditing purposes. To enable fair comparison between different sites ABCE keeps a regularly updated list of robots which must be excluded. The list is available from the ABCE site and comprises around 300 different robot User-Agents. Most analysis tools cannot handle this volume of exclusions whereas the Log Companion can handle it with ease. Indeed, the exclusion list can be downloaded from the ABCE site and used directly with the Log Companion.

ABCE also enforces other special requirements which the Log Companion can handle:

  • exclude requests where the User-Agent field is empty
  • exclude all requests which are not GET or POST
  • exclude requests with error and re-direction Status codes
  • exclude all requests which are not directly requested by the user

The last entry is part of the core of ABCE compliance. ABCE counts Page Impressions, each defined as a unique page presented to the viewer. For each click the user makes on the site, one and only one Page Impression may be counted, no matter how many files were used to create the page. Furthermore, frame sets, style sheets, JavaScript files etc., which are merely part of the construction of the page, must not be counted. Once again this creates a long list of exclusions which the Log Companion is ideally suited to handle. ABCE does not allow counting of general Hits, since the number of Hits depends entirely on the proportion of graphics on the site.
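
The exclusion rules above can be illustrated with a short Python sketch. It is not the Log Companion's implementation: the function name, the robot names and the list of non-page file suffixes are all hypothetical, and treating every status code of 300 or above as excluded is a simplifying assumption. A real audit would load the robot list published by ABCE.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion data, for illustration only.
ROBOT_AGENTS = ("examplebot", "samplecrawler")
NON_PAGE_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def is_countable(method, path, status, user_agent):
    """Return True if a request passes the exclusion rules listed above."""
    agent = user_agent.strip().lower()
    if agent in ("", "-"):
        return False  # empty User-Agent field
    if any(robot in agent for robot in ROBOT_AGENTS):
        return False  # User-Agent matches a listed robot
    if method.upper() not in ("GET", "POST"):
        return False  # only GET and POST requests are counted
    if status >= 300:
        return False  # assumption: all re-direction and error codes are excluded
    if urlsplit(path).path.lower().endswith(NON_PAGE_SUFFIXES):
        return False  # images, style sheets and scripts are not pages
    return True
```

A request for /index.html by an ordinary browser with status 200 passes, while the same request from a listed robot, or a request for /style.css, is excluded.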

Load Balanced Servers
Most Web Traffic Analysis tools analyse log files strictly one-by-one, creating the statistics in a linear fashion. This can cause particular problems when handling Load Balanced web servers. Load balanced servers are used to increase the performance of web sites by spreading the load of the traffic across several machines, each typically having its own copy of the web site. This setup is completely transparent to visitors, who do not notice that each page they view may come from a different physical machine.

However, each of the machines also creates a separate set of log files, which must all be analysed together to generate correct statistics. This is especially important for Visit calculations, where many requests can be part of the same Visit. In this scenario the requests are spread out between different servers and it is not until you look at all the log files together that you get the complete picture.

Since most Web Traffic analysis tools read the log files one-by-one they cannot track Visits that span several different log files. To solve this problem the Log Companion can merge the separate log files, one for each load balanced server, into one file where every request is organised in order of time stamps. This file can then be processed by the analysis tool which receives the log data from all the servers in one file to allow it to correctly calculate the Visits spanning all the servers.

Note: This function is at the moment only available on request from ISSEL but will be rolled into a future release of the Log Companion.
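
The merge of per-server log files into one time-ordered file can be sketched in Python as follows. This is an illustration rather than the Log Companion's own method. It assumes that every log file is already in time order and that every line starts with a "YYYY-MM-DD HH:MM:SS" time stamp, which sorts correctly as plain text; the function names and sample lines are hypothetical.

```python
import heapq


def timestamp_key(line):
    # Assumption: the first 19 characters are "YYYY-MM-DD HH:MM:SS".
    return line[:19]


def merge_logs(*sources):
    """Merge already time-ordered iterables of log lines into one ordered stream."""
    return heapq.merge(*sources, key=timestamp_key)


# Hypothetical log lines from two load balanced servers.
server_a = ["2004-03-01 10:00:00 /index.html", "2004-03-01 10:00:05 /products.html"]
server_b = ["2004-03-01 10:00:02 /about.html"]
merged = list(merge_logs(server_a, server_b))
```

Because heapq.merge reads its inputs lazily, the same function can be given open file objects instead of lists, so large log files do not have to be held in memory.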

Log File Management

Popular web sites typically produce huge volumes of log files, and running multiple web sites compounds the problem. The Log Companion simplifies the management of these volumes:

  • Vast log files are processed quickly. Log file size is typically reduced 5 - 10 times when unnecessary data is removed, which typically speeds up the loading of those logs into your analysis tool correspondingly.
  • Processed log files can be archived and compressed, which typically reduces log file size 10 - 20 times and saves valuable disk space.
  • Processed log files can be output to any location and renamed, which simplifies any further handling by your analysis tool.

Log Cleansing
Surprising though it may seem, in real life log files often have a lot of "rubbish" stored in them. This is data or characters that have no relation to log file data whatsoever and can sometimes cause unexpected problems for the analysis tools, not to mention odd entries in the reports produced. The Log Companion can be used to clean up log files and has a number of built-in features:

  • option to exclude Control characters
  • checks that each request has all the fields as defined in the log file header or by the log file format
  • checks that each request has basic characteristics which must always be present

The Log Companion has the option to exclude any request line which has an embedded Control character, that is, a non-printing ASCII character which does not represent a letter, digit or punctuation mark.

The Log Companion also checks that each line has all the entries which are supposed to be there. For example, for IIS files it checks that each line has all the fields defined in the header. For other formats it checks the fields based on which log format has been selected.

In addition the Log Companion checks for some basic characteristics for each line. For example, the URL file path must always begin with a "/" character. For NCSA files it requires that there is always a field for the protocol version "http/1.x" etc.
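
The three checks described above can be sketched for a single request line in Python. This is an illustration under stated assumptions, not the product's code: it assumes a space-delimited log whose field names come from the header, and that the URL file path is stored in a field named "cs-uri-stem", as in the W3C extended format. The function name is hypothetical.

```python
import re

# ASCII control characters, checked after the line ending has been stripped.
CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")


def is_clean(line, field_names):
    """Check one space-delimited request line against the three checks above."""
    line = line.rstrip("\r\n")
    if CONTROL_CHARS.search(line):
        return False  # embedded Control character
    fields = line.split(" ")
    if len(fields) != len(field_names):
        return False  # a field defined in the header is missing, or there is an extra one
    # Assumption: "cs-uri-stem" is present in field_names.
    path = fields[field_names.index("cs-uri-stem")]
    return path.startswith("/")  # the URL file path must begin with "/"
```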

Adding and Transforming Information
The Log Companion can add some information to the log files to make analysis easier:

  • ABCE Users
  • web server name

In addition to adding information the Log Companion can also transform some of the information in the log files:

  • port number
  • cookies
  • query strings
  • source fields

ABCE defines a User as every unique combination of Visitor IP-Address and User-Agent. Whilst some analysis tools can create this information out of the data in the log files, sometimes this prevents full analysis of the data. To simplify this process the Log Companion can insert this combined value into the User field in the log file, which is not normally used.
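
As a minimal Python sketch of the combined value, the function below joins the two fields into one string. The function name, the "|" separator and the replacement of spaces with "+" are assumptions made for illustration, chosen so that the value stays a single field in a space-delimited log; the format the Log Companion actually writes is not described here.

```python
def abce_user(ip_address, user_agent):
    """Combine Visitor IP-Address and User-Agent into one value for the User field."""
    # Assumption: spaces become "+" so the value remains one space-delimited field.
    return f"{ip_address}|{user_agent.replace(' ', '+')}"
```

Two requests from the same IP-Address with different User-Agents then produce two different User values, which matches the ABCE definition.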

The Log Companion can also add proper Web Server names to the log files. This is important for four main reasons:

  1. Many log files do not include any information about which Web Server the data actually belongs to, not even in the log file name. This is typically the case with Apache and NCSA log files.
  2. Some log files do include this information but in a format which is not very useful to present in Web Traffic reports, for example the Web Server IP addresses. This is typical for IIS4 log files.
  3. The correct Web Server name may be present but not in a field recognised by the Web Traffic Analysis tool.
  4. Some Web Traffic Analysis tools can produce automated reports based on the Virtual Server information but this only makes sense if the proper Web Server names are available from the log files.

The Log Companion can either add any user-defined text string as the Web Server name or, in the case of IIS5 log files, copy the Host Header field to the Server IP field. Normally the server information is written to an existing field in the log file, but in the case of IIS4 and IIS5 the field will also be added if it is missing.

The Log Companion can also change the Port Number in the log files. This is especially useful if you have several Virtual Servers which use different ports but are to be analysed as one server. For example, you may have one Virtual Server using the standard port number 80 and a secure server for the same web site using port 443. In this case some analysis tools treat them as separate servers. The Log Companion can make them appear as the same server to the analysis tool by changing the Virtual Servers to use the same port, for example port 80 in this case.

The Log Companion has two special features for handling cookies in an effective way:

  • Extract the persistent cookie from the cookie field.
  • Convert cookies to parameters.

The first feature can extract a particular persistent cookie value from the cookie field. This is the value of the cookie that represents a unique user coming to the site. After the transformation only the persistent value will be left in the cookie field and the rest of the cookie string is discarded. This makes it easier for analysis tools to use the cookie value for Visits calculations.

Converting cookies to parameters is useful to optimise the reporting functions of the analysis tool, as many have better functions for reporting on parameters than on cookies. This is also handy if the previous function (persistent cookies) is used, since it can save selected cookies as parameters which would otherwise have been discarded.
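
Both cookie features can be sketched in Python as follows. The sketch assumes the cookie field holds a "name=value; name=value" string and that an empty field is logged as "-". The function names and the cookie names used in the usage note are hypothetical.

```python
def parse_cookies(cookie_field):
    """Split a 'name=value; name=value' cookie string into a dict."""
    cookies = {}
    for part in cookie_field.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def extract_persistent(cookie_field, name):
    """Keep only the value of the named persistent cookie ('-' if it is absent)."""
    return parse_cookies(cookie_field).get(name, "-")


def cookies_to_parameters(cookie_field, query, keep):
    """Append the selected cookies to the query string as parameters."""
    cookies = parse_cookies(cookie_field)
    extra = "&".join(f"{n}={cookies[n]}" for n in keep if n in cookies)
    if not extra:
        return query
    return extra if query in ("", "-") else f"{query}&{extra}"
```

For the cookie field "uid=abc123; lang=en", extract_persistent with the name "uid" leaves "abc123", and cookies_to_parameters with the query "id=7" and the list ["lang"] gives "id=7&lang=en", so the lang cookie survives as a parameter.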

The Log Companion can modify the query strings in the log files. This option allows you to specify which query parameters you want to keep, with the Log Companion discarding all other parameters. The function is available both for the query parameters requested and, separately, for the source query parameters.

Discarding parameters can have major benefits for simplifying reporting and also for saving space in the reporting system databases. This is particularly useful on web sites with dynamic content where the pages are defined by the query string parameters.
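
Keeping only selected query parameters can be sketched in Python as below. The function name and the parameter names in the usage note are hypothetical, and the sketch assumes parameters are separated by "&".

```python
def keep_parameters(query, keep):
    """Return the query string with only the parameters named in keep."""
    kept = []
    for pair in query.split("&"):
        name = pair.split("=", 1)[0]
        if name in keep:
            kept.append(pair)  # preserve the original order and value
    return "&".join(kept)
```

With keep set to {"page", "lang"}, the query string "page=news&sessionid=9f2c&lang=en" becomes "page=news&lang=en", so requests that differ only in sessionid are reported as the same page.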

Automation
The Log Companion produces Batch files which can be scheduled to run at specific times using the standard Windows NT scheduling functions, for example by using the AT command.

The Log Companion can be used to control Marketwave HitList in two ways. HitList can be executed once all log file pre-processing is completed. Alternatively, to off-load HitList, it can be executed directly after each log file is processed, which ensures that HitList processes only one log file at a time. When many large log files are to be loaded into HitList at once, this prevents HitList from running out of system resources on smaller machines.

Customisation
Web Sites can be designed in a multitude of unique ways, and by using Web Server programming functions it is also possible, to some extent, to programme what gets written to the log files. This can result in log files which do not follow the established standards, which causes problems when it is time to report on the traffic to the Web Site.

The Log Companion can be infinitely customised to handle special requirements so that the log files are processed in the best way for Web Traffic analysis. Here are just a few examples of what can be done:

  • Re-format the log file to fit in with standards so that the Analysis tool can properly understand the file
  • Move information between different fields to where they are best processed by the Analysis tool
  • Add missing information based on defined rules
  • Remove information or whole request lines which do not comply with defined rules, like the ABCE rules
  • Transform fields to better suit the Analysis tools

Anything that can be described programmatically can be achieved. Here are some examples of customisations completed:

  • Copy custom user registration information from URL parameters to Cookies
  • Remove unique User ID information from the URL and put it in the cookie field. This also makes the URL file paths less unique and therefore easier to analyse
  • Transform log files from proprietary formats to standardised formats



p +44-(0)870-166-2435, f +44-(0)870-054-8795, e info@issel.co.uk
© 1996-2004 Intranet Software Solutions (Europe) Limited. All rights reserved.