
Summary vs Detail Data

Installation/Database Management Decisions: Storing "Summary Only" Data vs. "Summary and Detail" Data

Hit List Summary Architecture Generally (QuickList)
Hit List version 4.0 introduces the second generation of our revolutionary QuickList™ Database Architecture, which allows Hit List to produce quick, efficient reports regardless of the size of the original log files. Compared to previous versions of Hit List, this new database architecture allows most reports to run between 10 and 1,000 times faster. Combined with Hit List's ability to group and aggregate data over any time period, this makes it easy to run reports that show trends over very long periods. The summarization process actually becomes more efficient as your daily logs grow. In other words, the Hit List Summary Data from a 100MB daily log file isn't much larger than that from a 1MB daily log.


When importing logs, Hit List summarizes the incoming information. This information is stored in special summary tables in the Hit List database and is normally referred to as Summary Data. Hit List normally also stores detailed information in other tables that represent one entry for each line in your log files. Hit List uses detail data when the report element cannot be calculated from the summary data.
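The difference between the two kinds of data can be sketched in a few lines of Python. This is only an illustration, not Hit List's actual import code or table layout: the simplified log format, the `import_log` function and the choice of (day, URL) as the summary key are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical, simplified log lines: "<date> <visitor-ip> <url>".
# Real web server logs and Hit List's own tables are richer than this.
LOG_LINES = [
    "2004-01-05 10.0.0.1 /index.htm",
    "2004-01-05 10.0.0.2 /index.htm",
    "2004-01-05 10.0.0.1 /products.htm",
    "2004-01-06 10.0.0.3 /index.htm",
]


def import_log(lines):
    """Return (detail_rows, summary) for the given log lines."""
    detail_rows = []      # detail data: one entry per log line
    summary = Counter()   # summary data: one entry per (day, url) pair
    for line in lines:
        day, ip, url = line.split()
        detail_rows.append({"day": day, "ip": ip, "url": url})
        summary[(day, url)] += 1
    return detail_rows, summary


detail_rows, summary = import_log(LOG_LINES)
print(len(detail_rows))   # 4 detail rows, one per log line
print(len(summary))       # 3 summary rows, one per (day, url) pair
```

Note that the summary in this sketch no longer knows which visitor requested which page, which is the kind of question that would need the detail rows.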


Hit List determines whether to use summary or detail data for each element in a report, not for the report as a whole. Therefore, if most elements in a report can be computed from summary data but one element requires detail, the report will still run very fast except for that one element.


QuickList Extreme
Hit List has additional optimizations designed specifically for the very largest web sites in the world. See Hit List Help's "QuickList Extreme Optimizations" topic for more information.

How Hit List Decides to use Detail Data

Hit List uses fast Summary Data except when:

1. Calculating crosstabs
2. Doing complex Path Analysis, as seen in elements such as "Previous Pages", "Jumps within the Site", "Path Through Site", etc.
3. Calculating Total or Average Time Viewed (except for "Time Viewed per Visit" which does use Summary Data).
4. Filtering by Entry Source, Entry Pages, URLs within the visit, Days of the Week, Hours, or Realm.
5. Filtering by Directories, URLs, Visitor IPs, Visitor Site Names, User Names, URL Groups, HTTP Codes, Application Arguments, Methods or Browsers unless the report element is either based on the Totals standard report or is directly related to the filter.

For example, Hit List can use Summary Data when showing the "Total Number of Requests" from people using a Netscape browser and can show "Most Popular Browsers" and filter out spiders, etc. However, Detail Data must be used if you filter by Browser but use the element "Most Popular Pages", since those two aren't directly related. One very common situation is that Hit List is forced to use Detail Data if you filter by Visitor IP address to mask out internal requests. In this case, you're far better off using the Options/Updates tab to exclude internal IPs when the database is created. If you need to occasionally monitor internal requests, you can create a separate short-term Hit List database that does include them.

Tip: Filtering by URL or Directory usually forces Hit List to use detail data, so you should take advantage of the Virtual Server Manager to define 'soft' virtual servers and use the Web server name or IP filter instead. In this way, Hit List can run reports for all virtual servers very quickly.

6. Calculating Visits when any Object Types are unchecked in the Filter tab, except for the special case of unchecking Graphics when there are no graphics in the database anyway because they were filtered out during import.
7. When Grouping is set to Hourly unless the report element is based on Totals.
8. When Grouping is set to IP Address, Visitor Site Name, User Name, Proxy Destination Site Name, URL, Source Site Name or Source URL
9. When no summary data is available because you set the Store switch in the Updates tab to "Just Detail Data".
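A few of the rules above can be written down as a small decision function, applied to each report element in turn. This is a sketch for illustration only and covers just rules 1, 7, 8 and 9. The `Element` class, the `data_source` function, the "Summary and Detail" store value and the element name "Most Popular Pages by Hour" are hypothetical and are not part of Hit List.

```python
from dataclasses import dataclass

# Groupings that the list above says force detail data (rule 8).
DETAIL_ONLY_GROUPINGS = {
    "IP Address", "Visitor Site Name", "User Name",
    "Proxy Destination Site Name", "URL", "Source Site Name", "Source URL",
}


@dataclass
class Element:
    """Hypothetical description of one report element."""
    name: str
    is_crosstab: bool = False
    based_on_totals: bool = False
    grouping: str = "Daily"


def data_source(element, store="Summary and Detail"):
    """Pick 'summary' or 'detail' for one element (rules 1, 7, 8, 9 only)."""
    if store == "Just Detail Data":        # rule 9: no summary data exists
        return "detail"
    if element.is_crosstab:                # rule 1: crosstabs
        return "detail"
    if element.grouping == "Hourly" and not element.based_on_totals:
        return "detail"                    # rule 7: hourly, not Totals-based
    if element.grouping in DETAIL_ONLY_GROUPINGS:
        return "detail"                    # rule 8: per-visitor or per-URL grouping
    return "summary"


report = [
    Element("Total Number of Requests", based_on_totals=True, grouping="Hourly"),
    Element("Most Popular Pages"),
    Element("Most Popular Pages by Hour", grouping="Hourly"),
]
for element in report:
    # The choice is made per element, so only the last one is slow.
    print(element.name, "->", data_source(element))
```

In this sketch only the third element falls back to detail data; the other two still run from summary data, which matches the per-element behavior described earlier.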

 

How You Should Decide Whether to Use Summary or Summary and Detail Data

Use the Options/Updates/"Store" drop-down list to select which combination of data to store.
If hard disk space isn't an issue, storing both summary and detail gives you the best combination of speed and flexibility. As noted, Hit List will automatically determine, for each element on the report, if it can use summary data or if it needs to use detail data.

However, if your logs are above average in size (such as 20MB+ per day), you might be better off storing only summary information. Provided you're not doing fancy filtering, sophisticated reports such as Complete Analysis, the built-in Advertising reports and even Query Parsing can be run entirely from summary data. If you pick this strategy, you'll find that the size of your database grows relatively little as your daily logs get larger. The fundamental unit of summarization is the day, so Hit List is actually more efficient storing 10 days of 100MB-per-day logs than storing 100 days of 10MB-per-day logs. If you choose this basic strategy but find that you occasionally need to run reports based on Detail Data, simply create a new database containing Detail Data from your original logs, but use the fast, compact Summary database for routine reports.
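The "10 days versus 100 days" comparison can be made concrete with a toy model. The model below is an assumption made purely for illustration and says nothing about Hit List's real storage format: it supposes one detail row per request, one summary row per (day, URL) pair, and a site with a fixed number of distinct URLs. The `row_counts` function and all figures are hypothetical.

```python
def row_counts(days, requests_per_day, distinct_urls=50):
    """Return (detail_rows, summary_rows) under a toy model.

    Assumes one detail row per request and one summary row per
    (day, url) pair, with requests cycling evenly through the URLs.
    """
    detail_rows = days * requests_per_day
    urls_seen_per_day = min(requests_per_day, distinct_urls)
    summary_rows = days * urls_seen_per_day
    return detail_rows, summary_rows


busy = row_counts(days=10, requests_per_day=1000)
quiet = row_counts(days=100, requests_per_day=100)
print(busy)    # (10000, 500)
print(quiet)   # (10000, 5000)
```

Both cases hold the same number of requests, but in this model the 10 busy days need a tenth of the summary rows, because the summary grows with the number of days rather than with the number of log lines.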

 

The Database Manager/Tools tab
Additionally, you can use the Tools tab of the Database Manager and the Event Scheduler to selectively delete either summary or detail data. For example, take the case where you need to run a daily report that, for whatever reason, needs detail data, but you also want to see long-term trends that would normally entail too much information to store as detail. Here you can set Hit List to store both detail and summary data, run a daily report, then use the Event Scheduler to delete detail data that's more than one day old. Alternatively, you can store both detail and summary data but, when you use the Database Manager or the Event Scheduler to cycle the database, have Hit List copy the existing summary data to the new database, effectively purging just the old detail data. Either of these approaches will normally give you the flexibility you may need for complex analysis while still allowing you to run the most commonly required long-term trend reports.
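The first approach, keeping all summary data but only recent detail data, can be pictured as follows. This is a sketch of the idea only: in Hit List the purge is configured through the Event Scheduler, and the `purge_old_detail` function and the row layout here are hypothetical.

```python
from datetime import date, timedelta


def purge_old_detail(detail_rows, today, keep_days=1):
    """Drop detail rows more than keep_days old; summary data is not touched."""
    cutoff = today - timedelta(days=keep_days)
    return [row for row in detail_rows if row["day"] >= cutoff]


detail_rows = [
    {"day": date(2004, 1, 4), "url": "/index.htm"},
    {"day": date(2004, 1, 5), "url": "/index.htm"},
    {"day": date(2004, 1, 6), "url": "/products.htm"},
]
# Hypothetical per-day request totals standing in for summary data.
summary = {date(2004, 1, 4): 1, date(2004, 1, 5): 1, date(2004, 1, 6): 1}

detail_rows = purge_old_detail(detail_rows, today=date(2004, 1, 6))
print(len(detail_rows))  # 2: the row from 4 January has been purged
print(len(summary))      # 3: long-term trend data is still complete
```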

Summary Data and Virtual Servers
The summary data concept was designed specifically with virtual servers in mind. Therefore, you can always use the "Run this report for each virtual server" switch with summary data.

Important: If you're using a web server that doesn't record virtual server information (Netscape and Apache, for example), you can use the Virtual Server Manager to show Hit List how to interpret your existing logs as virtual servers. This is exceptionally beneficial because you can always filter by Virtual Server Name or IP and still use summary data, whereas filtering by URL or Directory forces Hit List to fall back to detail data.

Which Reports Use Detail Data?
Unless you modify them, all Hit List reports run from summary data except: Cookie Analysis (most elements run from summary data), Search Engine Visitor Quality, Marketing Report, Path Analysis, Technical Analysis (most elements run from summary data) and Visitor Sources. The easy way to tell is to highlight a report icon, then look at the gray Status Bar across the bottom of the Hit List main screen. You will see a short description of the report's purpose in addition to seeing a note about Detail data if it's required.

Summary vs. Detail Filtering
If you are filtering with Summary-only data, you must use "like elements" for "like filters". For example, if you want to filter for an IP of 123.123.123.123, you can apply that filter to elements like "most common visitors" or "most popular visitor's Countries" because they are IP-based elements (look at their Properties in Design mode, under the Definition tab where it notes "standard report"). Conversely, if you want to filter for a URL of /kevin.htm, you can use elements like "most popular pages", "most popular URLs", etc. What you can't do without Detail data is put the URL filter for /kevin.htm into an IP-based element (e.g. "most common visitors").
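The "like elements for like filters" rule amounts to checking that the filter and the element are keyed on the same kind of value. The lookup table below is a hypothetical simplification for illustration; Hit List's real element definitions are found in each element's Properties, not in a table like this.

```python
# Hypothetical mapping of report elements to the dimension their
# summary data is keyed on; Hit List's real element definitions differ.
ELEMENT_DIMENSION = {
    "Most Common Visitors": "ip",
    "Most Popular Visitor's Countries": "ip",
    "Most Popular Pages": "url",
    "Most Popular URLs": "url",
}


def can_use_summary(element, filter_dimension):
    """True when the filter is on the same dimension as the element."""
    return ELEMENT_DIMENSION[element] == filter_dimension


print(can_use_summary("Most Common Visitors", "ip"))   # True
print(can_use_summary("Most Popular Pages", "url"))    # True
print(can_use_summary("Most Common Visitors", "url"))  # False
```

The last case corresponds to putting the /kevin.htm filter into an IP-based element, which is the combination that needs Detail data.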

With Summary and Detail data, you can filter an entire report (e.g. Complete Analysis) on URLs, Visitor IPs, Site Names, referrers, etc. without limitation.

For more information on Hit List Filtering, check our Filtering document.

 


p +44-(0)870-166-2435, f +44-(0)870-054-8795, e info@issel.co.uk
© 1996-2004 Intranet Software Solutions (Europe) Limited. All rights reserved.