Summary vs Detail Data
Management Decisions: Storing "Summary Only" Data vs. "Summary and Detail" Data
Hit List Summary Architecture Generally (QuickList)
Hit List version 4.0 introduces the second generation
of our revolutionary QuickList Database Architecture that
allows Hit List to produce quick, efficient reports regardless
of the size of the original log files. Compared to previous
versions of Hit List, this new database architecture allows
most reports to run between 10 and 1,000 times faster. Combined
with Hit List's ability to group and aggregate data over any
time period, this makes it easy to run reports that show trends
over very long periods. This summarization process actually
becomes more efficient as your daily logs grow. In other words,
Hit List Summary data from a 100MB daily log file isn't much
larger than from a 1MB daily log.
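To see why summary size tracks the number of distinct values rather than the number of log lines, consider the following minimal sketch. It is not Hit List's actual code or table layout; the `summarize_day` function, the sample URLs and the traffic volumes are all assumptions made up for illustration.

```python
from collections import Counter

def summarize_day(requests):
    """Collapse one day's (url, browser) request tuples into per-key counts."""
    pages = Counter(url for url, _ in requests)
    browsers = Counter(browser for _, browser in requests)
    return {"pages": pages, "browsers": browsers}

# Hypothetical traffic: the busier day has 100x the requests but the same URLs.
urls = ["/index.htm", "/products.htm", "/kevin.htm"]
small_day = [(urls[i % 3], "Netscape") for i in range(1_000)]
large_day = [(urls[i % 3], "Netscape") for i in range(100_000)]

small = summarize_day(small_day)
large = summarize_day(large_day)

# Detail storage would need one entry per request; the summary needs only
# one entry per distinct value seen that day.
print(len(small_day), len(small["pages"]))
print(len(large_day), len(large["pages"]))
```

In this sketch both days produce the same number of summary entries even though one day has a hundred times as many requests, which is the effect described above.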
When importing logs, Hit List summarizes the incoming information.
This information is stored in special summary tables in the
Hit List database and is normally referred to as Summary Data.
Hit List normally also stores detailed information in other
tables that represent one entry for each line in your log
files. Hit List uses detail data when the report element cannot
be calculated from the summary data.
Hit List determines whether to use summary or detail data
for each element in a report, not for the report as a whole.
Therefore, if most elements in a report can be computed from
summary data but one element requires detail, the report will
still run very fast except for that one element.
QuickList Extreme
Hit List has additional optimizations designed specifically
for the very largest web sites in the world. See Hit List
Help's "QuickList Extreme Optimizations" topic for more information.
How Hit List Decides to use Detail Data
Hit List uses fast Summary Data except when:
1. Calculating crosstabs
2. Doing complex Path Analysis, as seen in elements
such as "Previous Pages", "Jumps within the Site", "Path Through
Site", etc.
3. Calculating Total or Average Time Viewed (except
for "Time Viewed per Visit" which does use Summary Data).
4. Filtering by Entry Source, Entry Pages, URLs within
the visit, Days of the Week, Hours, or Realm.
5. Filtering by Directories, URLs, Visitor IPs, Visitor
Site Names, User Names, URL Groups, HTTP Codes, Application
Arguments, Methods or Browsers unless the report element is
either based on the Totals standard report or is directly
related to the filter.
For example, Hit List can use Summary Data when showing
the "Total Number of Requests" from people using a Netscape
browser and can show "Most Popular Browsers" and filter
out spiders, etc. However, Detail Data must be used if you
filter by Browser but use the element "Most Popular Pages"
since those two aren't directly related. One very common
situation is that Hit List is forced to use Detail Data
if you filter by Visitor IP address to mask out internal
requests. In this case, you're far better off
using the Options/Updates tab to exclude internal IPs when
the database is created, and if you need to occasionally
monitor internal requests, you can create a short-term separate
Hit List database that does include internal requests.
Tip: Filtering by URL or Directory
usually forces Hit List to use detail data, so you should take
advantage of the Virtual Server Manager to define 'soft' virtual
servers and use the Web server name or IP filter instead.
In this way, Hit List can run reports for all virtual servers
very quickly.
6. Calculating Visits when any Object Types are unchecked
in the Filter tab, except for the special case of unchecking
Graphics when there are no graphics in the database anyway
because they were filtered out during import.
7. Grouping is set to Hourly, unless the report
element is based on Totals.
8. Grouping is set to IP Address, Visitor Site
Name, User Name, Proxy Destination Site Name, URL, Source
Site Name or Source URL.
9. No summary data is available because you set
the Store switch in the Updates tab to "Just Detail Data".
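The rules above are applied per element, so a report can be thought of as a list of elements that are each checked independently. The following is a simplified sketch of that idea covering only some of the rules. It is not how Hit List is implemented; the `Element` class, its field names and the `needs_detail` function are hypothetical, and the "Summary and Detail" store mode label is assumed.

```python
from dataclasses import dataclass, field

# Groupings that the list above says always require detail data.
DETAIL_ONLY_GROUPINGS = {
    "IP Address", "Visitor Site Name", "User Name",
    "Proxy Destination Site Name", "URL", "Source Site Name", "Source URL",
}
# Filters from rule 4, which force detail data regardless of the element.
ALWAYS_DETAIL_FILTERS = {
    "Entry Source", "Entry Pages", "URLs within the visit",
    "Days of the Week", "Hours", "Realm",
}

@dataclass
class Element:
    """Hypothetical description of one report element."""
    name: str
    is_crosstab: bool = False
    is_path_analysis: bool = False
    based_on_totals: bool = False
    filters: set = field(default_factory=set)
    grouping: str = "Daily"

def needs_detail(element, store_mode="Summary and Detail"):
    """Return True if this element cannot be computed from summary data."""
    if store_mode == "Just Detail Data":            # rule 9
        return True
    if element.is_crosstab or element.is_path_analysis:  # rules 1 and 2
        return True
    if element.filters & ALWAYS_DETAIL_FILTERS:     # rule 4
        return True
    if element.grouping == "Hourly" and not element.based_on_totals:  # rule 7
        return True
    if element.grouping in DETAIL_ONLY_GROUPINGS:   # rule 8
        return True
    return False

report = [
    Element("Total Number of Requests", based_on_totals=True, grouping="Hourly"),
    Element("Most Popular Pages", grouping="Hourly"),
    Element("Path Through Site", is_path_analysis=True),
]
for element in report:
    print(element.name, "detail" if needs_detail(element) else "summary")
```

Rules 3, 5 and 6 are left out of the sketch because they depend on how each element relates to its filter, which is not something a few flags can capture.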
How You Should Decide Whether to Use Summary or Summary and Detail Data
Use the Options/Updates/"Store" drop-down
list to select which combination of data to store.
If hard disk space isn't an issue, storing both summary
and detail gives you the best combination of speed and flexibility.
As noted, Hit List will automatically determine, for each
element on the report, if it can use summary data or if it
needs to use detail data.
However, if your logs are above average in size (such as
20MB+ per day), you might be better off storing only summary
information. Provided you're not doing fancy filtering, sophisticated
reports such as Complete Analysis, the built-in Advertising
reports and even Query Parsing can be run entirely from summary
data. If you pick this strategy, you'll find that the size
of your database grows relatively little as your daily logs
get larger. The fundamental unit of summarization is the day
so Hit List is actually more efficient storing 10 days of
100MB per day logs than storing 100 days of 10MB logs. If
you choose this basic strategy but find that you occasionally
need to run reports based on Detail Data, simply create a
new database containing Detail Data from your original logs
but use the fast, compact Summary database for routine reports.
The Database Manager/Tools tab
Additionally, you can use the Tools tab of the Database
Manager and the Event Scheduler to selectively delete either
summary or detail data. For example, take the case where you
need to run a daily report that, for whatever reason, needs
detail data but you also want to see long-term trends that
would normally entail too much information to store as detail.
Here you can set Hit List to store both detail and summary
data, run a daily report, then use the Event Scheduler to
delete detail data that's more than one day old. Alternatively,
you can store both detail and summary data but, when you use
the Database Manager or the Event Scheduler to cycle the database,
you can have Hit List copy the existing summary data to the
new database, effectively purging just the old detail data.
Either of these approaches will normally give you the flexibility
you may need for complex analysis and allow you to run the
most commonly required long-term trend reports.
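The first approach (keep all summary data, but delete detail data more than one day old) can be pictured with a small sketch. This uses an in-memory SQLite database purely for illustration; the `detail` and `summary` table names, their columns and the `purge_old_detail` function are hypothetical and do not reflect Hit List's real schema or the Event Scheduler's behavior.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE detail (day TEXT, url TEXT)")
conn.execute("CREATE TABLE summary (day TEXT, url TEXT, requests INTEGER)")

# Load three days of made-up data: five detail rows and one summary row per day.
today = date(2000, 1, 10)
for offset in range(3):
    day = (today - timedelta(days=offset)).isoformat()
    conn.executemany("INSERT INTO detail VALUES (?, ?)", [(day, "/index.htm")] * 5)
    conn.execute("INSERT INTO summary VALUES (?, ?, ?)", (day, "/index.htm", 5))

def purge_old_detail(conn, today, keep_days=1):
    """Delete detail rows more than keep_days old; summary rows are untouched."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    # ISO dates sort lexicographically, so a string comparison is sufficient.
    conn.execute("DELETE FROM detail WHERE day < ?", (cutoff,))
    conn.commit()

purge_old_detail(conn, today)
detail_days = conn.execute("SELECT COUNT(DISTINCT day) FROM detail").fetchone()[0]
summary_days = conn.execute("SELECT COUNT(DISTINCT day) FROM summary").fetchone()[0]
print(detail_days, summary_days)
```

After the purge, the daily detail-based report can still run against recent data, while the long-term trend reports continue to read the full summary history.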
Summary Data and Virtual Servers
The summary data concept was designed specifically with virtual
servers in mind. Therefore, you can always use the "Run this
report for each virtual server" switch with summary data.
Important: If you're using a web server that doesn't record
virtual server information (Netscape and Apache, for example),
you can use the Virtual Server Manager to show Hit List how
to interpret your existing logs as virtual servers. This is
exceptionally beneficial because you can always filter by
Virtual Server Name or IP and still use summary data, whereas filtering
by URL or Directory forces Hit List to fall back to detail
data.
Which Reports Use Detail Data?
Unless you modify them, all Hit List reports run from summary
data except: Cookie Analysis (most elements run from summary
data), Search Engine Visitor Quality, Marketing Report, Path
Analysis, Technical Analysis (most elements run from summary
data) and Visitor Sources. The easy way to tell is to highlight
a report icon, then look at the gray Status Bar across the
bottom of the Hit List main screen. You will see a short description
of the report's purpose in addition to seeing a note about
Detail data if it's required.
Summary vs. Detail Filtering
If you are trying to filter with Summary-only data, you
must use "like elements" for "like filters". For
example, if you want to filter for an IP of 123.123.123.123,
you can filter for this IP in elements like "most common visitors"
or "most popular visitor's Countries" because they are IP-based
elements (look at their Properties in Design mode, under the
Definition tab where it notes "standard report"). Conversely,
if you want to filter for a URL of /kevin.htm, you can use
elements like "most popular pages", "most popular URLs", etc.
What you can't do is apply the URL filter for /kevin.htm
to an IP-based element (e.g. "most common visitors") without
Detail data.
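The "like elements for like filters" rule amounts to checking that the element and the filter are built on the same kind of data. Here is a minimal sketch of that check. The two lookup tables and the `can_use_summary` function are assumptions for illustration only; in Hit List itself you would check each element's "standard report" under its Definition tab.

```python
# Hypothetical mapping of elements and filters to the data they are based on.
ELEMENT_BASIS = {
    "most common visitors": "ip",
    "most popular visitor's Countries": "ip",
    "most popular pages": "url",
    "most popular URLs": "url",
}
FILTER_BASIS = {"Visitor IP": "ip", "URL": "url"}

def can_use_summary(element, filter_kind):
    """A filter works with Summary-only data when its basis matches the element's."""
    return ELEMENT_BASIS[element] == FILTER_BASIS[filter_kind]

# Filtering an IP-based element by IP works; filtering it by URL does not.
print(can_use_summary("most common visitors", "Visitor IP"))
print(can_use_summary("most common visitors", "URL"))
print(can_use_summary("most popular pages", "URL"))
```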
With Summary and Detail data, you can filter an entire
report (e.g. Complete Analysis) on URLs, Visitor IPs, Site Names,
referrers, etc. without limitation.
For more information on Hit List Filtering, check our Filtering document.