154 pm: 2007-11

The fastest (and cheapest) transfer of data is the one that doesn't have to happen, because the data is already there. Or still there: the caching of files is widely known and practised. Every web browser caches the files it has previously requested from the web server. But when a file is needed a second time, the browser still sends a (conditional) request to the web server, which usually answers with "304 Not Modified" (see RFC 2616), as can be seen in the server's access log:

215.82.12.78 - - [02/Nov/2007:10:20:33 +0100] "GET /images/logo.gif HTTP/1.1" 304 -

On receiving this, the browser finally loads the file from its cache. Instead of the whole file, just a couple of bytes went over the wire. But if a page uses lots of cached files such as GIF images and JavaScript files, even those 304 responses add up, especially on connections with high latency.
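To make the 304 round trip more concrete, here is a minimal sketch in Python of the decision a server makes when the browser revalidates a cached file. It assumes the browser sends the date of its cached copy in an If-Modified-Since header; the function name conditional_status and the file date are made up for illustration, and this is not how Apache implements it internally.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # first request: there is no cached copy to validate
    try:
        cached = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        unchanged = last_modified.replace(microsecond=0) <= cached
    except (TypeError, ValueError):
        return 200  # unusable header: send the full file
    return 304 if unchanged else 200


# Hypothetical modification time of /images/logo.gif on the server.
logo_mtime = datetime(2007, 10, 1, 8, 0, 0, tzinfo=timezone.utc)
header = format_datetime(logo_mtime, usegmt=True)

print(conditional_status(None, logo_mtime))    # first visit
print(conditional_status(header, logo_mtime))  # revalidation of the cached copy
```

Even in the 304 case a full request and response still travel over the network, which is the cost the rest of this post tries to avoid.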

To get rid of this unnecessary load, the Apache web server provides the module mod_expires, which lets you stamp each delivered file with a kind of "valid until" mark. To do so, and in accordance with the HTTP/1.1 specification (RFC 2616, section 14.21), a line like the following is added to the HTTP response header:

Expires: Thu, 01 Mar 2007 09:30:00 GMT

This causes the browser not to ask again for this document or file until that date has passed, unless it loses the file from its cache. Now let's have a closer look at the directives of mod_expires and the resulting possibilities. Here's an example:

ExpiresActive On
ExpiresDefault "access plus 5 minutes"
ExpiresByType image/gif "access plus 2 days"
ExpiresByType text/html "modification plus 5 minutes"

ExpiresActive enables or disables (On/Off) the modification of the HTTP response header. ExpiresDefault sets a default value for all documents that are not covered by a rule of their own. Here the expiration date will be five minutes after the time of access. This ensures that the client doesn't reload the document every few seconds, while on the other hand it gets the newer version within a reasonable time should the document change on the server.
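As a rough illustration of what "access plus 5 minutes" amounts to, the following Python sketch computes the resulting header line. The helper expires_header and the access time are assumptions chosen so that the result matches the sample header shown earlier; mod_expires itself does this work inside Apache.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(base_time, lifetime):
    """Format base_time + lifetime as an HTTP date for an Expires header."""
    expiry = (base_time + lifetime).astimezone(timezone.utc)
    return "Expires: " + format_datetime(expiry, usegmt=True)


# Hypothetical moment at which the client accessed the document.
access_time = datetime(2007, 3, 1, 9, 25, 0, tzinfo=timezone.utc)

print(expires_header(access_time, timedelta(minutes=5)))
# prints: Expires: Thu, 01 Mar 2007 09:30:00 GMT
```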

ExpiresByType lets you control the expiration based on MIME types. In the example, GIF images won't be reloaded from the server for two days from the time of download, based on the assumption that GIF images don't change but are rather replaced by new files with new names. If your site uses GIF buttons with rollover effects, this takes lots of requests off your Apache. On the other hand, if the layout of the web page and the GIF images are changed while the names of the images remain the same, this leads to rather strange looking pages in browsers that first accessed the page less than two days ago. So this directive, though it can reduce traffic a lot, has to be handled with care.

In the example, HTML pages won't be reloaded as long as their content is less than five minutes old. Imagine a heavily visited front page of a web portal which is generated every five minutes from dynamic content but saved as a static HTML file for performance reasons. If this were handled by the "access" alternative, a client that had just loaded the 4:59 minutes old page would miss the newer version for almost five minutes. With "modification" as the base, the cached copy expires exactly when the next version is due, so nobody misses the latest news.
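The front page scenario can be checked with a little date arithmetic. This is only a sketch in Python with made-up timestamps; the helper staleness is hypothetical and simply measures how long a client keeps showing the old page after the next version has been generated.

```python
from datetime import datetime, timedelta, timezone

LIFETIME = timedelta(minutes=5)


def staleness(expiry, next_version):
    """How long the cached copy outlives the next version of the page."""
    return max(expiry - next_version, timedelta(0))


# Hypothetical timeline: the page is regenerated every five minutes.
generated = datetime(2007, 11, 25, 12, 0, 0, tzinfo=timezone.utc)
requested = generated + timedelta(minutes=4, seconds=59)
next_version = generated + LIFETIME

access_expiry = requested + LIFETIME        # "access plus 5 minutes"
modification_expiry = generated + LIFETIME  # "modification plus 5 minutes"

stale_access = staleness(access_expiry, next_version)
stale_modification = staleness(modification_expiry, next_version)

print("access base:      ", stale_access)        # 0:04:59
print("modification base:", stale_modification)  # 0:00:00
```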

In short, use the access rule for content that doesn't change (or at least not frequently). If modifications are necessary, try to use a new name. Use the modification rule for content that is modified often or that, for whatever reason, has a short lifetime.

Last but not least, this way you not only reduce the load on your Apache but also speed up your web application on the client side: images and JavaScript files that don't have to be requested every time they are used may be displayed only a fraction of a second faster each. But in the end (and at least subjectively) it all adds up.

Sometimes you've got to work with POSIX time, which is probably better known as Unix time (http://en.wikipedia.org/wiki/Unix_time). While some newer systems already use a 64-bit value, the standard Unix time_t is usually a signed 32-bit integer which represents the seconds elapsed since 1970-01-01 00:00:00 UTC.
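To get a feeling for what a signed 32-bit time_t can hold, this small Python sketch adds the smallest and largest 32-bit values to the epoch. The constant names are my own; the arithmetic is plain date handling and does not depend on DB2.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

earliest = EPOCH + timedelta(seconds=INT32_MIN)
latest = EPOCH + timedelta(seconds=INT32_MAX)

print(earliest)  # 1901-12-13 20:45:52+00:00
print(latest)    # 2038-01-19 03:14:07+00:00
```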

Assume your DB2 database contains a table sample.table with a column that holds time_t values and is named time_t as well. A conversion from time_t to a DB2 timestamp can then be done like this:

SELECT
  DISTINCT time_t,
  timestamp('1970-01-01-00.00.00') + (time_t) seconds
  as calculated_timestamp
FROM
  sample.table
WHERE
  time_t=946684800
;

The result would look like the following:

TIME_T      CALCULATED_TIMESTAMP
----------- --------------------------
  946684800 2000-01-01-00.00.00.000000

  1 record(s) selected.
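If you want to cross-check such a result outside the database, the same "epoch plus seconds" arithmetic can be sketched in Python. The function db2_timestamp is a made-up helper that merely imitates the way DB2 prints timestamps; the result is in UTC, since the epoch is defined in UTC.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive value, to be read as UTC


def db2_timestamp(time_t):
    """Convert Unix time to a string in DB2's timestamp notation."""
    ts = EPOCH + timedelta(seconds=time_t)
    return ts.strftime("%Y-%m-%d-%H.%M.%S.%f")


print(db2_timestamp(946684800))  # 2000-01-01-00.00.00.000000
```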

This is also useful for converting the time value permanently from time_t to timestamp when loading Unix log data into DB2. In a first step, daily logs containing time_t can be loaded into a temporary table. When adding the data to the permanent table (which has a timestamp column), the load is then done from a cursor that is declared on a SELECT using the conversion shown above.


25.11.2007

Advanced caching with mod_expires

05.11.2007

DB2: Converting Unix time
