08.12.2007

Pagination with DB2

Modern web pages make wide use of pagination, and the most common database in the LAMP environment, MySQL, supports it with the LIMIT offset, row_count clause. With DB2 you have to use a little trickery to get a similar result.

Imagine a list of blog entries, ordered by time, displaying the latest on top. This list is easily created by a VIEW:

CREATE VIEW blog.vw_entries_500latest AS
  SELECT
    ROW_NUMBER() OVER(ORDER BY date DESC, time DESC) AS ROWNO,
    id, date, time, title
  FROM
    blog.tb_entries_all
  ORDER BY
    date DESC, time DESC
  FETCH FIRST 500 ROWS ONLY;
Now let's load the data for the 11th page, where each page lists 25 blog entry titles:

SELECT
  rowno, id, date, time, title
FROM
  blog.vw_entries_500latest
WHERE
  rowno > 25 * (11 - 1)
FETCH FIRST 25 ROWS ONLY;
This should do the trick.
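The offset arithmetic behind those two clauses can be sketched in Python. This is only an illustration of the calculation, not DB2 code; the list `entries` and the helper `page()` are made up to stand in for the view's rows:

```python
# 'entries' stands in for the 500 rows of the view, already ordered
# latest-first, numbered 1..500 as ROW_NUMBER() would number them.
entries = list(range(1, 501))

def page(entries, page_no, page_size=25):
    """Rows for page page_no (1-based): skip page_size*(page_no-1), take page_size."""
    offset = page_size * (page_no - 1)   # e.g. 250 rows skipped for page 11
    return entries[offset:offset + page_size]

print(page(entries, 11))  # row numbers 251..275
```

The slice corresponds exactly to `WHERE rowno > 25 * (11 - 1)` followed by `FETCH FIRST 25 ROWS ONLY`.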



05.12.2007

1st business day in month

Quite often, events are logged in databases for statistics or billing purposes. Maybe you need to know which day of the month was the first business day. Here's a way to find out with DB2.

Imagine a table log.events containing a column named when of type DATE. Basically you'll rely on the function DayOfWeek(), which returns a minimum of 1 if the given date is a Sunday and a maximum of 7 if it is a Saturday:

SELECT
  ( 15 - ( DayOfWeek( when - (Day(when) - 1) days ) ) ) / 7
  +
  ( 2 * ( DayOfWeek( when - (Day(when) - 1) days ) / 7 ) )
FROM
  log.events ;

With some arithmetic trickery the code above returns

  • 3 (1st of the month is a Saturday, so the 1st business day is Monday the 3rd)
  • 2 (1st of the month is a Sunday, so the 1st business day is Monday the 2nd)
  • 1 (1st of the month falls on Monday to Friday and is itself the 1st business day)
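The arithmetic can be checked for every possible weekday of the 1st with a few lines of Python. The function name is made up; `w` plays the role of DayOfWeek() applied to the 1st of the month (1 = Sunday, ..., 7 = Saturday), and `//` is integer division, matching DB2's integer arithmetic:

```python
def first_business_day(w):
    """Day of month of the first business day, given w = DayOfWeek(1st of month)."""
    # Same expression as the SQL above, with integer division.
    return (15 - w) // 7 + 2 * (w // 7)

for w in range(1, 8):
    print(w, first_business_day(w))
# w=7 (Saturday) -> 3, w=1 (Sunday) -> 2, w=2..6 (Monday-Friday) -> 1
```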


25.11.2007

Advanced caching with mod_expires

The fastest (and cheapest) data transfer is the one that doesn't have to happen, because the data is already there. Or still there: the caching of files is widely known and practised. Every web browser caches the files it has previously requested from a web server. But when a file is to be loaded a second time, a request is still sent to the web server, which then usually answers with a "304 Not Modified" (see also RFC 2616), as can be seen in its access log:

215.82.12.78 - - [02/Nov/2007:10:20:33 +0100] "GET /images/logo.gif HTTP/1.1" 304 -

On receiving this, the browser finally loads the file from its cache. Instead of transferring the file, just a couple of bytes went over the wire. But if a page uses lots of cacheable files like GIF images and JavaScript files, even those 304 requests add up, especially on connections with high latency.


To get rid of this unnecessary load, the Apache web server provides the module mod_expires, which enables you to stamp each delivered file with a kind of "valid until" mark. To that end, and in accordance with the HTTP/1.1 specification (subsection 14.21), a line like the following is added to the HTTP response header:

Expires: Thu, 01 Mar 2007 09:30:00 GMT

This causes the browser not to ask for this document or file again unless it drops it from its cache. Now let's have a closer look at the directives of mod_expires and the resulting possibilities. Here's an example:

ExpiresActive On
ExpiresDefault "access plus 5 minutes"
ExpiresByType image/gif "access plus 2 days"
ExpiresByType text/html "modification plus 5 minutes"

ExpiresActive is used to enable or disable (On/Off) the modification of the HTTP response header. ExpiresDefault sets a default value for all documents not covered by a rule of their own. Here the expiration date is five minutes after the time of access. This ensures that a document isn't reloaded by the client every few seconds, while on the other hand the client will get a newer version within reasonable time should the document change on the server.

ExpiresByType gives you the possibility to control expiration based on MIME types. In the example, GIF images won't be reloaded from the server for two days from the point of downloading, based on the assumption that the GIF images don't change but would rather be replaced by new files with new names. If your site uses GIF buttons with rollover effects, this takes a lot of requests off your Apache. On the other hand: if the layout of the web page and the GIF images are changed while the names of the images stay the same, browsers that first accessed the page less than two days ago will show rather strange looking pages. So this directive, though it offers a large reduction in traffic, has to be handled with care.

In the example, HTML pages won't be reloaded if their content is younger than five minutes. Imagine the highly frequented front page of a web portal which is regenerated every five minutes from dynamic content but saved as a static HTML file for performance reasons. If this were managed with the "access" alternative, a client that had just loaded the 4:59-minute-old page would miss the newer version for another five minutes. Based on "modification", it's ensured that nobody misses the latest news.

In short, use the access rule for content that doesn't change (or at least not frequently). If modifications are necessary, try to use a new name. Use the modification rule for content that is modified often or for whatever reason has a short lifetime.
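The difference between the two base modes can be sketched as a toy model in Python. This is not Apache's implementation; the function and the example times are made up purely to show how the resulting Expires value differs:

```python
from datetime import datetime, timedelta

def expires(base, interval, access_time, mtime):
    """Toy model: Expires = base time + interval, where base is
    the access time ('access') or the file's mtime ('modification')."""
    start = access_time if base == 'access' else mtime
    return start + interval

access_time = datetime(2007, 11, 25, 12, 0, 0)   # client downloads the page now
mtime = datetime(2007, 11, 25, 11, 58, 0)        # file was generated two minutes ago

# "access plus 5 minutes": fresh for 5 minutes from this download
print(expires('access', timedelta(minutes=5), access_time, mtime))        # 12:05:00
# "modification plus 5 minutes": only fresh until 5 minutes after generation
print(expires('modification', timedelta(minutes=5), access_time, mtime))  # 12:03:00
```

With "modification", a page generated two minutes ago expires three minutes after download, so the client picks up the next regeneration on time.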

Last but not least, this way you not only reduce the load on your Apache but also speed up your web application on the client side: images and JavaScript files that don't have to be requested every time they are used may be displayed a fraction of a second faster. In the end (and at least subjectively) it all adds up.


05.11.2007

DB2: Converting Unix time

Sometimes you've got to work with POSIX time, more commonly known as Unix time (http://en.wikipedia.org/wiki/Unix_time). While some newer systems already use a 64-bit value, the standard Unix time_t usually is a signed 32-bit integer representing the seconds elapsed since 1970-01-01 00:00:00 UTC.

Assume your DB2 database has a table sample.table with a column named time_t containing Unix timestamps. A conversion from time_t to a DB2 timestamp could be done like this:
SELECT DISTINCT
  time_t,
  timestamp('1970-01-01-00.00.00') + (time_t) seconds
    AS calculated_timestamp
FROM
  sample.table
WHERE
  time_t = 946684800 ;
The result would look like the following:

TIME_T      CALCULATED_TIMESTAMP
----------- --------------------------
946684800 2000-01-01-00.00.00.000000

1 record(s) selected.

This is also useful for permanently changing the value from time_t to timestamp when working on Unix log data with DB2. Daily logs containing time_t could be loaded into a temporary table in a first step. When adding the data to the permanent table (which works with a timestamp column), a load can be done from a cursor defined as a SELECT using the conversion mentioned above.
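The same epoch arithmetic can be checked in Python; the function name is made up, but the calculation mirrors the SQL expression (epoch start plus time_t seconds):

```python
from datetime import datetime, timedelta

def unix_to_timestamp(time_t):
    """Epoch start plus time_t seconds - same arithmetic as the SQL above."""
    return datetime(1970, 1, 1) + timedelta(seconds=time_t)

print(unix_to_timestamp(946684800))  # 2000-01-01 00:00:00
```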

25.10.2007

DB2: Automatic line numbering

When generating reports with lots of lines, it's sometimes nice to have them numbered automatically. For this, DB2 offers the ROW_NUMBER() OVER() functionality. The first part numbers the rows in the order they are passed on from the retrieving process. So if you only use the ORDER BY clause for displaying the result set, you may get line numbers, but in no particular order. You need to pass the ordering directive to the second part, OVER(), as well.

Here's an example:

SELECT
  ROW_NUMBER() OVER(ORDER BY NMBR ASC) AS ROW,
  NMBR
FROM
  lucky.numbers
ORDER BY
  NMBR ASC

ROW NMBR
-------------------- ----------
1 123
2 456
3 2345
4 12345

4 record(s) selected.
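What ROW_NUMBER() OVER(ORDER BY ...) does here corresponds to sorting first and numbering afterwards, which a short Python sketch (with the sample values from above) makes explicit:

```python
# ROW_NUMBER() OVER(ORDER BY NMBR ASC): sort first, then number the rows.
numbers = [456, 12345, 123, 2345]

for row, nmbr in enumerate(sorted(numbers), start=1):
    print(row, nmbr)
# 1 123 / 2 456 / 3 2345 / 4 12345
```

Numbering before sorting, by contrast, would attach row numbers in whatever order the rows happened to arrive, which is exactly the pitfall described above.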


21.10.2007

Case insensitive indices

In my experience, DB2 is one of the most correctly working databases. Where others are, let's say, a bit more flexible, DB2 follows the given path down to the last bit. An example we just stumbled over: case sensitivity.

These days we installed a new product which usually works with a database system made in Redmond. We wanted it to work with DB2 and were told that, in theory, there should be no big problem. Practice showed that Murphy's law is always waiting for you.

At one point the software runs a query with a condition like "WHERE upper(NAME)='ABCD'". While in the beginning there was no problem, we found that several data imports later this little phrase slowed down the whole system. The cause: although there was a (case sensitive) index on the NAME column, DB2 had to apply upper() to every single row.

While DB2 v9 for z/OS allows you to build an index using the upper() function, DB2 v8 doesn't give you this possibility (yet). But there's a workaround. Here's how we got around it.

First, we added a generated column containing the upper case text:

-- alter table
SET INTEGRITY FOR A001.FOOBAR OFF ;

alter table A001.FOOBAR
add column NAME_UP
GENERATED ALWAYS AS ( UPPER(NAME) ) ;

-- enable integrity again and fill column; may take a while
SET INTEGRITY FOR A001.FOOBAR
IMMEDIATE CHECKED FORCE GENERATED ;

Then, we added an index on the newly created column:

-- drop old index; create an index using new upper_column
DROP INDEX A001.FOOBAR_IDX1;
CREATE INDEX A001.FOOBAR_IDX1 ON A001.FOOBAR ( NAME_UP, XID, YID );

After that, the internal optimizer recognized that it could use the GENERATED column and its index, and the import jobs returned to the high velocity they had shown in their very first moments: Instead of hours or even days, everything was done within a couple of minutes.

15.10.2007

Long running transactions

Frequently there are jobs to do on the database which affect quite a lot of rows, like deleting yesterday's log events from the log table, and so turn into long running transactions. While being executed, these may cause lock events that block other processes. So what to do about those long runners?

Rule #1 above all others: avoid long runners. This may be done by turning them into a couple of short runners. Based on the above example, a delete of yesterday's events could be broken down into 24 steps, starting with
"DELETE ... WHERE event_date=(current_date - 1 day) AND event_time>='23:00:00' "
and ending with
"... event_time>='00:00:00' ".

If there's no possibility to go that way, the execution may gain speed by one or more of the following steps:
  1. Try to do this in off peak hours to have more resources for your long runner.
  2. Increase the size of the buffer pool(s) and the NUM_IOCLEANERS if necessary/possible.
  3. Log files should not be placed on the same physical file system as the database itself.
  4. Sometimes increasing the parameter LOGBUFSZ has a positive effect. (Attention: LOGBUFSZ is allocated within DBHEAP, so you have to change that value, too.)
For points 2-4 you'll need DB2 admin rights on the database. Besides, moving the log files to a different disk often improves all DB2 write actions on the specific database, and so does adding memory to the buffer pools. When adding memory, though, do it step by step and monitor the gain in performance. Usually you'll reach a point where adding more memory doesn't earn you much performance any more. You may also use the high water marks and the log_space_used value when monitoring memory usage.

15.08.2007

DB2: Internal functions won't run

They say that time and space are relative. If it knew Einstein and had a voice, my computer would confirm this, for at times its disk space is relatively short.

DB2 sometimes has a rather complex relationship to time itself, too. Once we changed the system time on a Linux based DB2 test server and subsequently forgot about it. The next day, a developer called and told us that some of his SQL statements failed which had still been working several days before.

Our investigations showed that whenever a statement contained an internal function, as in "db2 values ucase('hello')", DB2 gave us the following error message:

SQL0440N No authorized routine named "UCASE" of type "FUNCTION" having compatible arguments was found. SQLSTATE=42884

This error appeared on all existing databases in that instance. After creating a new database, we found that only the old ones were affected.

We opened a PMR with IBM, and support told us that this may sometimes happen when changing the system time. (Maybe we moved it too far?) There's a DB2 tool called 'fixfunc' available from IBM support. We tried it and it succeeded: all databases were usable afterwards.



31.05.2007

DB2 Express-C v9 for Power architecture

Since v8, the freely distributable version of IBM's database software, DB2 Express-C, has been available for Windows and Linux, the latter supporting both Intel and Power architectures. Several days ago I was looking for v9 to download but couldn't find it. All the download list offered was v9 for Windows and for Linux on x86 and IA64 systems.

After I wrote an email to db2x@ca.ibm.com, Ryan Chase and Ian Hakes of the DB2 Express-C community team reacted and added the v9 ppc64 (Power) version to the list again. It's available in the Express-C download area. You'll need to log in before selecting the download.