Oracle: Snapshot too old?

Okay, so you have received the Oracle error ORA-01555 Snapshot Too Old and have no clue how to go about resolving it? Then this post is for you. (For a change, it is written by an application developer rather than a DBA.)

First, why does this occur? When you run a query, Oracle presents the data to you as a “snapshot”. The underlying tables in that query might continue to get changed, but you keep seeing the data as it was when you executed the query, and you can keep fetching rows from that snapshot through the cursor. However, as you might expect, Oracle cannot hold that snapshot forever. How long it retains the undo needed for the snapshot is governed by the UNDO_RETENTION parameter.

So one way to solve this problem might be to increase the limit defined by this parameter. However, that is not always the best solution.
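If you do go down that route, the current value can be checked and raised like this. A minimal sketch only: SHOW PARAMETER is a SQL*Plus command, the ALTER SYSTEM needs DBA privileges, and 36000 seconds (10 hours) is just an illustrative value.

-- SQL*Plus: check the current setting (the value is in seconds)
SHOW PARAMETER undo_retention

-- Raise it to, say, 10 hours (requires ALTER SYSTEM privilege)
ALTER SYSTEM SET undo_retention = 36000;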

This problem normally occurs when a process opens a cursor (by running the query) and then processes the rows one by one. For example, let’s assume the query returns 10,000 rows and that processing each row takes, on average, 10 seconds, moving on to the next row only after the previous one is done. The total processing of all these rows will then take around 28 hours. If your UNDO_RETENTION is set to 10 hours, this process will fail with the snapshot too old error.

One of the best ways to solve this problem is performance tuning of the process, targeted specifically at the per-row work done for the query in question, so as to reduce the time it takes to process one row. For example, if we can get the processing time down to 3 seconds per row, we will be done within about 8.5 hours, which is below our current setting for UNDO_RETENTION. In most cases, this can actually be done.

A second way to solve the problem is to use a temporary table. For example, suppose you want to analyse all open purchase orders: from the table containing POs, pull out the open ones and put them into a temporary table. Since the temporary table is used only by your process, Oracle will not have to hold the “snapshot” for you. Again, it is the main driver query whose result should go into the temporary table. This will also make your process faster overall if it is based on a not-so-small subset of a large table.
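A minimal sketch of this approach, reusing the po_table and po_status columns from the example further down; the temporary table name open_po_tmp is an illustrative assumption:

-- One-off DDL: a session-private temporary table for the driver rows
CREATE GLOBAL TEMPORARY TABLE open_po_tmp (po_id NUMBER)
  ON COMMIT PRESERVE ROWS;

-- At the start of the process: copy the open POs across in one shot
INSERT INTO open_po_tmp (po_id)
SELECT po_id FROM po_table WHERE po_status = 'O';

-- The process then opens its cursor on open_po_tmp instead of po_table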

However, a third solution is also possible. In our case the process had to run for days on end, rather than do its work and finish quickly, so we inevitably hit this error.

To solve the problem, we exited the loop after every n rows, and then reentered it. For example, if the pseudocode looked as below prior to the fix:


select something from somewhere;
while (rows) {
  do process
} 

We changed it as below:


hasAtleastOneRow = True;
while (hasAtleastOneRow) {
  hasAtleastOneRow = False;
  select something from somewhere where rownum<n;
  while (rows) {
    do process
    hasAtleastOneRow = True;
  }
} 

Note that the SELECT statement must have a mechanism to prevent picking up rows that have already been processed earlier. This could be a flag condition, or a check against another table. For example:

select po_id from po_table where po_status='O' and rownum<100
and not exists(select 1 from po_temp where po_table.po_id = po_temp.po_id)

As part of the 'do process' step, then, we insert the processed po_id into po_temp.
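Putting the pieces together, here is a minimal PL/SQL sketch of the pattern, assuming the po_table and po_temp tables from the example above and a batch size of 100; the actual per-row work is only indicated by a comment:

DECLARE
  l_found BOOLEAN := TRUE;
BEGIN
  WHILE l_found LOOP
    l_found := FALSE;
    -- Re-opening the cursor on every pass takes a fresh read-consistent snapshot
    FOR r IN (SELECT po_id
              FROM   po_table
              WHERE  po_status = 'O'
              AND    rownum < 100
              AND    NOT EXISTS (SELECT 1 FROM po_temp
                                 WHERE po_temp.po_id = po_table.po_id))
    LOOP
      l_found := TRUE;
      -- ... do process for r.po_id ...
      INSERT INTO po_temp (po_id) VALUES (r.po_id);  -- mark the row as processed
    END LOOP;
    COMMIT;  -- finish each batch well inside the undo retention window
  END LOOP;  -- the outer loop ends when a pass finds nothing left to process
END;
/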

How do we choose the value of 'n'? Some timing measurements and trial and error are needed here. Pick the largest value for which one batch is still guaranteed to finish comfortably within the undo retention window.


When NOT to normalise the database

When talking of database normalisation, textbooks often discuss BCNF and the fifth and higher normal forms. In practice, however (in large software/ERPs), I have rarely seen normalisation beyond Third Normal Form. In fact, a certain degree of redundancy is desirable.

While doing database design, there are two critical aspects that I believe should be kept in mind, yet I see them ignored in a lot of common software.

The first is the time aspect of data. Take an example from finance: consider a company with multicurrency invoicing. The tables could be designed as:

INVOICE: InvoiceID, ..., Currency, BaseCurrency, TransactionDate, ...
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

This design has no redundancy. Using the Currency, BaseCurrency and TransactionDate fields of the INVOICE relation, we can always find the latest row in the CONVERSIONS table whose EffectiveDate is before the TransactionDate, and hence determine the RateMultiplier.

Consider another design:

INVOICE: InvoiceID, ..., Currency, BaseCurrency, TransactionDate, RateMultiplier, ...
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

Here, the system determines the value of the RateMultiplier at the time of invoice creation and records it permanently within the INVOICE table itself. To me this is the more mature design. Why? Because a lot of data in the INVOICE table actually depends on the RateMultiplier: the VAT details, for example. Suppose that on 1-JAN-2009 the exchange rate is believed to be 1.1, but on 3-JAN-2009 we discover the rate was recorded incorrectly, and someone changes the CONVERSIONS table to the correct rate of 1.2. In the first design, every invoice created between 1-JAN and 3-JAN now becomes inconsistent: the amounts stored on it were computed with a RateMultiplier that no longer matches what the CONVERSIONS table says. In the second design, the rate that was actually used remains on the invoice.
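As a sketch of what “records it permanently” looks like at invoice creation time. Column names follow the design above; the bind variables are only placeholders:

INSERT INTO invoice (invoiceid, currency, basecurrency, transactiondate, ratemultiplier)
VALUES (:invoiceid, :currency, :basecurrency, :txndate,
        -- capture the rate in force on the transaction date, once and for all
        (SELECT c.ratemultiplier
         FROM   conversions c
         WHERE  c.fromcurrency  = :currency
         AND    c.tocurrency    = :basecurrency
         AND    c.effectivedate = (SELECT MAX(effectivedate)
                                   FROM   conversions
                                   WHERE  fromcurrency  = :currency
                                   AND    tocurrency    = :basecurrency
                                   AND    effectivedate <= :txndate)));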

Now consider an example from HR appraisal systems. A table stores what stage the appraisal process is at for a particular employee, and this is used to decide what access the employee has.

STAGE_CURRENT: EmpID, Stage

Note that this has no Date or Year field. The consequence: an employee trying to view the previous year’s appraisal records is unable to see some of the data, because the table only knows about one stage per employee, and the current appraisal process is still at an initial stage where access is restricted.
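One possible fix, keeping the same notation (the extra column name here is illustrative), is to key the stage by the appraisal cycle as well, so that access rules for the current cycle cannot hide earlier years’ data:

STAGE: EmpID, AppraisalYear, Stage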

The second aspect is the storage of “under calculation” fields. For example, suppose the training department maintains the scores of each student trained. The test administered is out of 100 marks, but carries a weightage of 40. Proposed design:

SCORES: CandidateID, TestID, Score, Flag

At the time of recording, the Flag is set to N. Thereafter a process runs that multiplies the score by 0.4 and sets the Flag to Y.

In my opinion a better design is to retain both scores, even though the pre-weightage score is not relevant to the business process, because the weighting process can also terminate midway (for example, when erroneous data is supplied). If it dies after setting the flag to Y but before changing the score, or after changing the score but before setting the flag, we end up with inconsistent data. Improved design:

SCORES: CandidateID, TestID, Score, WeightedScore

At the time of recording, Score is entered and WeightedScore is set to zero. Thereafter a process runs that multiplies the Score by 0.4 and stores the value in WeightedScore.
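A sketch of the two steps with this design; the bind variables are placeholders and 0.4 is the weightage from the example:

-- At recording time: keep the raw score, initialise the weighted score to zero
INSERT INTO scores (candidateid, testid, score, weightedscore)
VALUES (:candidateid, :testid, :score, 0);

-- Later process: derive the weighted score without destroying the original
UPDATE scores
SET    weightedscore = score * 0.4
WHERE  weightedscore = 0;

If the process dies midway, the rows it missed still have WeightedScore = 0, so a re-run simply picks them up and nothing is lost.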

The central idea is to retain all information permanently so that even if the process fails, we know what data existed.


Generating sequential numbers in a database

You are creating an application that allows organisations to manage employees. One of the tasks that it has to do is generate an employee ID when a new employee is being entered. One way of doing this is through this query:

SELECT max(empl_id)+1 FROM employee;


However, this query presents a problem in a multiuser environment: if more than one user is entering employee details at the same time, they can both get the same empl_id. One way around this is the auto-numbering facility provided by the database; however, I personally find that solution limiting and have never used it.

The other approach is to create a single row table for global settings (in all probability your application will already have this) and maintain a field in that table as the last number used. Thereafter, the code can be written as below:

UPDATE settings_tbl SET lastnumber=lastnumber+1;

SELECT lastnumber FROM settings_tbl;

Remember that the order of the statements is important in a multiuser environment. If the SELECT came first, two sessions could read the same value before either had updated it; the UPDATE must come first so that the row is locked before the value is read.

This piece of code should be executed at the time of saving the employee, not when the blank employee form is requested. This matters because the lock taken by the UPDATE is released only on COMMIT or ROLLBACK, and we do not want one user of the application to have to wait for another to finish.

Another way of doing the same thing in Oracle, one that I prefer and have used in a number of tight situations, is the FOR UPDATE clause. It allows you to do the SELECT first:

SELECT lastnumber FROM settings_tbl FOR UPDATE;

UPDATE settings_tbl SET lastnumber=lastnumber+1;
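As a minimal sketch, the whole allocation can sit in one PL/SQL block, using the settings_tbl, lastnumber and employee/empl_id names from above; the employee INSERT is only indicated:

DECLARE
  l_next settings_tbl.lastnumber%TYPE;
BEGIN
  -- Lock the settings row first; concurrent sessions queue up on this SELECT
  SELECT lastnumber + 1 INTO l_next
  FROM   settings_tbl
  FOR UPDATE;

  UPDATE settings_tbl SET lastnumber = l_next;

  -- ... INSERT INTO employee (empl_id, ...) VALUES (l_next, ...) ...

  COMMIT;  -- releases the lock taken by FOR UPDATE
END;
/

As noted above, this block should run at save time, since the lock is held until the COMMIT.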


Using SQL to its full potential

How to use the database’s SQL to its full potential: the idea is to reduce procedural coding and thereby improve performance and reduce defects.

I like to use the database to its full potential. For example, suppose someone has a list of vouchers and needs to find the vouchers that were paid later than the due date. One way to do this might be to read the vouchers one by one from the database, compare the due date with the payment date and collect the results. The other, recommended method is to add the required criteria to the query itself so that only the exact result set is returned. With the second method, only the matching vouchers (perhaps 5 or 10 per cent of the total) need to be transferred from the database to the application, while with the first method all the vouchers have to be transferred.

In other words, the exact business requirement should determine the query. While you are at it, also keep the indexes in mind: queries should always be written to minimise disk I/O and transfers between the database and the application server.
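A sketch of the recommended method for the voucher example; the table and column names here are illustrative assumptions, not a real schema:

-- Only the late-paid vouchers are returned; the filtering happens inside the database
SELECT voucher_id, due_date, payment_date
FROM   vouchers
WHERE  payment_date > due_date;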

The database itself is quite powerful (especially Oracle) and I feel its potential is usually under-utilised. Let me show this through an example.

I once had a requirement involving a table holding the first, middle and last names of employees along with their email IDs. Something like this (assume all columns are VARCHAR2; the lengths below are arbitrary):

create table userlist(fname varchar2(50), mname varchar2(50), lname varchar2(50), emailid varchar2(100));

Every employee’s middle name is blank. It is possible for multiple employees to have identical fname and lname. For example, there can be two people named ‘Hardeep Singh’. In that case, if the emailid on the two rows is the same, they are the same person appearing in multiple rows; otherwise they are different persons who happen to share a name.

For example:

  1. Hardeep Singh alpha@gmail.com
  2. Hardeep Singh beta@gmail.com
  3. Hardeep Singh beta@gmail.com
  4. Satinder Singh gamma@gmail.com
  5. Satinder Singh gamma@gmail.com
  6. Gorakh Nath gn@gmail.com

In this case, 2 & 3 are the same person and 4 & 5 are also the same person. 1 & 2 are two different people.

Now the requirement is to modify the middle name by adding a number, such that every distinct person ends up with a unique name. In the example above, the names should become:

  1. Hardeep Singh '1' alpha@gmail.com
  2. Hardeep Singh ' ' beta@gmail.com
  3. Hardeep Singh ' ' beta@gmail.com
  4. Satinder Singh ' ' gamma@gmail.com
  5. Satinder Singh ' ' gamma@gmail.com
  6. Gorakh Nath ' ' gn@gmail.com

Now we know that person 1 is different from persons 2 and 3, because he has a different middle name.

The middle name to be added is shown at the end of the name, in quotes. Gorakh Nath does not get any middle name, since his name is already unique. Any Tom, Dick or Harry would implement this requirement the following way: read all the rows one by one, look for people having the same name, check the emailid, and then issue an UPDATE like this:

UPDATE userlist SET mname='1' where emailID='alpha@gmail.com';

One such UPDATE would have to be issued for each person. However, the whole thing can be done with a single UPDATE statement, without reading the list of employees into the application at all. Here is the query:

update userlist a
set mname = (select x
             from (select rownum x, emailid, fname, lname
                   from userlist xa
                   where exists (select 1
                                 from userlist xb
                                 where xa.lname = xb.lname and
                                       xa.mname = xb.mname and
                                       xa.fname = xb.fname and
                                       xa.emailid <> xb.emailid)) ord
             where ord.emailid = a.emailid and
                   ord.fname = a.fname and
                   ord.lname = a.lname)
where exists (select 1
              from userlist b
              where a.lname = b.lname and
                    a.mname = b.mname and
                    a.fname = b.fname and
                    a.emailid <> b.emailid);

I suppose an explanation is owed as to how this works. To my knowledge this exact query works only in Oracle, but there are ways to make it work in other databases as well (a PostgreSQL version is given at the end).

‘rownum’ returns the position of a particular row in the result set. The ‘exists’ clause at the end makes sure only people with duplicated names are processed (‘gn@gmail.com’ is ignored). The part:

(select x from (select rownum x, emailid, fname, lname
                from userlist xa
                where exists (select 1
                              from userlist xb
                              where xa.lname = xb.lname and
                                    xa.mname = xb.mname and
                                    xa.fname = xb.fname and
                                    xa.emailid <> xb.emailid)) ord

creates an inline view holding the row number, the email ID, and the first and last names. In the given scenario the result will be something like:

  1. 1, alpha@gmail.com, Hardeep, Singh
  2. This row will be absent because of the xa.emailid<>xb.emailid clause
  3. This row will be absent because of the xa.emailid<>xb.emailid clause
  4. This row will be absent because of the xa.emailid<>xb.emailid clause
  5. This row will be absent because of the xa.emailid<>xb.emailid clause
  6. This row won’t even be considered, as I explained above


Had there been yet another ‘Hardeep Singh’ with a different email ID, he would have got a middle name of ‘2’.
Now the last step is to copy over the numbers based on the first and last names only – that part is pretty simple. Please post any questions in the comments area.

Solution using PGSQL:

update userlist a
set mname = (select rn
             from (select row_number() over (order by (select null)) rn,
                          emailid
                   from (select distinct p.emailid
                         from userlist p) x) y
             where y.emailid = a.emailid)
where exists (select 1
              from userlist b
              where a.fname = b.fname and
                    a.mname = b.mname and
                    a.lname = b.lname and
                    a.emailid <> b.emailid);

