Oracle deadlocks: the what and the how

Everyone knows what a deadlock is: a situation in which two or more competing processes are each waiting for another to finish, and thus none of them ever does. The purpose of this post is to help people understand deadlocks a little better, so that they can fix the problem when they find one.

Assume that there are two processes running, A and B, and that each requires a (shared) file and a printer to do its work. Process A locks the printer, and Process B locks the file for its own use. Now neither process can complete, because neither holds all the resources it needs, and neither will release the resource it already holds: each keeps waiting for the second resource.
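To make the printer-and-file picture concrete, here is a minimal Python sketch. The two `threading.Lock` objects, the barriers and the 0.2-second timeout are my own illustrative choices and have nothing to do with how Oracle works; the timeout is only there so that the demo ends instead of hanging forever.

```python
import threading


def run_demo(timeout=0.2):
    """Two workers each grab one resource, then wait for the other's."""
    printer = threading.Lock()
    shared_file = threading.Lock()
    both_hold_first = threading.Barrier(2)    # both hold their first lock
    both_tried_second = threading.Barrier(2)  # both finished waiting
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # Without the timeout this acquire would block forever.
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            # Keep holding `first` until the other worker has given up too.
            both_tried_second.wait()
            if ok:
                second.release()

    a = threading.Thread(target=worker, args=("A", printer, shared_file))
    b = threading.Thread(target=worker, args=("B", shared_file, printer))
    a.start()
    b.start()
    a.join()
    b.join()
    return got_second


if __name__ == "__main__":
    print(run_demo())
```

Both workers report `False`: each holds one resource and gives up waiting for the other, which is exactly the stand-off described above.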

Let us create a deadlock now, using an Oracle database and the SQL*Plus client.

We opened two sessions and executed “set autocommit off” as the first statement in each.

Now in the first session we executed:

UPDATE ps_voucher SET grp_ap_id='A' WHERE voucher_id='00692096' AND invoice_dt='2-JAN-2002';

second session:

UPDATE ps_voucher SET grp_ap_id='A' WHERE voucher_id='00692096' AND invoice_dt='13-MAR-2007';

back to the first:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096';

and then the second:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096';

BAM! Deadlock. See screenshots:

[Screenshot: Deadlock - Session I]
[Screenshot: Deadlock - Session II]

What went wrong? There were two vouchers in the system with the same VOUCHER_ID but different INVOICE_DTs (invoice dates). Each session first locked one of those vouchers and then, with its second UPDATE, tried to update both, so each ended up waiting for the row the other had locked. (On the database side, a session gets a lock on a specific row when it UPDATEs that row, and the lock is released when the session COMMITs or ROLLBACKs.) Oracle detects the cycle and fails one of the waiting statements with ORA-00060 (deadlock detected while waiting for resource).

Yes, the programmer could have been smarter and written better code: had the INVOICE_DT clause been included in the second statement as well, we would have been fine. However, in practice, with huge systems having tons of code, programmers will sometimes make mistakes. Even if they do not, deadlocks will occur: not all deadlocks are caused by SQL issues.
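A quick way to see why the missing INVOICE_DT clause matters is to count how many rows each form of the second UPDATE touches. The sketch below uses an in-memory SQLite table as a stand-in for Oracle, purely so that it runs anywhere. The helper name `rows_touched`, the cut-down column list and the ISO date strings are assumptions made for illustration.

```python
import sqlite3


def rows_touched(where_clause):
    """Run the second UPDATE with the given WHERE clause; return rows updated."""
    conn = sqlite3.connect(":memory:")
    # Cut-down, hypothetical version of ps_voucher with only the columns used here.
    conn.execute(
        "CREATE TABLE ps_voucher ("
        "voucher_id TEXT, invoice_dt TEXT, grp_ap_id TEXT, address_seq_num INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ps_voucher VALUES (?, ?, ?, ?)",
        [
            ("00692096", "2002-01-02", "", 1),
            ("00692096", "2007-03-13", "", 1),
        ],
    )
    cur = conn.execute(
        "UPDATE ps_voucher SET address_seq_num = 2 WHERE " + where_clause
    )
    updated = cur.rowcount
    conn.close()
    return updated


if __name__ == "__main__":
    # As written in the post: matches both vouchers, so it needs both row locks.
    print(rows_touched("voucher_id = '00692096'"))
    # With the invoice date added: matches only the session's own voucher.
    print(rows_touched("voucher_id = '00692096' AND invoice_dt = '2002-01-02'"))
```

The first form needs a lock on both rows, including the one the other session already holds; the second form stays within the row the session locked to begin with.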

From a system design perspective, what can be done to prevent deadlocks? One way is to give each execution of a process a unique ID; let’s call it the process instance (PI). So if process ABC is run once, it gets a PI of 1222; if process PQR is run after that, it gets a PI of 1223; and when ABC is run again, it gets a PI of 1224. Before changing any transactions, the process stamps its own PI on the transactions that qualify:

UPDATE ps_voucher
SET pi=1223
WHERE <process specific selection criteria>
AND pi=0;

COMMIT;

The commit here is important – only then will other processes be able to see the ‘locking’.

Thereafter the normal processing SQLs can be changed as below:

UPDATE ps_voucher
SET grp_ap_id='1'
WHERE <process specific criteria>
AND pi=1223;

At the end, set the transactions back to ‘open for processing’ by setting PI to zero:

UPDATE ps_voucher
SET pi=0
WHERE pi=1223;

COMMIT;
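Here is a rough end-to-end sketch of the claim, process and release cycle, again on an in-memory SQLite table so that it is runnable anywhere. The function names `claim`, `process` and `release` are hypothetical, and a plain `voucher_id` filter stands in for the process-specific selection criteria.

```python
import sqlite3


def claim(conn, pi, voucher_id):
    """Stamp our PI on unclaimed rows and commit so others can see the claim."""
    cur = conn.execute(
        "UPDATE ps_voucher SET pi = ? WHERE voucher_id = ? AND pi = 0",
        (pi, voucher_id),
    )
    conn.commit()
    return cur.rowcount


def process(conn, pi):
    """Normal processing, restricted to the rows this PI has claimed."""
    cur = conn.execute("UPDATE ps_voucher SET grp_ap_id = '1' WHERE pi = ?", (pi,))
    conn.commit()
    return cur.rowcount


def release(conn, pi):
    """Set the rows back to 'open for processing'."""
    conn.execute("UPDATE ps_voucher SET pi = 0 WHERE pi = ?", (pi,))
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ps_voucher ("
        "voucher_id TEXT, invoice_dt TEXT, grp_ap_id TEXT, pi INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO ps_voucher (voucher_id, invoice_dt, grp_ap_id) VALUES (?, ?, '')",
        [("00692096", "2002-01-02"), ("00692096", "2007-03-13")],
    )
    first = claim(conn, 1223, "00692096")    # PI 1223 claims both vouchers
    second = claim(conn, 1224, "00692096")   # PI 1224 finds nothing left to claim
    processed = process(conn, 1223)
    release(conn, 1223)
    third = claim(conn, 1224, "00692096")    # after the release, 1224 can claim
    conn.close()
    return first, second, processed, third


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the second process never touches rows carrying someone else's PI, so it has nothing to wait on.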

If there are other ways to achieve this, please let me know by posting comments.

The DBA is usually able to identify the SQL statements involved in a deadlock. Many times one process is updating rows that the other is deleting.


Using SQL potential

How to use the database's SQL to its full potential. The idea is to reduce procedural coding and thereby improve performance and reduce defects.

I like to use the database to its full potential. For example, suppose someone has a list of vouchers and needs to find the vouchers that were paid later than the due date. One way to do this might be to read the vouchers one by one from the database, compare the due date with the payment date and determine the results. The other, recommended, method is to add the required criteria to the query itself so that only the exact result is returned. With the second method, only the late vouchers (say 5% or 10% of the total) need to be transferred from the database to the application, while with the first method all vouchers need to be transferred.

In other words, the exact business requirements should determine the query. While you are at it, you should also keep in mind the indexes. Queries should always be written to minimise Disk I/O and transfers between the DB and the Application (server).
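The two approaches to the late-voucher example can be put side by side in a small sketch. It runs against an in-memory SQLite table; the column names `due_dt` and `pay_dt`, the three sample rows and the function names are all made up for illustration.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical voucher table; dates are ISO strings so they compare correctly.
    conn.execute("CREATE TABLE ps_voucher (voucher_id TEXT, due_dt TEXT, pay_dt TEXT)")
    conn.executemany(
        "INSERT INTO ps_voucher VALUES (?, ?, ?)",
        [
            ("V1", "2007-03-01", "2007-02-27"),  # paid early
            ("V2", "2007-03-01", "2007-03-05"),  # paid late
            ("V3", "2007-04-01", "2007-04-01"),  # paid on the due date
        ],
    )
    return conn


def late_procedural(conn):
    """First method: fetch every voucher and compare the dates in the application."""
    late = []
    for voucher_id, due_dt, pay_dt in conn.execute(
        "SELECT voucher_id, due_dt, pay_dt FROM ps_voucher"
    ):
        if pay_dt > due_dt:
            late.append(voucher_id)
    return sorted(late)


def late_in_query(conn):
    """Second method: let the database filter, so only late vouchers come back."""
    rows = conn.execute(
        "SELECT voucher_id FROM ps_voucher WHERE pay_dt > due_dt ORDER BY voucher_id"
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db()
    print(late_procedural(conn), late_in_query(conn))
```

Both functions give the same answer, but the first pulls all three rows across to the application while the second pulls only the one that matters.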

The database itself is quite powerful (especially Oracle), and I feel that its potential is often under-utilised. Let me show this through an example.

I once had a requirement involving a table that holds the first, middle and last names of employees along with their email ID. Something like this, with every column a VARCHAR2 (the lengths shown are arbitrary):

create table userlist(fname varchar2(50), mname varchar2(50), lname varchar2(50), emailid varchar2(100));

Every employee's middle name is blank (taken here to be a single space rather than NULL, since the query further down compares middle names with = and in Oracle NULL never equals NULL). It's possible for several employees to have identical first and last names. For example, there can be two people named ‘Hardeep Singh’. In that case, if the two rows have the same email ID they are the same person appearing in multiple rows; otherwise they are different people who happen to share a name.

For example:

  1. Hardeep Singh alpha@gmail.com
  2. Hardeep Singh beta@gmail.com
  3. Hardeep Singh beta@gmail.com
  4. Satinder Singh gamma@gmail.com
  5. Satinder Singh gamma@gmail.com
  6. Gorakh Nath gn@gmail.com

In this case, 2 & 3 are the same person and 4 & 5 are also the same person. 1 & 2 are two different people.

Now the requirement is that we have to modify the middle name by adding a number such that every different person has a unique name. In the example above, the names should be:

  1. Hardeep Singh '1' alpha@gmail.com
  2. Hardeep Singh ' ' beta@gmail.com
  3. Hardeep Singh ' ' beta@gmail.com
  4. Satinder Singh ' ' gamma@gmail.com
  5. Satinder Singh ' ' gamma@gmail.com
  6. Gorakh Nath ' ' gn@gmail.com

Now we can tell that person 1 is different from persons 2 and 3, because he has a different middle name.

The middle name to be added is shown after the name, in quotes. Gorakh Nath does not get a middle name since his name is unique. Any Tom, Dick or Harry would meet this requirement in the following way: read all the rows one by one, look for people having the same name, check the email ID, and then issue an UPDATE like this:

UPDATE userlist SET mname='1' where emailID='alpha@gmail.com';

One such UPDATE would need to be issued for each person. However, this can be done with just a single UPDATE statement, without reading the list of employees into the application at all. Here is the query:

update userlist a
set mname=(select x from (select rownum x,emailid,fname,
                                 lname
                          from userlist xa
                          where exists
                          (select 1
                           from userlist xb
                           where xa.lname=xb.lname and
                           xa.mname=xb.mname and
                           xa.fname=xb.fname and
                           xa.emailid<>xb.emailid))
                          ord
           where ord.emailid=a.emailid and
                 ord.fname=a.fname and
                 ord.lname=a.lname)
where exists(select 1
             from userlist b
             where a.lname=b.lname and
                   a.mname=b.mname and
                   a.fname=b.fname and
                   a.emailid<>b.emailid);
  
                
I guess an explanation is owed as to how it works. To my knowledge this query works only in Oracle, because it relies on ROWNUM, but there should be ways to make it work in other databases as well.

‘rownum’ returns the number of that particular row in the result set. The ‘exists’ clause at the end makes sure that only people who share a name with someone holding a different email ID are processed (‘gn@gmail.com’ is ignored). The part:

(select rownum x,emailid,fname,lname
 from userlist xa
 where exists
 (select 1
  from userlist xb
  where xa.lname=xb.lname and
        xa.mname=xb.mname and
        xa.fname=xb.fname and
        xa.emailid<>xb.emailid))
ord

                                             
creates an inline view holding the row number, the email ID, and the first and last names. In the given scenario the result from this will be something like:

  1. 1, alpha@gmail.com, Hardeep, Singh
  2. This row will be absent because of the xa.emailid<>xb.emailid clause
  3. This row will be absent because of the xa.emailid<>xb.emailid clause
  4. This row will be absent because of the xa.emailid<>xb.emailid clause
  5. This row will be absent because of the xa.emailid<>xb.emailid clause
  6. This row won't even be considered, as explained above

 

Had there been yet another ‘Hardeep Singh’ with a different email ID, he would have got a middle name of ‘2’.
Now the last step is to copy the numbers over to the matching rows, joining on the email ID and the first and last names as the outer UPDATE does; that part is pretty simple. Please post any questions in the comments area.
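For anyone without an Oracle instance at hand, here is a hedged, portable variant of the same single-statement idea that runs on SQLite. It is not the query above: it swaps ROWNUM for a correlated COUNT(DISTINCT ...), so the numbering restarts for each shared name, and it numbers every distinct email ID within a shared name. That means both Hardeep Singhs receive a number here, unlike the listing earlier in the post. The names `NUMBER_NAMESAKES` and `middle_names_after_update` are made up for this sketch.

```python
import sqlite3

# Sample rows from the post; the blank middle name is stored as one space.
ROWS = [
    ("Hardeep", " ", "Singh", "alpha@gmail.com"),
    ("Hardeep", " ", "Singh", "beta@gmail.com"),
    ("Hardeep", " ", "Singh", "beta@gmail.com"),
    ("Satinder", " ", "Singh", "gamma@gmail.com"),
    ("Satinder", " ", "Singh", "gamma@gmail.com"),
    ("Gorakh", " ", "Nath", "gn@gmail.com"),
]

# One UPDATE, no application-side loop. A person's number is the count of
# distinct email IDs, among people with the same name, that sort at or before
# their own. The EXISTS clause skips names that do not collide at all.
NUMBER_NAMESAKES = """
UPDATE userlist
SET mname = (SELECT CAST(COUNT(DISTINCT b.emailid) AS TEXT)
             FROM userlist b
             WHERE b.fname = userlist.fname
               AND b.lname = userlist.lname
               AND b.emailid <= userlist.emailid)
WHERE EXISTS (SELECT 1
              FROM userlist c
              WHERE c.fname = userlist.fname
                AND c.lname = userlist.lname
                AND c.emailid <> userlist.emailid)
"""


def middle_names_after_update():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userlist (fname TEXT, mname TEXT, lname TEXT, emailid TEXT)"
    )
    conn.executemany("INSERT INTO userlist VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(NUMBER_NAMESAKES)
    result = [
        row[0] for row in conn.execute("SELECT mname FROM userlist ORDER BY rowid")
    ]
    conn.close()
    return result


if __name__ == "__main__":
    print(middle_names_after_update())
```

Rows that belong to the same person (same name, same email ID) end up with the same number, and the Satinder Singh and Gorakh Nath rows keep their blank middle name.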
