Oracle deadlocks: the what and the how

Everyone knows what a deadlock is: a situation in which two or more competing processes are each waiting for the other to finish, and thus neither ever does. The purpose of this post is to help people understand deadlocks a little better, so that they can fix the problem when they find one.

Assume that there are two processes running, A & B, and that each requires a (shared) file and a printer to do its work. Process A locks up the printer, and Process B locks up the file for its own use. Now neither process can complete, because neither has all the resources needed for its completion; and neither will release the resource it holds: each will keep on waiting for the second resource.

Let us create a deadlock now, using an Oracle database and the SQL*Plus client.

We opened two sessions, and executed “set autocommit off” as the first statement in each.

Now in the first session we executed:

UPDATE ps_voucher SET grp_ap_id='A' WHERE voucher_id='00692096' AND invoice_dt='2-JAN-2002';

second session:

UPDATE ps_voucher SET grp_ap_id='A' WHERE voucher_id='00692096' AND invoice_dt='13-MAR-2007';

back to the first:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096';

and then the second:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096';

BAM! Deadlock. See screenshots:

[Screenshot: Deadlock - Session I]
[Screenshot: Deadlock - Session II]

What went wrong? There existed two vouchers in the system with the same VOUCHER_ID but with different INVOICE_DTs (invoice dates). Each session first locked one of those vouchers, and then, with its second UPDATE, tried to update both. (On the database side, a session gets a lock on a specific row when it UPDATEs that row, and the lock is released only when the session issues a COMMIT or ROLLBACK.)
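
The two rows involved can be seen with a simple query:

SELECT voucher_id, invoice_dt
FROM ps_voucher
WHERE voucher_id='00692096';

It returns one row per invoice date, which is exactly why the unqualified second UPDATE tries to touch both.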

Yes, the programmer could have been smarter and written better code: had the INVOICE_DT condition been included in the second statement as well, each session would have updated only its own row and we would have been fine. However, in practice, with huge systems carrying tons of code, programmers will sometimes make mistakes. Even when they do not, deadlocks will still occur: not all deadlocks are caused by SQL issues.
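
For instance, the second pair of statements could have carried the full key:

UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096' AND invoice_dt='2-JAN-2002';
UPDATE ps_voucher SET address_seq_num=2 WHERE voucher_id='00692096' AND invoice_dt='13-MAR-2007';

Each session would then have updated only the row it already held, and no deadlock would have occurred.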

From a system design perspective, what can be done to prevent deadlocks? One way is for each execution of a process to have a unique ID – let’s call it the process instance (PI). So if a process ABC is run once, it may get a PI of 1222, and when it is run next, a PI of 1224; if process PQR runs in between, it gets the PI 1223. Before changing any transactions, the process can stamp its own PI on the transactions that qualify:

UPDATE ps_voucher
SET pi=1223
WHERE <process specific selection criteria>
AND pi=0;

COMMIT;

The commit here is important – only then will other processes be able to see the ‘locking’.

Thereafter the normal processing SQLs can be changed as below:

UPDATE ps_voucher
SET grp_ap_id='1'
WHERE <process specific criteria>
AND pi=1223;

At the end, set the transactions back to ‘open for processing’ by setting PI to zero:

UPDATE ps_voucher
SET pi=0
WHERE pi=1223;
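
One other approach worth sketching here: before issuing any UPDATEs, lock every row the process will need in a single step with SELECT ... FOR UPDATE NOWAIT:

SELECT voucher_id, invoice_dt
FROM ps_voucher
WHERE voucher_id='00692096'
FOR UPDATE NOWAIT;

If any required row is already locked by another session, the statement fails at once with ORA-00054 rather than waiting, and the process can roll back and retry later. As long as every process follows this pattern, nobody ever waits while holding locks, and a deadlock cannot form.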

If there are other ways to achieve this, please let me know by posting comments.

The DBA is usually able to identify the SQL statements involved in a deadlock; when Oracle detects one it raises ORA-00060 and records the statements in a trace file. Many times one process is UPDATEing the rows that the other is DELETEing.


When NOT to normalise the database

When talking of Database Normalisation, textbooks often talk of BCNF and the fifth and higher normal forms. However, in practice (in large software/ERPs) I have rarely noticed normalisation beyond Third Normal Form. In fact, a certain degree of redundancy is desirable.

While doing database design, I believe there are two critical aspects that should be kept in mind, but which I see ignored in a lot of common software.

The first is the time aspect of data. Here is an example from finance. Consider a company doing multicurrency invoicing. The tables can be designed as:

INVOICE: InvoiceID, ..., Currency, BaseCurrency, TransactionDate, ...
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

This is a design having no redundancy: on the basis of the three fields in the INVOICE relation, we can always find the latest row in the CONVERSIONS table having an EffectiveDate less than the TransactionDate, and hence determine the RateMultiplier.
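
With this design, the multiplier has to be looked up each time it is needed, along these lines (a sketch, treating the field names above as column names and the invoice's values as binds):

SELECT RateMultiplier
FROM CONVERSIONS
WHERE FromCurrency = :Currency
AND ToCurrency = :BaseCurrency
AND EffectiveDate =
    (SELECT MAX(EffectiveDate)
     FROM CONVERSIONS
     WHERE FromCurrency = :Currency
     AND ToCurrency = :BaseCurrency
     AND EffectiveDate < :TransactionDate);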

Consider another design:

INVOICE: InvoiceID, ..., Currency, BaseCurrency, TransactionDate, RateMultiplier, ...
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

Here, the system determines the value of the RateMultiplier at the time of invoice creation and records it permanently within the INVOICE table itself. To me this is the more mature design. Why? Because a lot of data in the INVOICE table actually depends on the RateMultiplier: for example, the VAT details. Say on 1-JAN-2009 the exchange rate is recorded as 1.1, but on 3-JAN-2009 we come to know that it was recorded incorrectly, and someone changes the CONVERSIONS table to the correct rate of 1.2. Under the first design, all the details in the INVOICE table for invoices created between 1-JAN and 3-JAN become inconsistent, since the amounts stored there were computed with a RateMultiplier that the CONVERSIONS table no longer contains. Under the second design, the multiplier actually used is preserved alongside those amounts.

Now consider an example from HR appraisal systems. A table stores the stage that the appraisal process is at for a particular employee; this is then used to decide what access the employee has:

STAGE_CURRENT: EmpID, Stage

Note that this has no Date or Year field. So an employee trying to see records of the previous year’s appraisal may be unable to see some of that data, simply because the current year’s appraisal process is still at an initial stage.
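
A minimal fix, assuming appraisals run on a yearly cycle, would be to key the stage by year:

STAGE: EmpID, AppraisalYear, Stage

The access check for last year's records can then look at last year's stage, instead of the stage of whatever cycle is currently running.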

The next problem is the storage of “under calculation” fields. For example, consider a training department that maintains the scores of each student trained. The test administered is out of 100 marks, but has a weightage of 40. Proposed design:

SCORES: CandidateID, TestID, Score, Flag

At the time of recording, the Flag is set to N. Thereafter a process runs that multiplies the Score by 0.4 and sets the Flag to Y.

In my opinion, a better design would be to retain both scores, even though the pre-weightage score is not relevant to the business process, because a process can also terminate midway, for example when erroneous data is supplied. If the process ends after setting the Flag to Y but before changing the Score, or in the reverse order, after changing the Score but before setting the Flag, we end up with inconsistent data and no way to tell whether a given Score has been weighted. Improved design:

SCORES: CandidateID, TestID, Score, WeightedScore

At the time of recording, Score is entered and WeightedScore is set to zero. Thereafter a process runs that multiplies the Score by 0.4 and stores the value in WeightedScore.
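
The calculation step then becomes a single statement (a sketch, assuming the relation maps directly to a table of the same name):

UPDATE SCORES
SET WeightedScore = Score * 0.4
WHERE WeightedScore = 0;

The statement is also safe to re-run after a failure: it never touches the original Score, and recalculating a row simply produces the same WeightedScore again.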

The central idea is to retain all information permanently so that even if the process fails, we know what data existed.
