My apologies for my extended absence. I ran into something interesting today, thought it would make a good blog post, and apparently got inspired. No startling revelation here; this is just a bit of a cautionary tale about the unintended consequences of using a new feature. A quick search of the Oracle documentation confirms that the
skip locked directive of the
select for update statement was introduced in Oracle 11g, version 11.1. But, before we dive into that, let’s review the functionality of
select for update through versions of Oracle preceding 11g.
Select for update
First, a bit about
select for update in general. The
select for update functionality has been available since some very ancient release of Oracle. It allows the user to lock specific rows in a table without changing them. This is generally used as part of a pessimistic locking strategy. (See here for an explanation of optimistic and pessimistic locking.)
select for update's default behavior is to wait for locks. That is, Oracle takes a read-consistent view of the data to identify the set of rows to be locked, and then does current-mode gets to impose row-level locks on that set of rows. If, while attempting to lock the set of rows, it encounters a row that is already locked by another session, the session will wait indefinitely, on a TX enqueue, for a commit or rollback to free the lock, and then execution will continue.
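As a minimal sketch of the default form (the table and column names here are hypothetical, just for illustration):

```sql
-- Locks every matching row; if another session already holds a lock
-- on any of them, this statement waits indefinitely on a TX enqueue.
select order_id, status
  from orders
 where status = 'NEW'
   for update;
```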
Select for Update with nowait
Oracle also provides for a
nowait option. If the developer codes
select ... from ... for update nowait, then the select for update will proceed much as described above, except that, if it encounters a row locked by a different session, rather than waiting, it will raise an
ORA-00054 resource busy and NOWAIT specified error and statement level rollback will occur.
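A sketch of the nowait variant, using the same hypothetical names as above:

```sql
-- Fail fast: if any matching row is already locked by another
-- session, ORA-00054 is raised immediately instead of waiting.
select order_id, status
  from orders
 where status = 'NEW'
   for update nowait;
```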
Select for Update with wait n
The functionality of
select for update, with the option of
nowait, was unchanged from some very ancient version of Oracle up until Oracle 9i Release 2. Starting with Oracle 9i Release 2 (version 9.2.0), Oracle introduced the
wait n clause. This allows the developer to modify the behavior of the
select for update to wait up to n seconds. If a lock cannot be acquired on a particular row for n seconds, Oracle will raise an
ORA-30006 resource busy; acquire with WAIT timeout expired error, and statement level rollback will occur.
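Continuing the hypothetical example, a bounded wait of 5 seconds would look like:

```sql
-- Wait at most 5 seconds for the row locks; if they still cannot
-- be acquired, ORA-30006 is raised and the statement rolls back.
select order_id, status
  from orders
 where status = 'NEW'
   for update wait 5;
```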
New in 11g!
Starting with 11g, (version 11.1.0), Oracle introduced the
skip locked clause. This clause allows for the
select for update to skip any rows it encounters that are already locked. In this way, it is neither subject to waiting for locks, nor to the ORA-00054 or ORA-30006 errors. It simply locks what it can and moves on. This can be quite useful when the application design calls for single or multiple producers of data (programs that add rows to a table) and multiple consumers (concurrently executing programs that consume, or delete, rows from the table). In this scenario, and absent any strategy to partition the data so that the consumers do not interfere with each other, implementing either a wait or nowait strategy with
select for update simply won’t work. But, the
skip locked functionality fits the bill very well. Each consumer can query the table, locking the rows that are not locked, and skipping over any rows already locked by someone else. All the consumers can work, and none of them serializes behind the others. Note that this functionality has actually been available in Oracle since at least version 8.0 (the Advanced Queuing feature is built on it), but it was not documented, and therefore not supported by Oracle, until version 11.1.
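The consumer side of that pattern, sketched with the same hypothetical names as the earlier examples:

```sql
-- Lock whatever eligible rows are free, silently skipping any rows
-- already locked by other consumers; no waiting, no error raised.
select order_id, status
  from orders
 where status = 'NEW'
   for update skip locked;
```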
So, here begins my tale of unintended consequences. We have an application in which there are several concurrent processes, all running the same code, consuming rows from a table. The program does a
select for update skip locked, does some processing, and then finally deletes the locked rows. The SQL is similar in form to:
select [some columns]
  from [the table]
 where [some predicates]
   and rownum < 101
   for update skip locked;
So, find the first 100 rows in the table that satisfy the predicates, skipping over any rows that are already locked. Now, given that the number of rows in the table is far greater than 100 * (number of concurrent consumers), one would expect each consumer process to have no problem identifying 100 rows for processing. I was a bit puzzled, then, when it was reported to me that Oracle was actually returning significantly fewer than 100 rows. The code is simple, the logic is simple; so, what's going on? How could Oracle return fewer than 100 rows when there's plenty of data in the table? Note that no error is being raised: Oracle is simply going through the table, locking what it can, skipping what it can't, and coming up with fewer than 100 rows. This in spite of the fact that there are 50 or fewer consumer processes, and tens of thousands (sometimes hundreds of thousands) of rows in the table. The most rows that could be locked at any one time would be 50 consumer processes * 100 rows per process = 5,000 rows. So, what's going on?
After reviewing the application-level log files, and after several discussions and walk-throughs of the code, we were stumped. So, I tried enabling 10046 trace, and sure enough, significantly fewer than 100 rows were being locked. Simple queries proved that there were more than enough eligible rows in the table. So, again, we were stumped. Another test, running only one consumer process, showed that the code behaves as expected when run single-threaded. So, there's a concurrency problem... but what? Is there some bizarre bug in the
skip locked functionality? It seemed premature to consider that, so I set that possibility aside as extremely unlikely unless more direct evidence surfaced. So, what next? Well, I wondered, perhaps there is a diagnostic event that would reveal what's happening under the covers with
skip locked? I tried poking around for such an event, but, to date, I'm not aware of any such functionality. So, how else to continue the investigation? Remember, as I said before, no error is being raised; the only symptom is that the number of rows returned is much smaller than expected. After another brainstorming session with the architect who designed and wrote the code, we came up with the idea of changing the code in our test environment to remove the
skip locked clause, causing the code to block, and allowing us (hopefully) to clearly see where the locking is happening. Sure enough, when we did that, the problem revealed itself almost immediately, and the solution was clear and obvious. When we started multiple, concurrent consumer programs, one wait event was immediately obvious and overwhelming:
enq: TX - allocate ITL entry
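While the modified (blocking) version was running, a quick look at v$session made the pile-up obvious. A sketch of the kind of query we used (exact columns vary slightly by Oracle version):

```sql
-- Which non-idle events are sessions currently waiting on?
select event, count(*) as sessions
  from v$session
 where wait_class <> 'Idle'
 group by event
 order by sessions desc;
```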
So, clearly, the root cause of all my problems and headaches is a shortage of ITL (interested transaction list) entries in the header of the data block. The blocking is happening at the block level, not the row level. This explains why unlocked rows were being skipped even though the cursor had not yet identified 100 candidate rows: when a session cannot get an ITL slot in a block, every row in that block is effectively off-limits to it. Due to the high level of concurrency, the number of ITL slots (with an 8k block size and PCTFREE 10: 8192 bytes * 10% = 819 bytes of free space per block, divided by 24 bytes per ITL entry, gives a maximum of 34 ITL entries per block) is not sufficient. We are running up to 50 concurrent consumer processes, may want to go as high as 100, and the number of rows per block is well over 100 in many cases. So, now that we've gotten to the bottom of the problem, the solution is fairly straightforward and simple. A simple
alter table ... move nologging initrans 110 and a rebuild of the indexes, and we are back in business.
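The back-of-the-envelope ITL arithmetic above can be checked with a short script (the inputs are the values from this post: 8k blocks, PCTFREE 10, 24 bytes per ITL entry):

```python
# Sanity-check the ITL arithmetic: how many dynamic ITL entries can
# fit in the free space of one block? (Values assumed from the post.)
BLOCK_SIZE = 8192       # bytes in an 8k data block
PCTFREE = 0.10          # fraction of the block reserved as free space
ITL_ENTRY_SIZE = 24     # bytes consumed by one ITL entry

free_space = BLOCK_SIZE * PCTFREE              # 819.2 bytes per block
max_itl_entries = int(free_space // ITL_ENTRY_SIZE)

print(max_itl_entries)  # -> 34
```

With 50 (and potentially 100) concurrent consumers touching blocks that hold well over 100 rows each, 34 ITL entries per block is clearly not enough, which is why an INITRANS above the target concurrency (110 here) resolves the contention.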
In my opinion, the interesting part of this little puzzle, is the effect of the
skip locked functionality. The whole point, is to allow the code to skip rows that have been locked by another session, and continue running. That’s great, and allows for coding applications that can be highly scalable and performant. However, it’s important to understand that the
skip locked clause means that Oracle will skip whenever a TX enqueue is encountered, including TX waits that occur at the block level, such as ITL waits. The problem is, if you're encountering locking that is not at the row level,
skip locked is still going to skip over the locked resource and move on. No error will be reported! This may lead to unpredictable results. The moral of the story is that
skip locked doesn’t necessarily mean skip locked rows, it means skip any locked resource.
It seems there is always a gap between how we think a feature works and how it actually works.
So, this concludes my investigation. I hope it helps someone in the future avoid a similar problem.
I’m going to make an attempt to blog more regularly, but, I know better than to make any promises! Until next time….