If you think a 16MB table is too small to be the root cause of significant performance problems with SELECT queries, then read about today’s headache – a story of a 16MB table worth something like 270 seconds of response time.
It began much as usual: an application web page didn’t show up within 5 minutes. The application submitted a query to the database – a really big query, 300+ lines. It didn’t work well (no surprise here). So I used my usual technique of gathering run-time execution statistics to find the culprit step(s) in the execution plan. Apparently there was no such step in the plan of the top-level cursor (a bit of a surprise here). According to the run-time statistics the query finished within a second. Considering it was executed over a VPN connection to a remote site on a different continent, this would mean an instant reply on the same network (as it is for the application) – and this was obviously untrue. I noticed that the query results were fetched slowly, row after row – 17 of them. A quick look over the select list confirmed a hypothesis: there were a number of PL/SQL function calls which were dispatched at fetch time and, apparently, weren’t accounted for in the execution plan run-time statistics.
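For reference, the usual way to gather run-time execution statistics is the GATHER_PLAN_STATISTICS hint combined with DBMS_XPLAN.DISPLAY_CURSOR. This is only a sketch – the actual 300+ line query isn’t shown here, and the table name is a placeholder:

```sql
-- Run the statement with the hint so Oracle records actual rows,
-- buffer gets and elapsed time for every plan step
select /*+ gather_plan_statistics */ ...
  from some_table;  -- placeholder for the real 300+ line query

-- Then fetch the plan of the last execution with its run-time stats
select *
  from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

The catch demonstrated by this story: these statistics cover only the steps of the cursor’s own plan, not the work done inside PL/SQL functions called from the select list during fetches.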
I traced the query using extended SQL trace to see more information about the function calls. It appeared that most of the time was spent in a single function call, with a single query – 100+ lines this time. It was called 17 times (no surprise here – the function was executed for each row returned by the outer query) and it took 337 seconds of pure CPU time (and nearly the same elapsed time). Per the run-time execution statistics it was clear that a single step of the plan accounted for 277 seconds of response time. Here it is:
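For completeness, extended SQL trace can be enabled for your own session like this (a minimal sketch; the trace file is then typically summarized with tkprof):

```sql
-- Level 12 = SQL trace plus wait events and bind variable values;
-- recursive calls issued by PL/SQL functions are captured too
alter session set events '10046 trace name context forever, level 12';

-- ... run the slow statement here ...

alter session set events '10046 trace name context off';
```

This is what surfaced the 100+ line query hidden inside the PL/SQL function, along with its row source statistics.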
Rows     Row Source Operation
-------  ---------------------------------------------------
     17  COUNT STOPKEY (cr=36992 pr=0 pw=0 time=337065522 us)
    ...   (snipped)
     51    NESTED LOOPS (cr=35496 pr=0 pw=0 time=277024431 us)
     17      NESTED LOOPS (cr=102 pr=0 pw=0 time=3004 us)
    ...       (snipped)
     51      TABLE ACCESS FULL A_16MB_TABLE (cr=35394 pr=0 pw=0 time=277022293 us)
    ...       (snipped)
The A_16MB_TABLE is a heap-organized table with no indexes. “Normally” I would expect it to be a performance inhibitor as the inner table of a tight nested loop with a large number of rows coming from the outer row source. This was not the case: it was scanned just 17 times, which is ~16 sec/scan, or ~1 MB/sec.
It was the right time to have a look at the query and its plan – with the predicate section, of course. The table was joined in a rather strange way, something like this:
select ...
  from (...) v,
       a_16mb_table t
 where t.vc_column like v.column1 || 'a pattern' || v.column2
                        || v.column3 || 'a pattern' escape '\'
   ...
The view v (which was merged into the main query block) and its columns are not real columns – they are derived from some character data with a bunch of TRANSLATE / SUBSTR / INSTR calls in different places. And the predicate section of the execution plan contained the answer to why the FTS was so slow: thanks to Oracle’s ability to merge the inline view, move predicates around and apply transitive closure, the actual predicate used to filter rows of A_16MB_TABLE was more than just ugly. I couldn’t even understand exactly how Oracle generated such a huge filter. This filter predicate was the root cause of the slow (yet fully in-memory) table scan. The hot fix was easy: I materialized the results of the view with a rownum > 0 predicate, which prevented Oracle from using its intelligent predicate-transformation techniques. However, I don’t consider this an Oracle bug – in my view it was a designer’s fault. The data was stored in this table as crap, which is unacceptable if you want the data to be accessed.
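A minimal sketch of the fix, with hypothetical names standing in for the real view text: a rownum predicate inside the inline view makes it non-mergeable, so the TRANSLATE / SUBSTR / INSTR expressions are evaluated once per view row instead of being expanded into the filter applied to every row of the scanned table:

```sql
select ...
  from (select translate(...) as column1,   -- expensive derived expressions
               substr(...)    as column2,   -- (hypothetical; stand-ins for
               instr(...)     as column3    --  the real view columns)
          from ...
         where rownum > 0) v,  -- rownum blocks view merging and predicate pushing
       a_16mb_table t
 where t.vc_column like v.column1 || 'a pattern' || v.column2
                        || v.column3 || 'a pattern' escape '\'
```

The rownum > 0 trick works because Oracle cannot merge a query block whose results depend on rownum without changing its meaning, so the optimizer leaves the view – and the LIKE predicate against the table – alone.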