Recently I’ve seen not so smart optimizer behavior: one query took long time to parse, and ended with an error hitting PGA_AGGREGATE_LIMIT in few minutes; another query was just parsed for ages while using reasonable (under 2G :)) amount of PGA and still could hit PGA_AGGREGATE_LIMIT but after way more time – up to an hour.
Both cases were similar and involved queries which were accessing views; and those views’ code is generated by an application using lots of IN LISTs and other OR conditions. They both are really ugly SQLs with text length ~100K. When Oracle tried to parse them it took a lot of time and parse attempt had either failed with ORA-4036 soon or hanged for a long time and then failed. Strangely incident trace file generated for ORA-4036 doesn’t include PGA heaps breakdown and you have to manually enable PGA heapdump on error to get an idea what is taking up memory. Here’s what I’ve found in there:
Am I reading this right: there's 7816MB (4921+2895) free? what is "10GB free" then? pic.twitter.com/y8UKXHvj0Q
— Timur Akhmadeev (@tmmdv) January 24, 2017
Note kxs-heap-c of 10G. I don’t know how to interpret the numbers here as they don’t add up, and assume qkkele & kkoutlCreatePh as the culprit.
Short stack trace of the process while parsing 2nd query was mostly looking like this:
SQL> oradebug short_stack ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-kghfnd_in_free_lists()+717<-kghprmalo()+1583<-kghalp()+1246<-qksshMemAllocPerm()+150 <-kkoabr()+810<-kkobifs()+1411<-kkofbp()+3483<-kkofbp()+2985<-kkofbp()+6206<-kkobmp()+10639<-kkoBitmapAccPath()+212 <-kkotap()+2019<-kkoiqb()+8672<-kkooqb()+528<-kkoOrExpand()+1105<-kkoOrExpand()+4559<-kkoOrExpand()+4559<-kkoqbc()+7123 <-apakkoqb()+182<-apaqbdDescendents()+488<-apadrv()+6244<-opitca()+2106<-kksFullTypeCheck()+79<-rpiswu2()+1780<-kksSetBindType()+2076<-kksfbc()+10522<-opiexe()+2536<-...
The closest hit in MOS search was Doc ID 23539020.8 (Fixed in 12.2). It is a perfect match except for one major difference: the document mentions possible ORA-4031, i.e. shared pool allocation error, while I’ve experienced issues in PGA allocations. Nevertheless I’ve tried suggested workaround of disabling OR expansion and it worked: parse time has dropped to acceptable numbers. I haven’t looked close enough why OR expansion needed so much resources. I think CBO tried to cost table access too many times due to large IN LISTS – as 10053 continuously dumped table access path information non stop; this is just an idea though.
CBO definitely needs some way to control parse time in recent releases. I’d appreciate a parameter which would set a target parse time, and if it’s crossed then either stop and use the best plan found so far or profile current parse call, try to eliminate the most time consuming transformation, and start again with this transformation/feature disabled. The latter would require some sophisticated code, and would be preferred; but even a simple target would be better than what we have now.
Another minor strange thing while investigating this issue was: SQL parsed correctly in under 10s when I’ve used SYS connection and set CURRENT_SCHEMA, even though there was no major differences in CBO environment between SYS and application user sessions that could’ve explained why it parsed fast in SYS. Yet another reason to avoid SYS connections and use real user sessions.