Some time ago there was a thread on the SQL.ru forum where user has asked the never-ending question “why CBO is doing this?”. The problem was a simple count(*) of parent-child tables join with no FK constraint was executed in very strange way: via NESTED LOOPS using child as a driving table.
Here is a demonstration:
drop table t1 cascade constraints purge; drop table t2 cascade constraints purge; create table t1 as select rownum id, lpad('x', 200, 'x') pad from dual connect by level <= 100000; create unique index t1_uq on t1(id); create table t2 as select rownum id, trunc(rownum/4) t1_id, lpad('x', 200, 'x') pad from t1 t11, t1 t12 where rownum <= 400000; create index t2_indx on t2(t1_id); begin dbms_stats.gather_table_stats(user, 't1', estimate_percent=>null, method_opt => 'for all columns size 1', cascade => true); dbms_stats.gather_table_stats(user, 't2', estimate_percent=>null, method_opt => 'for all columns size 1', cascade => true); end; / alter session set events '10053 trace name context forever'; explain plan for select /*+ */ count(*) from t2, t1 where t1.id = t2.t1_id; alter session set events '10053 trace name context off'; @plan explain plan for select /*+ use_hash(t1 t2) */ count(*) from t1, t2 where t1.id = t2.t1_id; @plan
Tables in my example have 100K/400K rows. Original poster had 4M/16M. Results of running test case on 10.2.0.4:
P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 196 | 229 | | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 400K| 196 | 229 | | 3| INDEX FAST FULL SCAN |T2_INDX | 400K| 196 | 198 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 0 | 0 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID") P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 646 | 653 | | 1| SORT AGGREGATE | | 1 | | | A | 2| HASH JOIN | | 400K| 646 | 653 | 1704K | 3| INDEX FAST FULL SCAN |T1_UQ | 100K| 47 | 48 | | 4| INDEX FAST FULL SCAN |T2_INDX | 400K| 196 | 198 | ---------------------------------------------------------------------- 2 - access("T1"."ID"="T2"."T1_ID")
The first plan says it all: for some reason Oracle costs a single UQ scan as zero. CBO trace confirms this:
Table Stats:: Table: T1 Alias: T1 #Rows: 100000 #Blks: 2998 AvgRowLen: 205.00 Column (#1): ID(NUMBER) AvgLen: 5.00 NDV: 100000 Nulls: 0 Density: 1.0000e-005 Min: 1 Max: 100000 Index Stats:: Index: T1_UQ Col#: 1 LVLS: 1 #LB: 208 #DK: 100000 LB/K: 1.00 DB/K: 1.00 CLUF: 2942.00 *********************** Table Stats:: Table: T2 Alias: T2 #Rows: 400000 #Blks: 12273 AvgRowLen: 210.00 Column (#2): T1_ID(NUMBER) AvgLen: 5.00 NDV: 100001 Nulls: 0 Density: 9.9999e-006 Min: 0 Max: 100000 Index Stats:: Index: T2_INDX Col#: 2 LVLS: 2 #LB: 886 #DK: 100001 LB/K: 1.00 DB/K: 1.00 CLUF: 12122.00 ... Join order[2]: T2[T2]#1 T1[T1]#0 *************** Now joining: T1[T1]#0 *************** NL Join Outer table: Card: 400000.00 Cost: 198.19 Resp: 198.19 Degree: 1 Bytes: 5 Inner table: T1 Alias: T1 ... Access Path: index (AllEqUnique) Index: T1_UQ resc_io: 0.00 resc_cpu: 1900 ix_sel: 1.0000e-005 ix_sel_with_filters: 1.0000e-005 NL Join (ordered): Cost: 228.76 Resp: 228.76 Degree: 1 Cost_io: 196.00 Cost_cpu: 814309596 Resp_io: 196.00 Resp_cpu: 814309596 Best NL cost: 228.76
Instead of using 1 as an IO cost for T1_UQ access, Oracle picks up zero. So instead of normal nested loops join cost of (198 + 400K * 1)= ~400K the plan’s cost is just 229. That is three orders of magnitude difference. Increasing number of rows in tables in four times turns the plan to HASH JOIN, but NESTED LOOPS is still costed badly and it’s IO cost is unreasonably low even though T1_UQ scan has cost of 1 (when it should be 2):
HJ | NL |
P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 2572 | 2599 | | 1| SORT AGGREGATE | | 1 | | | A | 2| HASH JOIN | | 1600K| 2572 | 2599 | 6808K | 3| INDEX FAST FULL SCAN |T1_UQ | 400K| 184 | 186 | | 4| INDEX FAST FULL SCAN |T2_INDX | 1600K| 780 | 789 | ---------------------------------------------------------------------- 2 - access("T1"."ID"="T2"."T1_ID") |
P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 2096 | 2685 | | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 1600K| 2096 | 2685 | | 3| INDEX FAST FULL SCAN |T2_INDX | 1600K| 780 | 789 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 1 | 1 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID") |
Increasing blevel of T1_UQ shows consistent pattern: UQ scan cost is decreased by 1; IO cost of NESTED LOOPS becomes adequate starting with the level of 4:
SQL> exec dbms_stats.set_index_stats(user, 't1_uq', indlevel=>3) PL/SQL procedure successfully completed. SQL> explain plan for 2 select /*+ leading(t2) use_nl(t1) */ count(*) 3 from t2, t1 4 where t1.id = t2.t1_id; Explained. SQL> @plan P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 232K| 234K| | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 1600K| 232K| 234K| | 3| INDEX FAST FULL SCAN |T2_INDX | 1600K| 780 | 789 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 2 | 2 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID") SQL> exec dbms_stats.set_index_stats(user, 't1_uq', indlevel=>4) PL/SQL procedure successfully completed. SQL> explain plan for 2 select /*+ leading(t2) use_nl(t1) */ count(*) 3 from t2, t1 4 where t1.id = t2.t1_id; Explained. SQL> @plan P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 4801K| 4802K| | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 1600K| 4801K| 4802K| | 3| INDEX FAST FULL SCAN |T2_INDX | 1600K| 780 | 789 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 3 | 3 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID") SQL> exec dbms_stats.set_index_stats(user, 't1_uq', indlevel=>5) PL/SQL procedure successfully completed. SQL> explain plan for 2 select /*+ leading(t2) use_nl(t1) */ count(*) 3 from t2, t1 4 where t1.id = t2.t1_id; Explained. SQL> @plan P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 6401K| 6403K| | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 1600K| 6401K| 6403K| | 3| INDEX FAST FULL SCAN |T2_INDX | 1600K| 780 | 789 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 4 | 4 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID")
In 11.2.0.1 initial test case results are almost the same:
P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 242 | 270 | | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 400K| 242 | 270 | | 3| INDEX FAST FULL SCAN |T2_INDX | 400K| 242 | 244 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 0 | 0 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID") P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 703 | 709 | | 1| SORT AGGREGATE | | 1 | | | A | 2| HASH JOIN | | 400K| 703 | 709 | 1704K | 3| INDEX FAST FULL SCAN |T1_UQ | 100K| 58 | 58 | | 4| INDEX FAST FULL SCAN |T2_INDX | 400K| 242 | 244 | ---------------------------------------------------------------------- 2 - access("T1"."ID"="T2"."T1_ID")
But as T1_UQ blevel becomes more than 1, 11g computes NESTED LOOPS join cost correctly, but UQ scan cost is decreased by 1:
HJ | NL |
P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 2801 | 2825 | | 1| SORT AGGREGATE | | 1 | | | A | 2| HASH JOIN | | 1600K| 2801 | 2825 | 6808K | 3| INDEX FAST FULL SCAN |T1_UQ | 400K| 228 | 230 | | 4| INDEX FAST FULL SCAN |T2_INDX | 1600K| 965 | 972 | ---------------------------------------------------------------------- 2 - access("T1"."ID"="T2"."T1_ID") |
P | ID | Operation | Name | Rows |IO cost| Cost | Temp --|----|------------------------|--------|------|-------|------|------ | 0|SELECT STATEMENT | | 1 | 1601K| 1601K| | 1| SORT AGGREGATE | | 1 | | | | 2| NESTED LOOPS | | 1600K| 1601K| 1601K| | 3| INDEX FAST FULL SCAN |T2_INDX | 1600K| 965 | 972 | A | 4| INDEX UNIQUE SCAN |T1_UQ | 1 | 1 | 1 | ---------------------------------------------------------------------- 4 - access("T1"."ID"="T2"."T1_ID") |
I have no precise explanation for why CBO behaves strange with what seems to be simple query. I guess it has something to do with assumption of visiting all index entries consequentially, which should probably result in accessing index branch blocks from memory rather than disk. But why just 1? Have no idea. Anyway, since 11g differs in behavior from 10g with larger tables, it suggests that this is probably a bug which was fixed somewhere in between.