NL join (ordered)

Some time ago there was a thread on the SQL.ru forum where user has asked the never-ending question “why CBO is doing this?”. The problem was a simple count(*) of parent-child tables join with no FK constraint was executed in very strange way: via NESTED LOOPS using child as a driving table.


Here is a demonstration:

drop table t1 cascade constraints purge;
drop table t2 cascade constraints purge;

create table t1 as
select rownum id, lpad('x', 200, 'x') pad
  from dual
connect by level <= 100000;

create unique index t1_uq on t1(id);

create table t2 as
select rownum id, trunc(rownum/4) t1_id, lpad('x', 200, 'x') pad
  from t1 t11, t1 t12
 where rownum <= 400000;

create index t2_indx on t2(t1_id);

begin
    dbms_stats.gather_table_stats(user, 't1', estimate_percent=>null, method_opt => 'for all columns size 1', cascade => true);
    dbms_stats.gather_table_stats(user, 't2', estimate_percent=>null, method_opt => 'for all columns size 1', cascade => true);
end;
/

alter session set events '10053 trace name context forever';
explain plan for
select /*+  */ count(*)
  from t2, t1
 where t1.id = t2.t1_id;
alter session set events '10053 trace name context off';

@plan

explain plan for
select /*+ use_hash(t1 t2) */ count(*)
  from t1, t2
 where t1.id = t2.t1_id;

@plan

Tables in my example have 100K/400K rows. Original poster had 4M/16M. Results of running test case on 10.2.0.4:

P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |   196 |  229 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        |  400K|   196 |  229 |
  |   3|   INDEX FAST FULL SCAN |T2_INDX |  400K|   196 |  198 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     0 |    0 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")

P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |   646 |  653 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
A |   2|  HASH JOIN             |        |  400K|   646 |  653 | 1704K
  |   3|   INDEX FAST FULL SCAN |T1_UQ   |  100K|    47 |   48 |
  |   4|   INDEX FAST FULL SCAN |T2_INDX |  400K|   196 |  198 |
----------------------------------------------------------------------
   2 - access("T1"."ID"="T2"."T1_ID")

The first plan says it all: for some reason Oracle costs a single UQ scan as zero. CBO trace confirms this:

Table Stats::
  Table: T1  Alias: T1
    #Rows: 100000  #Blks:  2998  AvgRowLen:  205.00
  Column (#1): ID(NUMBER)
    AvgLen: 5.00 NDV: 100000 Nulls: 0 Density: 1.0000e-005 Min: 1 Max: 100000
Index Stats::
  Index: T1_UQ  Col#: 1
    LVLS: 1  #LB: 208  #DK: 100000  LB/K: 1.00  DB/K: 1.00  CLUF: 2942.00
***********************
Table Stats::
  Table: T2  Alias: T2
    #Rows: 400000  #Blks:  12273  AvgRowLen:  210.00
  Column (#2): T1_ID(NUMBER)
    AvgLen: 5.00 NDV: 100001 Nulls: 0 Density: 9.9999e-006 Min: 0 Max: 100000
Index Stats::
  Index: T2_INDX  Col#: 2
    LVLS: 2  #LB: 886  #DK: 100001  LB/K: 1.00  DB/K: 1.00  CLUF: 12122.00

...

Join order[2]:  T2[T2]#1  T1[T1]#0
***************
Now joining: T1[T1]#0
***************
NL Join
  Outer table: Card: 400000.00  Cost: 198.19  Resp: 198.19  Degree: 1  Bytes: 5
  Inner table: T1  Alias: T1
...
  Access Path: index (AllEqUnique)
    Index: T1_UQ
    resc_io: 0.00  resc_cpu: 1900
    ix_sel: 1.0000e-005  ix_sel_with_filters: 1.0000e-005
    NL Join (ordered): Cost: 228.76  Resp: 228.76  Degree: 1
      Cost_io: 196.00  Cost_cpu: 814309596
      Resp_io: 196.00  Resp_cpu: 814309596
  Best NL cost: 228.76

Instead of using 1 as an IO cost for T1_UQ access, Oracle picks up zero. So instead of normal nested loops join cost of (198 + 400K * 1)= ~400K the plan’s cost is just 229. That is three orders of magnitude difference. Increasing number of rows in tables in four times turns the plan to HASH JOIN, but NESTED LOOPS is still costed badly and it’s IO cost is unreasonably low even though T1_UQ scan has cost of 1 (when it should be 2):

HJ NL
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |  2572 | 2599 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
A |   2|  HASH JOIN             |        | 1600K|  2572 | 2599 | 6808K
  |   3|   INDEX FAST FULL SCAN |T1_UQ   |  400K|   184 |  186 |
  |   4|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   780 |  789 |
----------------------------------------------------------------------
   2 - access("T1"."ID"="T2"."T1_ID")
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |  2096 | 2685 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        | 1600K|  2096 | 2685 |
  |   3|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   780 |  789 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     1 |    1 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")

Increasing blevel of T1_UQ shows consistent pattern: UQ scan cost is decreased by 1; IO cost of NESTED LOOPS becomes adequate starting with the level of 4:

SQL> exec dbms_stats.set_index_stats(user, 't1_uq', indlevel=>3)

PL/SQL procedure successfully completed.

SQL> explain plan for
  2  select /*+ leading(t2) use_nl(t1) */ count(*)
  3    from t2, t1
  4   where t1.id = t2.t1_id;

Explained.

SQL> @plan
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |   232K|  234K|
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        | 1600K|   232K|  234K|
  |   3|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   780 |  789 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     2 |    2 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")
SQL> exec dbms_stats.set_index_stats(user, 't1_uq', indlevel=>4)

PL/SQL procedure successfully completed.

SQL> explain plan for
  2  select /*+ leading(t2) use_nl(t1) */ count(*)
  3    from t2, t1
  4   where t1.id = t2.t1_id;

Explained.

SQL>  @plan
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |  4801K| 4802K|
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        | 1600K|  4801K| 4802K|
  |   3|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   780 |  789 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     3 |    3 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")
SQL> exec dbms_stats.set_index_stats(user, 't1_uq', indlevel=>5)

PL/SQL procedure successfully completed.

SQL> explain plan for
  2  select /*+ leading(t2) use_nl(t1) */ count(*)
  3    from t2, t1
  4   where t1.id = t2.t1_id;

Explained.

SQL>  @plan
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |  6401K| 6403K|
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        | 1600K|  6401K| 6403K|
  |   3|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   780 |  789 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     4 |    4 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")

In 11.2.0.1 initial test case results are almost the same:

P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |   242 |  270 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        |  400K|   242 |  270 |
  |   3|   INDEX FAST FULL SCAN |T2_INDX |  400K|   242 |  244 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     0 |    0 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")

P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |   703 |  709 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
A |   2|  HASH JOIN             |        |  400K|   703 |  709 | 1704K
  |   3|   INDEX FAST FULL SCAN |T1_UQ   |  100K|    58 |   58 |
  |   4|   INDEX FAST FULL SCAN |T2_INDX |  400K|   242 |  244 |
----------------------------------------------------------------------
   2 - access("T1"."ID"="T2"."T1_ID")

But as T1_UQ blevel becomes more than 1, 11g computes NESTED LOOPS join cost correctly, but UQ scan cost is decreased by 1:

HJ NL
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |  2801 | 2825 |
  |   1| SORT AGGREGATE         |        |    1 |       |      |
A |   2|  HASH JOIN             |        | 1600K|  2801 | 2825 | 6808K
  |   3|   INDEX FAST FULL SCAN |T1_UQ   |  400K|   228 |  230 |
  |   4|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   965 |  972 |
----------------------------------------------------------------------
   2 - access("T1"."ID"="T2"."T1_ID")
P | ID |       Operation        |  Name  | Rows |IO cost| Cost | Temp
--|----|------------------------|--------|------|-------|------|------
  |   0|SELECT STATEMENT        |        |    1 |  1601K| 1601K|
  |   1| SORT AGGREGATE         |        |    1 |       |      |
  |   2|  NESTED LOOPS          |        | 1600K|  1601K| 1601K|
  |   3|   INDEX FAST FULL SCAN |T2_INDX | 1600K|   965 |  972 |
A |   4|   INDEX UNIQUE SCAN    |T1_UQ   |    1 |     1 |    1 |
----------------------------------------------------------------------
   4 - access("T1"."ID"="T2"."T1_ID")

I have no precise explanation for why CBO behaves strange with what seems to be simple query. I guess it has something to do with assumption of visiting all index entries consequentially, which should probably result in accessing index branch blocks from memory rather than disk. But why just 1? Have no idea. Anyway, since 11g differs in behavior from 10g with larger tables, it suggests that this is probably a bug which was fixed somewhere in between.

About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 288 other followers