Native full outer join

Starting with version 11gR1, the Oracle database uses a native FULL JOIN implementation based on HASH JOIN whenever possible. It's good to know that this functionality first appeared in Oracle 10.2.0.3 and can be used to work around various issues, including performance problems and bugs. Here is an example that appeared on SQL.ru recently:

with t1 as
 (select *
    from xmltable('/ROWSET/ROW'
                  passing xmltype('<ROWSET>
                                   <ROW><FNAME>one</FNAME><FSIZE>436</FSIZE></ROW>
                                   <ROW><FNAME>two</FNAME><FSIZE>0</FSIZE></ROW>
                                   </ROWSET>
                                 ')
                  columns fname varchar2(40), fsize integer)),
t2 as
 (select 'one' fname, 238 fsize
    from dual
  union all
  select 'two', 0 from dual)
select coalesce(t1.fname, t2.fname), t1.fsize, t2.fsize
  from t1
  full outer join t2
    on t1.fname = t2.fname
 where t1.fsize != t2.fsize

By default this query fails in 10g with an “ORA-07445 address not mapped to object” in alert.log and an “ORA-03113 end-of-file on communication channel” reported to the user. When you ask Oracle to use the native full join implementation, either with the parameter _optimizer_native_full_outer_join set to force (which is mentioned here as an enhancement in 10.2.0.4) or with the NATIVE_FULL_OUTER_JOIN hint, it works fine in 10.2.0.3/4 too. The native full join seems to be more adequate in terms of run-time performance than the implementation in previous releases; however, there’s a difference in the CBO calculations for it. Here’s a simple example:

drop table t1 cascade constraints purge;
drop table t2 cascade constraints purge;

create table t1 as
select rownum id
     , lpad('x', 200, 'x') pad
  from dual
connect by level <= 1000;

create table t2 as
select rownum + 500 id 
     , lpad('x', 200, 'x') pad
  from dual
connect by level <= 1000;

exec dbms_stats.gather_table_stats(user, 't1')
exec dbms_stats.gather_table_stats(user, 't2')

explain plan for
select /*+ no_native_full_outer_join */ *
  from t1 full join t2
    on (t1.id = t2.id);
select * from table(dbms_xplan.display);

explain plan for
select /*+ native_full_outer_join */ *
  from t1 full join t2
    on (t1.id = t2.id);
select * from table(dbms_xplan.display);

explain plan for
select *
  from t1 join t2
    on (t1.id = t2.id);
select * from table(dbms_xplan.display);

and the relevant output:

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  1501 |   337K|    45   (3)| 00:00:01 |
|   1 |  VIEW                  |      |  1501 |   337K|    45   (3)| 00:00:01 |
|   2 |   UNION-ALL            |      |       |       |            |          |
|*  3 |    HASH JOIN OUTER     |      |  1000 |   400K|    23   (5)| 00:00:01 |
|   4 |     TABLE ACCESS FULL  | T1   |  1000 |   200K|    11   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2   |  1000 |   200K|    11   (0)| 00:00:01 |
|*  6 |    HASH JOIN RIGHT ANTI|      |   501 |   102K|    23   (5)| 00:00:01 |
|   7 |     TABLE ACCESS FULL  | T1   |  1000 |  4000 |    11   (0)| 00:00:01 |
|   8 |     TABLE ACCESS FULL  | T2   |  1000 |   200K|    11   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."ID"="T2"."ID"(+))
   6 - access("T1"."ID"="T2"."ID")

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |  1000 |   224K|    23   (5)| 00:00:01 |
|   1 |  VIEW                 | VW_FOJ_0 |  1000 |   224K|    23   (5)| 00:00:01 |
|*  2 |   HASH JOIN FULL OUTER|          |  1000 |   400K|    23   (5)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | T1       |  1000 |   200K|    11   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T2       |  1000 |   200K|    11   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."ID"="T2"."ID")

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000 |   400K|    23   (5)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  1000 |   400K|    23   (5)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  1000 |   200K|    11   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T2   |  1000 |   200K|    11   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."ID"="T2"."ID")

Notice that

  • the query with the “old”-style full join plan has an almost perfect cardinality estimate of 1501 (the actual row count for this data is 1500)
  • the query with the “new”-style native full join has an incorrect cardinality estimate of 1000
  • the inner join query has a cardinality estimate of 1000, too
  • the costs of the inner and full joins are the same
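The 1501 estimate in the “old”-style plan follows directly from its UNION-ALL decomposition: an outer join branch (estimated at 1000 rows) plus an anti-join branch (estimated at 501 rows). As a sketch, the pre-11g expansion that the first plan computes is equivalent to something like this:

```
-- a sketch of the "old"-style expansion of T1 FULL JOIN T2:
-- all t1 rows with their matches (the HASH JOIN OUTER branch), plus
-- t2 rows that have no match in t1 (the HASH JOIN RIGHT ANTI branch)
select t1.id t1_id, t1.pad t1_pad, t2.id t2_id, t2.pad t2_pad
  from t1 left outer join t2
    on t1.id = t2.id
union all
select null, null, t2.id, t2.pad
  from t2
 where not exists (select null from t1 where t1.id = t2.id);
```

Since each branch is costed with ordinary outer-join and anti-join arithmetic, the combined estimate (1000 + 501 = 1501) lands very close to reality.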

It seems that the CBO uses the arithmetic of a simple inner join to get the estimates for the full join and, of course, fails to produce reasonable cardinality estimates. Here is a 10053 trace file excerpt:

Join order[1]:  T1[T1]#0  T2[T2]#1

***************
Now joining: T2[T2]#1
***************
NL Join
  Outer table: Card: 1000.00  Cost: 11.02  Resp: 11.02  Degree: 1  Bytes: 205
Access path analysis for T2
  Inner table: T2  Alias: T2
  Access Path: TableScan
    NL Join:  Cost: 9516.62  Resp: 9516.62  Degree: 1
      Cost_io: 9492.00  Cost_cpu: 419669650
      Resp_io: 9492.00  Resp_cpu: 419669650

  Best NL cost: 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00
          resc: 9516.62  resc_io: 9492.00  resc_cpu: 419669650
          resp: 9516.62  resp_io: 9492.00  resc_cpu: 419669650
Join Card:  1000.000000 = = outer (1000.000000) * inner (1000.000000) * sel (0.001000)
Join cardinality for NL:  0.000000, outer: 1000.000000, inner: 1000.000000, sel: 0.001000
Join Card - Rounded: 1000 Computed: 1000.00
  Outer table:  T1  Alias: T1
    resc: 11.02  card 1000.00  bytes: 205  deg: 1  resp: 11.02
  Inner table:  T2  Alias: T2
    resc: 11.02  card: 1000.00  bytes: 205  deg: 1  resp: 11.02
    using dmeth: 2  #groups: 1
    Cost per ptn: 0.51  #ptns: 1
    hash_area: 124 (max=3277) buildfrag: 27  probefrag: 27  ppasses: 1
  Hash join: Resc: 22.56  Resp: 22.56  [multiMatchCost=0.00]
HA Join
  HA cost: 22.56  
     resc: 22.56 resc_io: 22.00 resc_cpu: 9611728
     resp: 22.56 resp_io: 22.00 resp_cpu: 9611728
Best:: JoinMethod: HashFullOuterJoin
       Cost: 22.56  Degree: 1  Resp: 22.56  Card: 1000.00 Bytes: 410
***********************
Best so far:  Table#: 0  cost: 11.0246  card: 1000.0000  bytes: 205000
              Table#: 1  cost: 22.5639  card: 1000.0000  bytes: 410000
***********************

Two points here:

  1. WTF is going on with the NL cost? It is either a bug or a very weird way of blocking nested loops when the native full outer join is forced
  2. the cardinality estimate comes from the nested loops join calculation
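The test data makes the miscalculation easy to verify: t1 holds ids 1..1000 and t2 holds ids 501..1500, so 500 ids match. The full join therefore really returns 1000 + 1000 − 500 = 1500 rows, while the inner-join formula the CBO applies gives 1000 · 1000 · 0.001 = 1000:

```
-- actual cardinalities for the sample data (t1: ids 1..1000, t2: ids 501..1500)
select count(*) from t1 join t2 on t1.id = t2.id;
-- 500 (ids 501..1000 match)

select count(*) from t1 full join t2 on t1.id = t2.id;
-- 1500 (500 matched + 500 t1-only + 500 t2-only)
```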

In my opinion, FULL JOIN is not a very popular construct. Most of those who use it will see significant performance benefits in 11g compared to 10gR2 with default settings. However, I think only a relatively small number of people will run into the changed CBO calculations, which can result in a poor SQL execution plan due to incorrect cardinality estimates.
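For completeness, the 10.2.0.3/4 workaround mentioned earlier looks like this. Note that _optimizer_native_full_outer_join is a hidden (underscore) parameter, so the usual caveat applies: check with Oracle Support before changing it on a production system.

```
-- per statement: request the native implementation with a hint
select /*+ native_full_outer_join */ *
  from t1 full join t2
    on t1.id = t2.id;

-- session-wide: force the native implementation
-- (hidden parameter, available as of 10.2.0.3; use with care)
alter session set "_optimizer_native_full_outer_join" = force;
```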
