Extended statistics and function-based indexes

About a year ago I’ve discovered nice feature of Oracle 10gR2 CBO: to overcome an issue with calculated selectivity for predicates on multiple columns, it can use DISTINCT_KEYS of the available index. Couple of weeks ago the bug fix mentioned in the OTN thread actually helped to solve a performance issue of a query. And about a week ago I found that this fix doesn’t work with function-based indexes. So, this post is about it.


A test case:

drop table t cascade constraints purge;
create table t as
with g as (select /*+ materialize */ null from all_objects where rownum <= 1000)
select rownum id,
       'x' || lpad(mod(rownum, 100), 9, '0') x,
       'y' || lpad(mod(rownum, 100), 9, '0') y
  from g g1, g g2
 where rownum <= 1e5;
exec dbms_stats.gather_table_stats(user, 't', method_opt=>'for all columns size 1', estimate_percent=>null)
alter session set "_fix_control"='5765456:0';

create index t_indx on t(x, y);
explain plan for select /*+ index(t t_indx) */ * from t where x = :1 and y = :2;
@xp

alter session set "_fix_control"='5765456:7';
explain plan for select /*+ full(t) */ * from t where x = :1 and y = :2;
@xp

explain plan for select /*+ index(t t_indx) */ * from t where x = :1 and y = :2;
@xp

And excerpts of output when running it on 10.2.0.4P35 with comments

-- defaults: good cardinality for IRS, but bad for table access
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    10 |   260 |   449   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |    10 |   260 |   449   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | T_INDX |  1000 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=:1 AND "Y"=:2)

-- enabled fix for bug 5765456 and FTS: all OK
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 | 26000 |   103   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    |  1000 | 26000 |   103   (1)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=:1 AND "Y"=:2)

-- enabled fix for bug 5765456 and IRS: all OK too
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  1000 | 26000 |   449   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |  1000 | 26000 |   449   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | T_INDX |  1000 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=:1 AND "Y"=:2)

Now let’s add a function-based index and try similar queries:

create index t_indx2 on t(x, upper(y));
exec dbms_stats.gather_table_stats(user, 't', method_opt=>'for all hidden columns size 1', estimate_percent=>null)

explain plan for select /*+ full(t) */ * from t where x = :1 and upper(y) = :2;
@xp

explain plan for select /*+ index(t t_indx2) */ * from t where x = :1 and upper(y) = :2;
@xp

Oops:

-- enabled fix for bug 5765456 and FTS: wrong estimates
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   270 |   103   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    |    10 |   270 |   103   (1)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=:1 AND UPPER("Y")=:2)

-- -- enabled fix for bug 5765456 and IRS: wrong estimates for table access
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |    10 |   270 |   449   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T       |    10 |   270 |   449   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | T_INDX2 |  1000 |       |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=:1 AND UPPER("Y")=:2)

So, 10.2.0.4 is not good in this scenario. What about 11.2.0.1 (Win32)?

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  1000 | 27000 |   449   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |  1000 | 27000 |   449   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | T_INDX |  1000 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=:1 AND "Y"=:2)

-- fix 5765456 can't be disabled in 11.2.0.1
ERROR:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-00940: invalid ALTER command

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 | 27000 |   127   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    |  1000 | 27000 |   127   (1)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=:1 AND "Y"=:2)

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |  1000 | 27000 |   449   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |  1000 | 27000 |   449   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | T_INDX |  1000 |       |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=:1 AND "Y"=:2)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 | 38000 |   127   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| T    |  1000 | 38000 |   127   (1)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=:1 AND UPPER("Y")=:2)

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |  1000 | 38000 |   449   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T       |  1000 | 38000 |   449   (0)| 00:00:06 |
|*  2 |   INDEX RANGE SCAN          | T_INDX2 |  1000 |       |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X"=:1 AND UPPER("Y")=:2)

So in 11.2.0.1:

  • bug fix 5765456 is enabled by default and can’t be disabled
  • in all cases Oracle calculates cardinality estimate properly

- it’s great, but requires an index (though you can make it unusable and save associated overheads). When I tried to use pure extended statistics in 11.2.0.1, I’ve encountered several issues. I’m copy-pasting output with comments

-- adding virtual column with the same function as in existing function-based index fails
SQL> alter table t add (upper_y as (upper(y)));
alter table t add (upper_y as (upper(y)))
                   *
ERROR at line 1:
ORA-54015: Duplicate column expression was specified

SQL> drop table t cascade constraints purge;

Table dropped.

SQL> create table t as
  2  with g as (select /*+ materialize */ null from all_objects where rownum <= 1000)
  3  select rownum id,
  4         'x' || lpad(mod(rownum, 100), 9, '0') x,
  5         'y' || lpad(mod(rownum, 100), 9, '0') y
  6    from g g1, g g2
  7   where rownum <= 1e5;

Table created.

SQL> alter table t add (upper_y as (upper(y)));

Table altered.

SQL> select column_name, hidden_column, virtual_column, avg_col_len from user_tab_cols where table_name = 'T';

COLUMN_NAME                    HID VIR          AVG_COL_LEN
------------------------------ --- --- --------------------
UPPER_Y                        NO  YES NULL
Y                              NO  NO  NULL
X                              NO  NO  NULL
ID                             NO  NO  NULL

4 rows selected.
-- can't gather statistics on virtual column explicitly
SQL> exec dbms_stats.gather_table_stats(user, 't', method_opt=>'for all virtual columns size 1', estimate_percent=>null)
BEGIN dbms_stats.gather_table_stats(user, 't', method_opt=>'for all virtual columns size 1', estimate_percent=>null); END;

*
ERROR at line 1:
ORA-20000: Cannot parse for clause: for all virtual columns size 1
ORA-06512: at "SYS.DBMS_STATS", line 20337
ORA-06512: at "SYS.DBMS_STATS", line 20360
ORA-06512: at line 1


SQL> exec dbms_stats.gather_table_stats(user, 't', method_opt=>'for virtual columns size 1', estimate_percent=>null)
BEGIN dbms_stats.gather_table_stats(user, 't', method_opt=>'for virtual columns size 1', estimate_percent=>null); END;

*
ERROR at line 1:
ORA-20000: Cannot parse for clause: for virtual columns size 1
ORA-06512: at "SYS.DBMS_STATS", line 20337
ORA-06512: at "SYS.DBMS_STATS", line 20360
ORA-06512: at line 1

-- and can't add extended statistics:
SQL> select dbms_stats.create_extended_stats(user, 't', '(upper_y)') from dual;
select dbms_stats.create_extended_stats(user, 't', '(upper_y)') from dual
       *
ERROR at line 1:
ORA-20001: Invalid Extension: Not an expression


SQL> select dbms_stats.create_extended_stats(user, 't', '(upper(y))') from dual;
select dbms_stats.create_extended_stats(user, 't', '(upper(y))') from dual
       *
ERROR at line 1:
ORA-20007: extension (upper(y)) already exists in the table
ORA-06512: at "SYS.DBMS_STATS", line 6674
ORA-06512: at "SYS.DBMS_STATS", line 28195

-- because virtual column added it behind the scene already:
SQL> select extension_name, extension
  2  from user_stat_extensions
  3  /

EXTENSION_NAME
------------------------------
EXTENSION
-------------------------------------
UPPER_Y
(UPPER("Y"))

-- OK then, now back to the main topic - FBI and statistics:
SQL> select dbms_stats.create_extended_stats(user, 't', '(x, upper_y)') from dual;
select dbms_stats.create_extended_stats(user, 't', '(x, upper_y)') from dual
       *
ERROR at line 1:
ORA-20001: Error when processing extension -  virtual column is referenced in a column expression
ORA-06512: at "SYS.DBMS_STATS", line 6815
ORA-06512: at "SYS.DBMS_STATS", line 28195

SQL> select dbms_stats.create_extended_stats(user, 't', '(x, upper(y))') from dual
  2  /
select dbms_stats.create_extended_stats(user, 't', '(x, upper(y))') from dual
       *
ERROR at line 1:
ORA-20001: Invalid Extension: Column group can contain only columns seperated by comma

Yes, I wasn’t able to make extended statistics work in case of an expression. And yes, last exception has a typo and this is not my mistake.
If you know how to make it work in 11g – don’t hesitate to leave a comment.

About these ads

3 Comments

  1. Muriel FRY
    Posted August 19, 2010 at 22:06 | Permalink | Reply

    Hello
    Have you tried ?

    Exec dbms_stats.gather_table_stats(null,’t’,method_opt => ‘for columns upper_y size 1′);
    or
    Exec dbms_stats.gather_table_stats(null,’t’,method_opt => ‘for columns (upper(y)) size 1′);
    or
    Exec dbms_stats.gather_table_stats(null,’t’,method_opt => ‘for all columns size 1′);

    Each case seems to work.

  2. philip v
    Posted February 15, 2011 at 17:01 | Permalink | Reply

    I have only recently learned about the 1% selectivity assumption behind function driven indexes. Quite a surprise!

    Using Oracle 11.1 I had similar issues to you with dbms_stats.create_extended_stats when trying to graft a proper histogram response to a function driven index.

    My example was based on using upper(last_name) on a copy of hr.employeesC where 99% of names were ‘SMITH’.

    My workaround is to use
    DBMS_STATS.gather_table_stats (‘HR’, ‘EMPLOYEESC’, method_opt => ‘for all indexed columns size auto for columns (upper(last_name)) size auto)’);

  3. Posted March 11, 2012 at 11:05 | Permalink | Reply

    Prompted by recent Jonathan Lewis post I’ve re-run tests on a 11.2.0.3. All the same, except the final exception: in this version Oracle allows to create extended statistics with expressions:

    SQL> select dbms_stats.create_extended_stats(user, 't', '(x, upper(y))') from dual
      2  ;
    
    DBMS_STATS.CREATE_EXTENDED_STATS(USER,'T','(X,UPPER(Y))')
    ----------------------------------------------------------------------------------
    SYS_STUP01RWF0EEKLXC2TYUD_3IG3
    
    SQL> select extension_name, extension from user_stat_extensions where table_name = 'T';
    
    EXTENSION_NAME                 EXTENSION
    ------------------------------ ----------------------------------------
    UPPER_Y                        (UPPER("Y"))
    SYS_STUP01RWF0EEKLXC2TYUD_3IG3 ("X",UPPER("Y"))

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 288 other followers