Recently, while going through AWR reports, I came across a very good example of how an average value can hide an important pattern.
Here is a Workload Comparison section from an AWR diff report (generated with $ORACLE_HOME/rdbms/admin/awrddrpt.sql):
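For reference, the diff report is produced interactively from SQL*Plus; the script prompts for the two snapshot ranges to compare and for the report name:

```sql
SQL> @?/rdbms/admin/awrddrpt.sql
```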
Workload Comparison
~~~~~~~~~~~~~~~~~~~
                       1st Per Sec   2nd Per Sec   %Diff   1st Per Txn   2nd Per Txn   %Diff
                     -------------  ------------  ------  ------------  ------------  ------
          DB time:             0.6           0.6     1.6           0.0           0.0   -50.0
         CPU time:             0.3           0.3   -10.3           0.0           0.0  -100.0
        Redo size:       112,587.2      62,032.3   -44.9       4,304.7       1,189.7   -72.4
    Logical reads:        16,942.2      11,073.6   -34.6         647.8         212.4   -67.2
    Block changes:           819.5         330.6   -59.7          31.3           6.3   -79.8
   Physical reads:            62.5          17.5   -72.0           2.4           0.3   -85.8
  Physical writes:            23.6          32.3    36.7           0.9           0.6   -31.1
       User calls:         5,770.1       6,112.1     5.9         220.6         117.2   -46.9
           Parses:            19.5          16.1   -17.5           0.7           0.3   -58.7
      Hard parses:             0.6           0.0   -98.4           0.0           0.0  -100.0
 W/A MB processed:       915,392.1     484,814.0   -47.0      34,999.2       9,298.1   -47.0
           Logons:             0.4           0.4     2.4           0.0           0.0   -50.0
         Executes:         1,253.9       1,274.2     1.6          47.9          24.4   -49.0
     Transactions:            26.2          52.1    99.4
As you can see, the average transaction rate in the 2nd report is twice as high as in the 1st, while most of the other metrics went down. It is natural to start thinking that something went wrong in the area of transaction management.
Luckily, AWR has other sources of information, and system metrics are one of those that provide a more detailed view of system performance. So I used the following query to pull the data out of DBA_HIST_SYSMETRIC_HISTORY:
with t_interval as (
  select /*+ inline */ sysdate - 30 begin_dt, sysdate begin_dt + 30 from dual
)
select begin_time
     , end_time
     , aas
     , logons_sec
     , redo_sec
     , user_calls_sec
     , exec_sec
     , phys_reads_sec
     , network_sec
     , io_mb_sec
     , txn_sec
     , phys_writes_sec
     , io_req_sec
     , lio_txn
     , parse_cnt_txn
     , phys_reads_txn
     , redo_txn
     , response_txn
     , db_changes_txn
     , avg_1b_sync_read
     , sql_response
  from (
        select begin_time
             , end_time
             , metric_name
             , value
          from dba_hist_sysmetric_history, t_interval t
         where begin_time >= t.begin_dt
           and end_time <= t.end_dt
           and dbid = (select d.dbid from v$database d)
           and instance_number = (select i.instance_number from v$instance i)
       )
 pivot (
   max(value) for metric_name in (
       'Logons Per Sec'                                as logons_sec
     , 'Redo Generated Per Sec'                        as redo_sec
     , 'Average Active Sessions'                       as aas
     , 'Total Parse Count Per Txn'                     as parse_cnt_txn
     , 'Physical Reads Per Txn'                        as phys_reads_txn
     , 'Redo Generated Per Txn'                        as redo_txn
     , 'User Calls Per Sec'                            as user_calls_sec
     , 'Average Synchronous Single-Block Read Latency' as avg_1b_sync_read
     , 'Response Time Per Txn'                         as response_txn
     , 'Executions Per Sec'                            as exec_sec
     , 'Physical Reads Per Sec'                        as phys_reads_sec
     , 'Network Traffic Volume Per Sec'                as network_sec
     , 'Logical Reads Per Txn'                         as lio_txn
     , 'DB Block Changes Per Txn'                      as db_changes_txn
     , 'I/O Megabytes per Second'                      as io_mb_sec
     , 'User Transaction Per Sec'                      as txn_sec
     , 'SQL Service Response Time'                     as sql_response
     , 'Physical Writes Per Sec'                       as phys_writes_sec
     , 'I/O Requests per Second'                       as io_req_sec
   )
 )
 order by begin_time;
Then I used Excel to plot the Transactions Per Second metric (which is stored once a minute) and combined the two sets of values into one graph:
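Excel is not the only option, of course. A minimal sketch of the same "combine two periods into one graph" step in Python, assuming the query output for each period was exported to CSV with the BEGIN_TIME and TXN_SEC columns (the file contents and timestamp format here are assumptions, not what my exports actually looked like):

```python
from datetime import datetime

def load_series(rows):
    """Turn (BEGIN_TIME string, TXN_SEC string) pairs, e.g. from csv.reader,
    into (datetime, float) pairs. The timestamp format is an assumption."""
    return [(datetime.strptime(t, "%Y-%m-%d %H:%M:%S"), float(v))
            for t, v in rows]

def overlay(series1, series2):
    """Re-base both series to minutes-from-start so two different calendar
    periods share one X axis, which is what overlaying them in a chart needs."""
    def rebase(series):
        t0 = series[0][0]
        return [((t - t0).total_seconds() / 60.0, v) for t, v in series]
    return rebase(series1), rebase(series2)
```

The two rebased series can then be fed to any plotting tool (matplotlib, gnuplot, or Excel again) as X/Y pairs on a shared axis.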
And here is an important detail: the 1st period is in red! The pattern of transactions on this system is very inconsistent: while the 1st AWR report showed only 26 transactions per second (TX/sec) as a per-hour average, the spikes during this period regularly reached about 600 TX/sec as a per-minute average (the Y axis is not shown on the screenshot). The 2nd AWR report, which showed a 100% increase in TX/sec on the per-hour average compared to the 1st, actually corresponds to per-minute spikes going down to around 200 TX/sec.
While analyzing any sort of average value in AWR or Statspack, keep the time interval in mind. The same underlying data can produce very different, and sometimes misleading, averages depending on the interval used for averaging.
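The effect is easy to reproduce with made-up numbers shaped like this article's example: one period that is mostly quiet but spikes hard, and one that is steadier but busier on average. All the values below are invented for illustration only.

```python
def hourly_average(per_minute_values):
    """Collapse per-minute samples into one average, the way an hour-long
    AWR snapshot interval collapses a whole hour into a single number."""
    return sum(per_minute_values) / len(per_minute_values)

# Period 1: a mostly quiet hour with two short bursts of 600 tx/s.
period1 = [6.0] * 58 + [600.0] * 2
# Period 2: a busier but far less spiky hour.
period2 = [18.0] * 48 + [188.0] * 12

print(hourly_average(period1))         # 25.8 tx/s -- looks like the "slow" period
print(hourly_average(period2))         # 52.0 tx/s -- looks twice as busy...
print(max(period1), max(period2))      # 600.0 vs 188.0 -- ...yet the peaks dropped
```

The hourly averages say the workload doubled; the per-minute peaks say it calmed down by a factor of three. Both are "correct" averages of the same system, differing only in the averaging interval.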