Those who visit SQL.ru often know what the title means. It’s a very simple yet powerful technique to group data which doesn’t seem appropriate for grouping at first sight.
I learned this very nice tip long time ago but often have to re-think before applying it to SQL or even search for the correct way of doing it when my brains give me denial of service. Recently I needed to do this type of query again and thought I should document the process, so I won’t forget it next time.
Example query comes from this post and lists all extents in a given tablespace plus free space.
What I want to get is to reduce number of rows in the query output and list only the used and free space chunks inside the tablespace, merging adjacent extents of the same type into a single chunk of space.
So I’m starting with the base query listing used and free extents:
define m_tablespace=USERS select file_id, block_id, block_id + blocks end_block, 'used' segment_type from dba_extents where tablespace_name = '&m_tablespace' union all select file_id, block_id, block_id + blocks, 'free' from dba_free_space where tablespace_name = '&m_tablespace' |
FILE_ID BLOCK_ID END_BLOCK SEGM ---------- ---------- ---------- ---- 4 28160 28288 used 4 28288 28416 used 4 28416 28544 used 4 28544 28672 used 4 28672 28800 used 4 9560 9568 used 4 9544 9552 used 4 22272 26496 free 4 26624 26752 free 4 26880 27648 free 4 3248 3256 free FILE_ID BLOCK_ID END_BLOCK SEGM ---------- ---------- ---------- ---- 4 3280 3288 free 4 3264 3280 free 1597 rows selected. |
Now here’s the major part of this trick: I need to identify each start of a group of rows which I’m going to assign an ID later. If I have a block_id adjacent to the previous end_block, it means I’m in the same group, and if it’s not then it’s a new group and the marker is set to 1.
select file_id, block_id, end_block, segment_type, case when block_id = lag(end_block) over (partition by file_id, segment_type order by block_id) then 0 else 1 end start_of_group from ( select file_id, block_id, block_id + blocks end_block, 'used' segment_type from dba_extents where tablespace_name = '&m_tablespace' union all select file_id, block_id, block_id + blocks, 'free' from dba_free_space where tablespace_name = '&m_tablespace' ) |
FILE_ID BLOCK_ID END_BLOCK SEGM START_OF_GROUP ---------- ---------- ---------- ---- -------------- 4 3248 3256 free 1 4 3264 3280 free 1 4 3280 3288 free 0 4 22272 26496 free 1 4 26624 26752 free 1 4 26880 27648 free 1 4 128 136 used 1 4 136 144 used 0 4 144 152 used 0 ... FILE_ID BLOCK_ID END_BLOCK SEGM START_OF_GROUP ---------- ---------- ---------- ---- -------------- 4 28544 28672 used 0 4 28672 28800 used 0 1597 rows selected. |
The second part is assigning a group number using running total sum of the marker:
select file_id, block_id, end_block, segment_type, sum(start_of_group) over (partition by file_id, segment_type order by block_id) grp from ( select file_id, block_id, end_block, segment_type, case when block_id = lag(end_block) over (partition by file_id, segment_type order by block_id) then 0 else 1 end start_of_group from ( select file_id, block_id, block_id + blocks end_block, 'used' segment_type from dba_extents where tablespace_name = '&m_tablespace' union all select file_id, block_id, block_id + blocks, 'free' from dba_free_space where tablespace_name = '&m_tablespace' ) ) |
FILE_ID BLOCK_ID END_BLOCK SEGM GRP ---------- ---------- ---------- ---- ---------- 4 3248 3256 free 1 4 3264 3280 free 2 4 3280 3288 free 2 4 22272 26496 free 3 4 26624 26752 free 4 4 26880 27648 free 5 4 128 136 used 1 4 136 144 used 1 4 144 152 used 1 ... FILE_ID BLOCK_ID END_BLOCK SEGM GRP ---------- ---------- ---------- ---- ---------- 4 28544 28672 used 6 4 28672 28800 used 6 1597 rows selected. |
Now everything is ready for the final grouping:
select file_id, segment_type, min(block_id), max(end_block) - 1, max(end_block) - min(block_id) blocks from ( select file_id, block_id, end_block, segment_type, sum(start_of_group) over (partition by file_id, segment_type order by block_id) grp from ( select file_id, block_id, end_block, segment_type, case when block_id = lag(end_block) over (partition by file_id, segment_type order by block_id) then 0 else 1 end start_of_group from ( select file_id, block_id, block_id + blocks end_block, 'used' segment_type from dba_extents where tablespace_name = '&m_tablespace' union all select file_id, block_id, block_id + blocks, 'free' from dba_free_space where tablespace_name = '&m_tablespace' ) ) ) group by file_id, segment_type, grp order by file_id, min(block_id) |
FILE_ID SEGM MIN(BLOCK_ID) MAX(END_BLOCK)-1 BLOCKS ---------- ---- ------------- ---------------- ---------- 4 used 128 3247 3120 4 free 3248 3255 8 4 used 3256 3263 8 4 free 3264 3287 24 4 used 3288 22271 18984 4 free 22272 26495 4224 4 used 26496 26623 128 4 free 26624 26751 128 4 used 26752 26879 128 4 free 26880 27647 768 4 used 27648 28799 1152 11 rows selected. |
And maybe some graphing:
col graph format a16 select file_id, min_block_id, blocks, decode(segment_type, 'free', lpad('+', ceil(16*blocks/max_blocks), '+'), 'used', lpad('@', ceil(16*blocks/max_blocks), '@') ) graph from ( select file_id, segment_type, min(block_id) min_block_id, max(end_block) - 1 max_end_block, max(end_block) - min(block_id) blocks, max(max(end_block) - min(block_id)) over () max_blocks from ( select file_id, block_id, end_block, segment_type, sum(start_of_group) over (partition by file_id, segment_type order by block_id) grp from ( select file_id, block_id, end_block, segment_type, case when block_id = lag(end_block) over (partition by file_id, segment_type order by block_id) then 0 else 1 end start_of_group from ( select file_id, block_id, block_id + blocks end_block, 'used' segment_type from dba_extents where tablespace_name = '&m_tablespace' union all select file_id, block_id, block_id + blocks, 'free' from dba_free_space where tablespace_name = '&m_tablespace' ) ) ) group by file_id, segment_type, grp ) order by file_id, min_block_id; FILE_ID MIN_BLOCK_ID BLOCKS GRAPH ---------- ------------ ---------- ---------------- 4 128 3120 @@@ 4 3248 8 + 4 3256 8 @ 4 3264 24 + 4 3288 18984 @@@@@@@@@@@@@@@@ 4 22272 4224 ++++ 4 26496 128 @ 4 26624 128 + 4 26752 128 @ 4 26880 768 + 4 27648 1152 @ 11 rows selected.
3 Trackbacks
[…] about MATCH_RECOGNIZE. I am going to present the “start of group” method as demonstrated by Timur Akhmadeev and Solomon Yakobson. This is a powerful use of analytics to group data based on comparisons […]
[…] going on. In the named query joined we are joining all possible time intervals. Next, we are using “start of the group” method. First, we identify records where we have a gap or another set of values (HAS_GAP and […]
[…] hier? In der Unterabfrage joined joinen wir alle möglichen Zeitintervalle. Weiter setzen wir “start of the group” Method ein. Zuerst kennzeichnen wir die Datensätze mit einer Lücke oder wo sich die Daten […]