start_of_group

Those who visit SQL.ru often know what the title means. It’s a very simple yet powerful technique to group data which doesn’t seem appropriate for grouping at first sight.
I learned this very nice tip long time ago but often have to re-think before applying it to SQL or even search for the correct way of doing it when my brains give me denial of service. Recently I needed to do this type of query again and thought I should document the process, so I won’t forget it next time.

Example query comes from this post and lists all extents in a given tablespace plus free space.
What I want to get is to reduce number of rows in the query output and list only the used and free space chunks inside the tablespace, merging adjacent extents of the same type into a single chunk of space.
So I’m starting with the base query listing used and free extents:

define m_tablespace=USERS
select
    file_id,
    block_id,
    block_id + blocks end_block,
    'used' segment_type
from
    dba_extents
where
    tablespace_name = '&m_tablespace'
union all
select
    file_id,
    block_id,
    block_id + blocks,
    'free'
from
    dba_free_space
where
    tablespace_name = '&m_tablespace'
   FILE_ID   BLOCK_ID  END_BLOCK SEGM
---------- ---------- ---------- ----
         4      28160      28288 used
         4      28288      28416 used
         4      28416      28544 used
         4      28544      28672 used
         4      28672      28800 used
         4       9560       9568 used
         4       9544       9552 used
         4      22272      26496 free
         4      26624      26752 free
         4      26880      27648 free
         4       3248       3256 free

   FILE_ID   BLOCK_ID  END_BLOCK SEGM
---------- ---------- ---------- ----
         4       3280       3288 free
         4       3264       3280 free

1597 rows selected.

Now here’s the major part of this trick: I need to identify each start of a group of rows which I’m going to assign an ID later. If I have a block_id adjacent to the previous end_block, it means I’m in the same group, and if it’s not then it’s a new group and the marker is set to 1.

select
    file_id,
    block_id,
    end_block,
    segment_type,
    case when 
      block_id = lag(end_block)
                 over (partition by file_id, segment_type 
                       order by block_id) 
      then 0 else 1 
    end start_of_group
 from (
    select
        file_id,
        block_id,
        block_id + blocks end_block,
        'used' segment_type
    from
        dba_extents
    where
        tablespace_name = '&m_tablespace'
    union all
    select
        file_id,
        block_id,
        block_id + blocks,
        'free'
    from
        dba_free_space
    where
        tablespace_name = '&m_tablespace'
  )
   FILE_ID   BLOCK_ID  END_BLOCK SEGM START_OF_GROUP
---------- ---------- ---------- ---- --------------
         4       3248       3256 free              1
         4       3264       3280 free              1
         4       3280       3288 free              0
         4      22272      26496 free              1
         4      26624      26752 free              1
         4      26880      27648 free              1
         4        128        136 used              1
         4        136        144 used              0
         4        144        152 used              0
...
   FILE_ID   BLOCK_ID  END_BLOCK SEGM START_OF_GROUP
---------- ---------- ---------- ---- --------------
         4      28544      28672 used              0
         4      28672      28800 used              0

1597 rows selected.

The second part is assigning a group number using running total sum of the marker:

select
  file_id,
  block_id,
  end_block,
  segment_type,
  sum(start_of_group) 
    over (partition by file_id, segment_type 
          order by block_id) grp
from (
  select
      file_id,
      block_id,
      end_block,
      segment_type,
      case when 
        block_id = lag(end_block) 
                   over (partition by file_id, segment_type 
                         order by block_id) 
        then 0 else 1 
      end start_of_group
   from (
      select
          file_id,
          block_id,
          block_id + blocks end_block,
          'used' segment_type
      from
          dba_extents
      where
          tablespace_name = '&m_tablespace'
      union all
      select
          file_id,
          block_id,
          block_id + blocks,
          'free'
      from
          dba_free_space
      where
          tablespace_name = '&m_tablespace'
    )
  )
   FILE_ID   BLOCK_ID  END_BLOCK SEGM        GRP
---------- ---------- ---------- ---- ----------
         4       3248       3256 free          1
         4       3264       3280 free          2
         4       3280       3288 free          2
         4      22272      26496 free          3
         4      26624      26752 free          4
         4      26880      27648 free          5
         4        128        136 used          1
         4        136        144 used          1
         4        144        152 used          1
...
   FILE_ID   BLOCK_ID  END_BLOCK SEGM        GRP
---------- ---------- ---------- ---- ----------
         4      28544      28672 used          6
         4      28672      28800 used          6

1597 rows selected.

Now everything is ready for the final grouping:

select
  file_id,
  segment_type,
  min(block_id),
  max(end_block) - 1,  
  max(end_block) - min(block_id) blocks
from
(
  select
    file_id,
    block_id,
    end_block,
    segment_type,
    sum(start_of_group) 
      over (partition by file_id, segment_type 
            order by block_id) grp
  from (
    select
        file_id,
        block_id,
        end_block,
        segment_type,
        case when 
          block_id = lag(end_block) 
                     over (partition by file_id, segment_type 
                           order by block_id) 
          then 0 else 1 
        end start_of_group
     from (
        select
            file_id,
            block_id,
            block_id + blocks end_block,
            'used' segment_type
        from
            dba_extents
        where
            tablespace_name = '&m_tablespace'
        union all
        select
            file_id,
            block_id,
            block_id + blocks,
            'free'
        from
            dba_free_space
        where
            tablespace_name = '&m_tablespace'
      )
    )
)
group by file_id, segment_type, grp
order by file_id, min(block_id)
   FILE_ID SEGM MIN(BLOCK_ID) MAX(END_BLOCK)-1     BLOCKS
---------- ---- ------------- ---------------- ----------
         4 used           128             3247       3120
         4 free          3248             3255          8
         4 used          3256             3263          8
         4 free          3264             3287         24
         4 used          3288            22271      18984
         4 free         22272            26495       4224
         4 used         26496            26623        128
         4 free         26624            26751        128
         4 used         26752            26879        128
         4 free         26880            27647        768
         4 used         27648            28799       1152

11 rows selected.

And maybe some graphing:

col graph format a16

select
  file_id,
  min_block_id,
  blocks,
  decode(segment_type, 'free', lpad('+', ceil(16*blocks/max_blocks), '+'),
                       'used', lpad('@', ceil(16*blocks/max_blocks), '@')
  ) graph  
from
(
  select
    file_id,
    segment_type,
    min(block_id) min_block_id,
    max(end_block) - 1 max_end_block, 
    max(end_block) - min(block_id) blocks,
    max(max(end_block) - min(block_id)) over () max_blocks
  from
  (
    select
      file_id,
      block_id,
      end_block,
      segment_type,
      sum(start_of_group) 
        over (partition by file_id, segment_type 
              order by block_id) grp
    from (
      select
          file_id,
          block_id,
          end_block,
          segment_type,
          case when 
            block_id = lag(end_block) 
                       over (partition by file_id, segment_type 
                             order by block_id) 
            then 0 else 1 
          end start_of_group
       from (
          select
              file_id,
              block_id,
              block_id + blocks end_block,
              'used' segment_type
          from
              dba_extents
          where
              tablespace_name = '&m_tablespace'
          union all
          select
              file_id,
              block_id,
              block_id + blocks,
              'free'
          from
              dba_free_space
          where
              tablespace_name = '&m_tablespace'
        )
      )
  )
  group by file_id, segment_type, grp
)
order by file_id, min_block_id;

   FILE_ID MIN_BLOCK_ID     BLOCKS GRAPH
---------- ------------ ---------- ----------------
         4          128       3120 @@@
         4         3248          8 +
         4         3256          8 @
         4         3264         24 +
         4         3288      18984 @@@@@@@@@@@@@@@@
         4        22272       4224 ++++
         4        26496        128 @
         4        26624        128 +
         4        26752        128 @
         4        26880        768 +
         4        27648       1152 @

11 rows selected.

3 Trackbacks

  1. […] about MATCH_RECOGNIZE. I am going to present the “start of group” method as demonstrated by Timur Akhmadeev and Solomon Yakobson. This is a powerful use of analytics to group data based on comparisons […]

  2. […] going on. In the named query joined we are joining all possible time intervals. Next, we are using “start of the group” method. First, we identify records where we have a gap or another set of values (HAS_GAP and […]

  3. […] hier? In der Unterabfrage joined joinen wir alle möglichen Zeitintervalle. Weiter setzen wir “start of the group” Method ein. Zuerst kennzeichnen wir die Datensätze mit einer Lücke oder wo sich die Daten […]

Leave a comment