How big is a table?

I’ve been asked this question in various forms and forums over the years and the askers are often surprised when I tell them there is more than one way to answer that question. Furthermore, each method yields a different but not necessarily wrong answer!

First, let’s take a simple, maybe even obvious method:

SELECT avg_row_len * num_rows
  FROM DBA_TABLES
 WHERE owner = 'MY_OWNER' AND table_name = 'MY_TABLE_NAME';

That’s simple enough, take the size of a row multiplied by the number of rows and you get the total size of all rows in the table. Right? Well, maybe, maybe not. Each of those columns is populated by gathering statistics on the table. So, there is a good chance they might be samples, not actual values. Plus, they are only as good as the most recent collection. If any data has been added, deleted, or modified since the last statistics collection, the values are now wrong. Also, the average length is rounded to an integer value, so even with up-to-date numbers the multiplication is still subject to imprecision.

So, is it wrong? In a strictly mathematical and precise definition of “size”, yes, it likely is. It may be an adequate approximation though if the sampling is sufficient and recent enough, especially if the result might be rounded to nearest GB in a report.

Along the same lines you can sum the exact size of every column of every row. This is will be accurate, but extremely slow for tables of a significant size. If however, you need an exact number, a strategy like this will do it. Simply extend to include all of the columns of your table.

SELECT vsize(column1) + vsize(column2) + vsize(column3) ... 
  FROM MY_OWNER.MY_TABLE_NAME;

This method can still suffer from timing errors. The query will return the correct result for a given point in time because of Oracle’s transaction integrity. However, if your table is volatile, its size could change by the time your query finishes due to other users in other transactions isolated from your results.

So, is it wrong? I don’t think I can say it’s wrong in an absolute sense; but it’s often impractical and even if accurate as to the table’s content byte-count it still may not be the answer we’re really looking for.

Another approach is to not look at the rows, but the space allocated for the table. After all, if my table has 100GB allocated on disk it doesn’t matter if there are zero rows or a million rows in it, the space can’t be used by anything else because it belongs to the table.

SELECT SUM(bytes)
  FROM dba_segments
 WHERE owner = 'MY_OWNER' AND segment_name = 'MY_TABLE_NAME';

When I’ve been asked about a table’s size, the space allocation is usually what people are looking for. Summing across the segment takes into account any partitions or sub-partitions the table may have. This then seems like it’s THE answer then.

So, is it wrong? Maybe. What if the table has CLOB or BLOB columns? Those are stored in their own segments. Since LOB values can be large, it’s quite possible the majority of a table’s size is in its LOB contents and not the table’s other columns. Also, LOBs have LOB indexes as well, so those segments should be included in the table’s total size calculation.

   SELECT SUM(bytes)
    FROM (SELECT owner, segment_name table_name, bytes
            FROM dba_segments t
          UNION ALL
          SELECT l.owner, l.table_name, s.bytes
            FROM dba_lobs l 
            JOIN dba_segments s 
              ON s.owner = l.owner 
             AND s.segment_name IN (l.segment_name, l.index_name)
         )
   WHERE owner = 'MY_OWNER' AND table_name = 'MY_TABLE_NAME'
GROUP BY table_name;

Does this solve the problem? For many tables, yes it will.

But, what about Index-Organized tables (IOTs?) For an IOT, the table itself has no segment, table data is stored in the primary key.

SELECT SUM(bytes)
  FROM (SELECT i.table_owner, i.table_name, s.bytes
          FROM dba_indexes i
          JOIN dba_segments s
            ON s.owner = i.owner
           AND s.segment_name = i.index_name
         WHERE index_type = 'IOT - TOP')
 WHERE owner = 'MY_OWNER' AND table_name = 'MY_TABLE_NAME';

Combining this with the lob query we can get even closer to a total picture of how much space is consumed by a table and its constituent parts.

  SELECT SUM(bytes)
    FROM (SELECT owner, segment_name table_name, bytes
            FROM dba_segments t
          UNION ALL
          SELECT l.owner, l.table_name, s.bytes
            FROM dba_lobs l 
            JOIN dba_segments s
              ON s.owner = l.owner
             AND s.segment_name IN (l.segment_name, l.index_name)
          UNION ALL
          SELECT i.table_owner, i.table_name, s.bytes
            FROM dba_indexes i
            JOIN dba_segments s
              ON s.owner = i.owner
             AND s.segment_name = i.index_name
           WHERE index_type = 'IOT - TOP')
   WHERE owner = 'MY_OWNER' AND table_name = 'MY_TABLE_NAME';

So, is this finally the answer? Again, for many tables, yes; but if your IOT has an overflow segment those will need to be included too.

SELECT SUM(bytes)
  FROM (SELECT owner, segment_name table_name, bytes
          FROM dba_segments t
        UNION ALL
        SELECT l.owner, l.table_name, s.bytes
          FROM dba_lobs l 
          JOIN dba_segments s 
            ON s.owner = l.owner
           AND s.segment_name IN (l.segment_name, l.index_name)
        UNION ALL
        SELECT i.table_owner, i.table_name, s.bytes
          FROM dba_indexes i
          JOIN dba_segments s
            ON s.owner = i.owner
           AND s.segment_name = i.index_name
         WHERE index_type = 'IOT - TOP'
        UNION ALL
        SELECT s.owner, t.iot_name table_name, s.bytes
          FROM dba_tables t
          JOIN dba_segments s
            ON s.owner = t.owner
           AND s.segment_name = t.iot_name
       )
 WHERE owner = 'MY_OWNER' AND table_name = 'MY_TABLE_NAME';

This should cover just about all table variations.

You can probably see I’ve left myself some wiggle room in that declaration.
This is intentional because there are, of course, other variations that can, again, change the answer based on the specific requirements and interpretation.

A few of the more obvious ones…

Sharded Tables – does it make sense to sum space across all shards or are you looking for a local answer?
External Tables – the table itself consumes none of the tablespace allocation; but the underlying files of an external table do consume space. Does it make sense to include those? What about external tables with a preprocessor? Do you want the size of the resulting output or just the size of the original input files, assuming there are some?
Other Indexes – Other indexes on a table aren’t part of the table or its structural components unlike the IOT and LOB indexes; but their inclusion could be argued to make a more accurate picture of a table’s full requirements. Furthermore, if you drop a table, its indexes will be dropped too, clearly freeing up space. So it could make sense to include them, on the other hand if you export the data of a table, the index contents are not part of the export so it also makes sense to exclude them, again, depending on the requirements.
Views – Obviously a view itself has no storage, but like external tables someone might be interested in the size of the underlying objects supporting the view.
Materialize Views – Like a view, the query portion of a materialized view consumes no space; but the supporting table holding the results of the MV-query does. In this sense it’s not particularly interesting because the table can be calculated using any of the methods shown above. Should you then include the materialized view log tables? Maybe if they are local, but maybe not if they are on a remote system accessed via db-link.

Hopefully the queries above provide the answers you will need when asked “How big is a table?” but if not, hopefully they will provide enough of an outline to show how to add other components as needed. If not, questions and comments, as always, are welcome.

Leave a ReplyCancel reply