Tale of the Tape — Direct Path Bulk Bind Inserts
Oracle Database offers a number of methods to load many rows into a table.
Perhaps the best known is the bulk load, which copies many rows from one table to another.
But there are many more methods, and they can use either a conventional or a direct path.
The direct path insert is renowned for being extremely fast, but is there a downside?
Row-by-row is slow-by-slow
When loading a collection (i.e. an associative array, varray, or nested table) into a table, the simplest method is to insert one row at a time within a loop:
FOR j IN id.FIRST .. id.LAST
LOOP
  INSERT INTO &tbl
  VALUES (
    id(j),
    SYSDATE,
    SYS_CONTEXT('USERENV', 'SESSION_USER'),
    SYS_CONTEXT('USERENV', 'SID'),
    'new'
  );
END LOOP;
Here the PL/SQL engine sends a SQL statement to the SQL engine for each insert.
Unfortunately, the context switches between the PL/SQL and SQL engines lead to slow performance.
And while this type of insert inside a loop is simple, it cannot take advantage of the direct path.
However, there is a method to efficiently load a collection into a table.
Just as a bulk load can be used to efficiently load rows from one table into another, a bulk bind (i.e. a FORALL statement) of insert statements can efficiently batch-load a collection into a table.
Bulk Binds
A bulk bind can be used with insert, delete, and update statements; an example of a bulk bind delete is below.
-- example of a bulk bind delete
FORALL i IN id.FIRST .. id.LAST
  DELETE FROM &tbl
  WHERE department_id = id(i);
Just as there are conventional and direct path bulk loads, there are also two types of bulk bind insert:
- conventional bulk bind insert.
- direct path bulk bind insert.
Both types of bulk bind insert are achieved with the FORALL statement. Because the FORALL body contains no procedural code, the whole batch of statements is sent to the SQL engine in a single context switch, rather than one switch per row.
-- conventional bulk bind insert
FORALL j IN id.FIRST .. id.LAST
  INSERT INTO &tbl
  VALUES (
    id(j),
    SYSDATE,
    SYS_CONTEXT('USERENV', 'SESSION_USER'),
    SYS_CONTEXT('USERENV', 'SID'),
    'new'
  );
But a bulk bind of insert statements can also use the direct path via the APPEND_VALUES hint. In fact, the APPEND_VALUES hint exists specifically for inserts with a VALUES clause, which is exactly what a bulk bind insert uses.
An example of a direct path bulk bind insert is below:
-- direct path bulk bind insert
FORALL j IN id.FIRST .. id.LAST
  INSERT /*+ APPEND_VALUES */ INTO &tbl
  VALUES (
    id(j),
    SYSDATE,
    SYS_CONTEXT('USERENV', 'SESSION_USER'),
    SYS_CONTEXT('USERENV', 'SID'),
    'new'
  );
The advantage of using the direct path for inserts is the potential performance benefit, especially when combined with NOLOGGING mode.
However, are there any storage consequences of using a direct path bulk bind insert of a small collection many times into the same table?
Conventional bulk bind inserts are storage efficient
First, as a benchmark, let's insert a 10-element collection into a table, then repeat the insert until there are 1,000,000 rows in total.
With a 10-element collection, each batch of statements issues 10 inserts at a time.
FORALL i IN id.FIRST .. id.LAST
  INSERT INTO &tbl
  VALUES (
    sysdate,
    id(i),
    x,
    x,
    null,
    TO_CHAR(z)
  );
COMMIT;
Once the bulk bind insert completes, we need to verify that there are 1M rows in the table.
SELECT count(*)/1000/1000 M_of_rows
FROM &tbl;

 M_OF_ROWS
----------
         1
Also we’ll determine the segment size of the table needed to store 1M rows.
SELECT bytes/1024/1024 MB
FROM dba_segments
WHERE owner = UPPER('&schm') AND segment_name = UPPER('&tbl');

        MB
----------
        42
So there are 1M rows inside a table of 42 MB.
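As a sanity check, the segment size alone implies a row density in the right range. This is a back-of-the-envelope estimate only, and it assumes the default 8 KB database block size, which the article does not state:

```python
# Rough rows-per-block estimate for 1M rows in a 42 MB segment.
# Assumes an 8 KB block size (an assumption, not stated in the article).
block_size_kb = 8
segment_mb = 42
rows = 1_000_000

blocks = segment_mb * 1024 // block_size_kb   # 42 MB / 8 KB = 5376 blocks
rows_per_block = rows / blocks                # ~186 rows per block

print(blocks, round(rows_per_block))
```

The estimate of roughly 186 rows per block is consistent with the block statistics measured next (the segment also contains header blocks and blocks holding no rows, so the density of the populated blocks is slightly higher).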
We can also gather some simple statistics for the table, then query the number of rows per block:
SELECT avg(rows_per_block) "avg_rows_per_block",
       median(rows_per_block) "median_rows_per_block",
       stddev(rows_per_block) "standard deviation"
FROM (
  SELECT count(*) rows_per_block,
         dbms_rowid.rowid_relative_fno(rowid),
         dbms_rowid.rowid_block_number(rowid)
  FROM &tbl
  GROUP BY dbms_rowid.rowid_relative_fno(rowid), dbms_rowid.rowid_block_number(rowid)
);

avg_rows_per_block median_rows_per_block standard deviation
------------------ --------------------- ------------------
        191.534189                   191         1.80161418
So the median rows per block is 191.
And the histogram of the rows per block in the segment can be determined too:
col "no rows" heading "no|rows"
col bucket1 heading "1–9|rows"
col bucket2 heading "10|rows"
col bucket3 heading "11–19|rows"
col bucket4 heading "20|rows"
col bucket5 heading "21–185|rows"
col bucket6 heading "186–195|rows"
col bucket7 heading "196–205|rows"
col bucket8 heading "206–215|rows"
col bucket_max heading ">=216|rows"
SELECT
  MIN((SELECT SUM(blocks) FROM dba_extents
       WHERE owner = UPPER('&schm') AND segment_name = UPPER('&tbl')) - acc_blocks) "no rows",
  SUM(CASE WHEN rows_per_block BETWEEN 1 AND 9 THEN 1 ELSE NULL END) bucket1,
  SUM(CASE WHEN rows_per_block BETWEEN 10 AND 10 THEN 1 ELSE NULL END) bucket2,
  SUM(CASE WHEN rows_per_block BETWEEN 11 AND 19 THEN 1 ELSE NULL END) bucket3,
  SUM(CASE WHEN rows_per_block BETWEEN 20 AND 20 THEN 1 ELSE NULL END) bucket4,
  SUM(CASE WHEN rows_per_block BETWEEN 21 AND 185 THEN 1 ELSE NULL END) bucket5,
  SUM(CASE WHEN rows_per_block BETWEEN 186 AND 195 THEN 1 ELSE NULL END) bucket6,
  SUM(CASE WHEN rows_per_block BETWEEN 196 AND 205 THEN 1 ELSE NULL END) bucket7,
  SUM(CASE WHEN rows_per_block BETWEEN 206 AND 215 THEN 1 ELSE NULL END) bucket8,
  SUM(CASE WHEN rows_per_block >= 216 THEN 1 ELSE NULL END) bucket_max
FROM (
  -- count the rows in each block
  SELECT count(*) rows_per_block,
         dbms_rowid.rowid_relative_fno(rowid),
         dbms_rowid.rowid_block_number(rowid),
         COUNT(*) OVER (ORDER BY dbms_rowid.rowid_relative_fno(rowid), dbms_rowid.rowid_block_number(rowid)) acc_blocks
  FROM &tbl
  GROUP BY dbms_rowid.rowid_relative_fno(rowid), dbms_rowid.rowid_block_number(rowid)
);

  no  1–9   10 11–19   20 21–185 186–195 196–205 206–215 >=216
rows rows rows  rows rows   rows    rows    rows    rows  rows
---- ---- ---- ----- ---- ------ ------- ------- ------- -----
 155                           1    4712     503       5
We know the median rows per block is 191, and the histogram confirms that most blocks hold between 186 and 195 rows.
These statistics become the benchmark for a conventional bulk bind insert.
Many small direct path bulk bind inserts waste storage
This time, we’ll bulk bind insert the same 10-element collection into a new table. Again the insert will be repeated to reach 1,000,000 rows.
But rather than a conventional bulk bind insert, we’ll use the APPEND_VALUES hint to invoke a direct path bulk bind insert.
-- direct path bulk bind insert
FORALL i IN id.FIRST .. id.LAST
  INSERT /*+ APPEND_VALUES */ INTO &tbl
  VALUES (
    sysdate,
    id(i),
    x,
    x,
    null,
    TO_CHAR(z)
  );
-- commit to avoid possible ORA-12838
COMMIT;
When we query the table again, we confirm there are 1M rows present in total.
 M_OF_ROWS
----------
         1
Now we determine the segment size of the table after the direct path bulk bind insert completes.
        MB
----------
       792
The segment size for 1M rows has blown out to 792 MB (from 42 MB previously), which indicates massive storage inefficiency!
The new statistics and the new histogram reveal how the storage is being wasted:
avg_rows_per_block median_rows_per_block standard deviation
------------------ --------------------- ------------------
                10                    10                  0

  no  1–9   10 11–19   20 21–185 186–195 196–205 206–215 >=216
rows rows rows  rows rows   rows    rows    rows    rows  rows
---- ---- ---- ----- ---- ------ ------- ------- ------- -----
1376      100000
So when using a direct path bulk bind insert of a 10-element collection, there are only 10 rows per block.
Compare this with the conventional bulk bind insert, which obtained 191 rows per block.
The direct path always inserts above the high-water mark of the table, so each batch allocates a new block even when there is plenty of free space below the high-water mark.
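This high-water mark behaviour also accounts for the 792 MB segment. A rough estimate, again assuming the default 8 KB block size (an assumption not stated in the article):

```python
# Each 10-row direct path batch lands above the high-water mark,
# so every batch consumes at least one fresh block.
# Assumes an 8 KB block size (an assumption, not stated in the article).
rows = 1_000_000
batch_size = 10

batches = rows // batch_size          # 100,000 batches of 10 rows
min_blocks = batches                  # at least one new block per batch
min_mb = min_blocks * 8 / 1024        # ~781 MB of data blocks

print(batches, round(min_mb))
```

The minimum of roughly 781 MB is close to the observed 792 MB; the remainder is segment overhead such as extent and header blocks.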
Few large direct path bulk bind inserts are storage efficient
This time we’ll direct path bulk bind insert a collection with 2000 elements into a new table.
So there will be 2000 inserts per batch of direct path insert statements.
-- direct path bulk bind insert
FORALL i IN id.FIRST .. id.LAST
  INSERT /*+ APPEND_VALUES */ INTO &tbl
  VALUES (
    sysdate,
    id(i),
    x,
    x,
    null,
    TO_CHAR(z)
  );
-- commit to avoid possible ORA-12838
COMMIT;
After the bulk bind completes, we’ll verify that there are 1M rows in the table, and what the new segment size is.
 M_OF_ROWS
----------
         1

        MB
----------
        40
The segment size for 1M rows has returned to close to the conventional bulk bind benchmark of 42 MB (down from 792 MB). The block statistics confirm the higher storage efficiency of the larger collection:
avg_rows_per_block median_rows_per_block standard deviation
------------------ --------------------- ------------------
               200                   200         9.03683201

  no  1–9   10 11–19   20 21–185 186–195 196–205 206–215 >=216
rows rows rows  rows rows   rows    rows    rows    rows  rows
---- ---- ---- ----- ---- ------ ------- ------- ------- -----
 120                          99            4010     891
Now there are on average 200 rows per block for the direct path bulk bind insert of a large collection.
Again this is close to the benchmark of 191 rows per block for the conventional bulk bind insert.
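The same high-water mark reasoning predicts this result. With around 200 rows fitting in a block, a 2000-row batch spans about 10 blocks, so the per-batch rounding waste is amortised. A rough estimate, again assuming an 8 KB block size (an assumption not stated in the article):

```python
import math

# Storage estimate for 2000-row direct path batches at ~200 rows per block.
# Assumes an 8 KB block size (an assumption, not stated in the article).
rows = 1_000_000
batch_size = 2000
rows_per_block = 200                  # average measured above

batches = rows // batch_size                               # 500 batches
blocks_per_batch = math.ceil(batch_size / rows_per_block)  # ~10 blocks each
total_mb = batches * blocks_per_batch * 8 / 1024           # ~39 MB

print(batches, blocks_per_batch, round(total_mb))
```

The estimate of roughly 39 MB of data blocks agrees closely with the observed 40 MB segment.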
Conclusion
In conclusion, when using a direct path bulk bind insert, the larger the collection, the more storage efficient the load.
It follows that a higher row density in a table potentially allows more rows to be retrieved per block.
More rows per block can result in fewer blocks being read, which leads to higher application performance.
Paul Guerin has presented at some of the world’s leading Oracle conferences, including Oracle Open World 2013. Since 2015, his work has been featured in the IOUG Best Practices Tip Booklet, and in publications from AUSOUG, Oracle Technology Network, Quest, and Oracle Developers (Medium). In 2019, he was awarded as a most valued contributor for the My Oracle Support Community. He is a DBA OCP, and continues to be a participant of the Oracle ACE program.