by Stanislas Nanchen
Splitting Strings in Oracle with the Model Clause
In database applications, it can happen that some information get stored as an encoded comma separated string. After some time, it is recognized as an error in the modeling and this information must be split into individual values and inserted into a child table. The problem is to find an efficient way to insert these values into the child table.
In Oracle, it is possible to solve this problem with a hierarchical query or with PL/SQL. In this post, we look at an elegant solution with a query using the model clause¹, a feature introduced in Oracle 10g. With the model clause, it is possible to add rows and columns to the result set of a query and to fill theses generated rows and columns via declarative rules. These rules describe the content of cells from the content of other cells. It is really like having a spreadsheet in SQL.
Lets begin and create a simple table
entity with 2 columns: a primary key, a field for
comma separated values and some dummy data.
Now we want to normalize that information into a
The solution using the model clause is the following insert as select query.
As you can see, the model clause is added at the end of a select sql statement. Here we select everything
from the table
entity and will build model(s) from the content of it.
The model clause allows actually the creation of many independent models by partitioning
the result set returned by the primary sql statement. It is similar to the frames of a window function.
By partitioning by
id, we indicate
an independent model for each row of the table
measures are additional computed columns
and they are given dimensions to express their values relatively to other measures thanks to rules that
are declared in the
rule clause. We only need a simple numeric dimension
i that has the start
1 representing the first extracted value. Later, we will extend the dimension
The first measure
vals is the column
vals itself to allow its use in the
The second measure
cnt is the number of values that has to be extracted from the column
vals is encoded as comma separated string, it is the number of commas plus one; we extract this information with the function
The third and last measure
val represents one single
extracted value. Because we will compute the real values for
val with a rule, we just use
varchar2 (as we extract individual strings).
The last part of the model clause are the rules that specify how the measures are computed from other measures.
We have a single rule that will fill the measure
val. At the beginning of the computation of the model,
because we have partitioned by
id and have a dimension
i with start value
1, we have a simple model
for each line of the table
entity with 3 cells:
vals which contains the content of the
cnt containing the number of individual values, and
null for the
The model then applies the rule. On the left side of the equality, we have an expression of the type
val[x] signaling that we want to compute the value of
val at point
x and on the right we have an
expression whose value will be assigned to
val[x]. In our case,
x is the multi-cells expression
for i from 1 to cnt increment 1, which can be translated as
for all integers between
cnt. The rule represents the computation of many cells.
As for the moment only
val exists in the model, Oracle will create
cnt-1 new cells to
The right side of our expression is a call to the
regexp_substr function to extract
the i-th non-comma word from
vals. The model clause function
CV(i) return the current value of
Now we can return to the
select clause of our sql statement: we select the columns
Oracle will return one row for each combination of dimensions in the models. This means that we will have
one row for each extracted value. You can run the following query to see the values
for the dimension
i and for all measures. The measures
cnt are only computed for