A lot of developers might be confused about index selectivity while using %LIKE% operator. So please find below how index worked when you use LIKE operator.
Problem Statement
- While optimizing high CPU consuming queries on 3rd party application, verified that most of the queries are using
'%LIKE%'
operator. - Interestingly enough, while some of these queries are going for
"INDEX RANGE"
while others are going for"FULL TABLE SCAN"
??
Brief about LIKE operator
'LIKE'
Determines whether a specific character string matches a specified pattern.- % allows you to match any string of any length (including zero length)
Before starting this you must know that :
- Only the part before the first wildcard serves as access predicate.
- The remaining characters do not narrow the scanned index range they just discard non-matching results.
We can use LIKE operator in 4 ways in our queries:
- SEARCH-STRING%
- %SEARCH-STRING
- %SEARCH-STRING%
- SEARCH%STRING
1. SEARCH-STRING%
The SEARCH-STRING% will perform INDEX RANGE SCAN data in least possible time.
set autotrace traceonly;
select * from sac where object_type like 'TAB%';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=85 Card=3 K Bytes=543 K)
1 0 TABLE ACCESS BY INDEX ROWID EMPL101.SAC (Cost=85 Card=3 K Bytes=543 K)
2 1 INDEX RANGE SCAN EMPL101.SAC_INDX (Cost=11 Card=3 K)
Code language: SQL (Structured Query Language) (sql)
Here the optimizer knows ,where the string gets started (means it know the predicate),so It used Index Range Scan .
2. %SEARCH-STRING
When using %SEARCH-STRING it’s access the FULL table.
set autotrace traceonly;
select * from sac where object_type like '%TAB';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=398 Card=16 Bytes=2 K)
1 0 TABLE ACCESS FULL EMPL101.SAC (Cost=398 Card=16 Bytes=2 K)
Code language: SQL (Structured Query Language) (sql)
The opposite case is also possible: a LIKE expression that starts with a wildcard. Such a LIKE expression cannot serve as access predicate. The database has to scan the entire table, if the where clause does not provide another access path.
3. %SEARCH-STRING%
When using %SEARCH-STRING% it’s access the FULL table.
set autotrace traceonly;
select * from sac where object_type like '%TAB%';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=398 Card=3 K Bytes=543 K)
1 0 TABLE ACCESS FULL EMPL101.SAC (Cost=398 Card=3 K Bytes=543 K)
Code language: SQL (Structured Query Language) (sql)
Here also the optimizer doesn’t know from which letter the String get started ,so it will scan the whole table.
4. SEARCH%STRING
The SEARCH%STRING will perform INDEX RANGE SCAN and generate an initial result set, containing the values that match first string i.e. SEARCH%. Next it will scan through the values to get second string i.e. %STRING
set autotrace traceonly;
select * from sac where object_type like 'TA%BLE';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=85 Card=3 K Bytes=543 K)
1 0 TABLE ACCESS BY INDEX ROWID EMPL101.SAC (Cost=85 Card=3 K Bytes=543 K)
2 1 INDEX RANGE SCAN EMPL101.SAC_INDX (Cost=11 Card=3 K)
Code language: SQL (Structured Query Language) (sql)
Lets see the chart below for various search using wildcard character: The more selective the part before the first wildcard is, the smaller the scanned index range becomes. That, in turn, makes the index lookup more efficient.
- The first expression has two characters before the wildcard.
- They limits the scanned index range to 18 rows. Only one of them matches the entire LIKE expression—the other 17 are discarded.
- The second expression has a longer prefix, which narrows the scanned index range to two rows.
- With this expression, the database just reads one extra row that is not relevant for the result.
- The last expression does not have a filter predicate at all.
- The database just reads the entry that matches the entire LIKE expression.
Now, this is how our normal LIKE operator works, but what happen when you want to use index in 2nd and 3rd case of the example above. This is when Oracle*Text Utility comes in picture :)
Oracle*Text utility
The Oracle*Text utility (formally called Oracle ConText and Oracle Intermedia) allows us to parse through a large text column and index on the words within the column. Unlike ordinary b-tree or bitmap indexes, Oracle context, ctxcat
and ctxrule
indexes can be set not to update as content is changed. Since most standard Oracle databases will use the ctxcat index with standard relational tables, you must decide on a refresh interval. Oracle provides the SYNC operator for this. The default is SY^NC=MANUAL
and you must manually synchronize the index with CTX_DDL.SYNC_INDEX
.
SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)
Code language: SQL (Structured Query Language) (sql)
Hence, Oracle Text indexes are only useful for removing full-table scans when the tables are largely read-only and/or the end-users don’t mind not having 100% search recall:
- The target table is relatively static (e.g. nightly batch updates)
- Your end-users would not mind “missing” the latest row data
Lets take up this with an example:
SQL> CREATE TABLE sac AS SELECT * FROM all_objects;
Table created.
SQL> CREATE INDEX sac_indx ON sac(object_type);
Index created.
SQL> set autotrace trace explain
SQL> select * from sac where object_type LIKE 'TAB%';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=128
)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'SAC' (TABLE) (Cost=1 Car
d=1 Bytes=128)
2 1 INDEX (RANGE SCAN) OF 'SAC_INDX' (INDEX) (Cost=1 Card=1)
Code language: SQL (Structured Query Language) (sql)
Above example shows that using % wild card character towards end probe an Index search. But if it is used towards start, it will not be used. And sensibly so, because Oracle doesn’t know which data to search, it can start from ‘A to Z’ or ‘a to z’ or even 1 to any number. See this.
SQL> SELECT *
FROM sac
WHERE object_type LIKE '%ABLE';
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=148 Card=1004 Byte
s=128512)
1 0 TABLE ACCESS (FULL) OF 'SAC' (TABLE) (Cost=148 Card=1004 B
ytes=128512)
Now how to use the index if you are using Like operator searches. The answer is Domain Indexes.
SQL> connect / as sysdba
Connected.
SQL> grant execute on ctx_ddl to public;
Grant succeeded.
SQL> connect sac;
Connected.
SQL> begin
2 ctx_ddl.create_preference('SUBSTRING_PREF',
3 'BASIC_WORDLIST');
4 ctx_ddl.set_attribute('SUBSTRING_PREF',
5 'SUBSTRING_INDEX','TRUE');
6 end;
7
8 /
PL/SQL procedure successfully completed.
Code language: SQL (Structured Query Language) (sql)
- ctx_ddl.create_preference: Creates a preference in the Text data dictionary.
- ctx_ddl.set_attribute : Sets a preference attribute. Use this procedure after you have created a preference with CTX_DDL.CREATE_PREFERENCE.
SQL> drop index sac_indx;
Index dropped.
SQL> create index sac_indx on sac(object_type) indextype is ctxsys.context parameters ('wordlist SUBSTRING_PREF memory 50m');
Index created.
SQL> set autotrace trace exp
SQL> select * from sac where contains (OBJECT_TYPE,'%PACK%') > 0
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=8 Card=19 Bytes=17
86)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'SAC' (TABLE) (Cost=8 Car
d=19 Bytes=1786)
2 1 DOMAIN INDEX OF 'SAC_INDX' (INDEX (DOMAIN)) (Cost=4)
Code language: SQL (Structured Query Language) (sql)
In this case the index is getting used.
Index re-synchronization
Because rebuilding an Oracle Text index (context
, ctxcat
, ctxrule
) requires a full-table scan and lots of internal parsing, it is not practical to use triggers for instantaneous index updates. Updating Oracle Text indexes is easy and they can be schedules using dbms_job or the Oracle 10g dbms_scheduler utility package: Oracle text provides a CTX_DDL package with the sync_index
and optimize_index
procedures:
Code language: SQL (Structured Query Language) (sql)SQL> EXEC CTX_DDL.SYNC_INDEX('sac_indx'); SQL> EXEC CTX_DDL.OPTIMIZE_INDEX('sac_indx','FULL');
For example, if you create a nightly dbms_scheduler job to call sync_index, your index will be refreshed, but the structure will become sub-optimal over time. Oracle recommends that you periodically use the optimize_index package to periodically re-build the whole index from scratch. Index optimization can be performed in three modes (FAST, FULL or TOKEN). In sum, the Oracle Text indexes are great for removing unnecessary full-table scans from static Oracle tables and they can reduce I/O by several orders of magnitude, greatly improving overall SQL performance
Conclusion
For proximity, soundex and fuzzy searches, use domain indexes.