Prefixed index in MySQL

During your developer career you probably experienced the following MySQL error message:

SQL Error (1170): BLOB/TEXT column ‘postContent’ used in key specification without a key length

Usually, this error appears when you try to add a column to an index but the column type is BLOB/TEXT or a long VARCHAR. It can also arise if a column is already indexed and you try to change its type to TEXT/BLOB by altering the table structure.

If you would rather change the data type of the indexed column, you can read my post about How to choose right data type for primary key in MySQL database.

MySQL disallows indexing the full value of these columns because the data they contain can be huge. The index would be correspondingly large, expensive to maintain, and of little benefit. MySQL therefore requires you to specify how many leading characters, N, to index. The trick is to choose an N that is long enough to give good selectivity but short enough to save space: the prefix should make the index nearly as useful as it would be if you had indexed the whole column.
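
To make the error and its fix concrete, here is a minimal sketch. The table name posts, the index name idx_post_content and the prefix length 50 are assumptions chosen for illustration; only the column name postContent comes from the error message above.

```sql
-- Hypothetical table; only the column name postContent is taken from the error message.
CREATE TABLE posts (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    postContent TEXT
);

-- Fails with error 1170: no key length is given for the TEXT column.
ALTER TABLE posts ADD INDEX idx_post_content (postContent);

-- Works: only the first 50 characters of each value are indexed.
ALTER TABLE posts ADD INDEX idx_post_content (postContent(50));
```

The number 50 is arbitrary here. How to pick a sensible value for your own data is what the rest of this post is about.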

Before we go further, let us define an important term. Index selectivity is the ratio of the number of distinct indexed values to the total number of rows. Here is an example, starting with our test table:

id  value
1   abc
2   abd
3   acf

If we index only the first character (N=1), the index will look like this:

index  rows
a      1,2,3

Index selectivity is now easy to calculate as the ratio of the number of distinct entries in the index to the total number of rows in the test table: IndexSelectivity = 1/3 = 0.33.

Please note that index selectivity is always a value between 0 and 1. The closer it is to 1, the more selective the index.

If we increase the number of indexed characters to N=2, the index will look like this:

index  rows
ab     1,2
ac     3

Now index selectivity is 2/3 = 0.67, but the index is bigger. You get the idea? The trick is to find the smallest N that gives the same index selectivity, or nearly the same, as indexing the complete column value rather than only its first N characters.
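
The toy example can be checked directly in MySQL. This is a small sketch, assuming a throwaway table named prefix_demo (a hypothetical name) that holds the three rows from the test table above.

```sql
-- Hypothetical scratch table holding the three example rows.
CREATE TABLE prefix_demo (
    id INT NOT NULL PRIMARY KEY,
    `value` VARCHAR(10) NOT NULL
);

INSERT INTO prefix_demo (id, `value`) VALUES
    (1, 'abc'),
    (2, 'abd'),
    (3, 'acf');

-- Selectivity = distinct prefixes / total rows, for N = 1, 2 and 3.
SELECT
    COUNT(DISTINCT LEFT(`value`, 1)) / COUNT(*) AS sel_1,  -- 1 distinct prefix over 3 rows
    COUNT(DISTINCT LEFT(`value`, 2)) / COUNT(*) AS sel_2,  -- 2 distinct prefixes over 3 rows
    COUNT(DISTINCT LEFT(`value`, 3)) / COUNT(*) AS sel_3   -- 3 distinct prefixes over 3 rows
FROM prefix_demo;
```

With N=3 every row has its own prefix, so the selectivity reaches 1 and a longer prefix would not improve it any further.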

There are two ways to calculate this for your table and choose the number N. I will use a test database to demonstrate the calculations. Here is the database dump I used.

Let’s say we want to index the column last_name in the table employees, and we want to find the smallest number N that gives the best index selectivity.

First let us identify the most frequent last names:
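
The query from the original post is not preserved here. A query along the following lines would list the ten most frequent last names, assuming the employees table and last_name column from the test database. The alias cnt and the limit of 10 are arbitrary choices.

```sql
-- Count how many rows share each full last name, most frequent first.
SELECT COUNT(*) AS cnt, last_name
FROM employees
GROUP BY last_name
ORDER BY cnt DESC
LIMIT 10;
```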

In this test data, the last name Baba is the most frequent one. Next we find the most frequently occurring last_name prefixes, beginning with five-letter prefixes.
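
A sketch of such a prefix query, under the same assumptions about the employees table, could look like this. The alias pref is a hypothetical name, and the prefix length 5 is the value to change on later runs.

```sql
-- Count how many rows share each 5-character prefix of last_name.
SELECT COUNT(*) AS cnt, LEFT(last_name, 5) AS pref
FROM employees
GROUP BY pref
ORDER BY cnt DESC
LIMIT 10;
```

To try a longer prefix, rerun the query with the second argument of LEFT() changed, for example to 9 or 10.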

Every prefix occurs many more times than any full last name, which means we have to increase N until the counts are almost the same as those for the full last names.

Here are results for N=9

This means the index will be a bit slower when we search for a last name beginning with ‘Schwartzb’, but in every other case index selectivity is really good. Here are results for N=10.

These are very good results. They mean we can build the index on column last_name using only the first 10 characters. In the table definition, column last_name is defined as VARCHAR(16), so the prefix saves up to 6 bytes per entry (or more if the last name contains multi-byte UTF8 characters). This table has 1637 distinct values, and 1637 multiplied by 6 bytes is about 9.6 KB. Imagine how this number would grow if our table contained millions of rows.
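
Creating the index itself could then look like the sketch below. The index name idx_last_name is an assumption. The prefix length 10 is the value found above.

```sql
-- idx_last_name is a hypothetical index name; 10 is the prefix length chosen above.
ALTER TABLE employees ADD INDEX idx_last_name (last_name(10));

-- The Sub_part column of the output shows the indexed prefix length
-- (NULL would mean the whole column is indexed).
SHOW INDEX FROM employees;
```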

The other way is to calculate index selectivity directly from its definition: IndexSelectivity = DistinctRows/TotalRows. Here is how to calculate it:
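
The original query is not preserved. A minimal sketch of the calculation for the full column, assuming the same employees table, is shown below. The alias full_selectivity is a hypothetical name.

```sql
-- Selectivity of an index on the complete last_name column:
-- distinct values divided by total rows.
SELECT COUNT(DISTINCT last_name) / COUNT(*) AS full_selectivity
FROM employees;
```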

Let’s calculate index selectivity when the index is based on the first N characters, with N ranging from 5 to 10. We are looking for the result closest to the selectivity of the full column.
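
One way to compare several prefix lengths in a single pass is sketched below. The aliases sel_5 to sel_10 are hypothetical names, and the table and column are the ones assumed throughout this post.

```sql
-- Selectivity of a prefix index for N = 5 .. 10, computed in one scan.
SELECT
    COUNT(DISTINCT LEFT(last_name, 5))  / COUNT(*) AS sel_5,
    COUNT(DISTINCT LEFT(last_name, 6))  / COUNT(*) AS sel_6,
    COUNT(DISTINCT LEFT(last_name, 7))  / COUNT(*) AS sel_7,
    COUNT(DISTINCT LEFT(last_name, 8))  / COUNT(*) AS sel_8,
    COUNT(DISTINCT LEFT(last_name, 9))  / COUNT(*) AS sel_9,
    COUNT(DISTINCT LEFT(last_name, 10)) / COUNT(*) AS sel_10
FROM employees;
```

The smallest N whose value matches, or nearly matches, the full-column selectivity is the prefix length to use.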

From the results we can see how index selectivity changes as we increase the number of indexed characters N. For N=9 or N=10, index selectivity is the same as if we had indexed all characters of the last_name column.

The benefits of this approach are clear, and I suggest you test it on a real-world database.

I’d love to get your comments below or you can email me at code.epicenter at gmail.com.

Author: Amir Duran. Categories: Database, Programming, Tutorials. Tags: Index, MySQL, Prefixed Index.