How to choose right data type for a primary key in MySQL database
In this post I will write about primary keys in MySQL database. Developers are almost every day discussing about this topic, "fighting" about right data type to represent primary keys in MySQL database. During my career I experienced a lot of different approaches but I dig into this topic a bit more to see what is actually the best approach and what you should implement in your next project.
It is very important to choose right column for your row identifiers (primary keys), because this column will be used in comparisons and in your SQL joins. Data type used for primary key in one table will be most likely used in other table as foreign key. It is now clear that if you chose wrong data type for your primary key in identifier table, the same type will be used in your referenced tables, which can decrease performance of your database. When you define your primary key column in one table then make sure you use the same type in referenced tables. If you don't do this, then you will get additional time for data conversion. For example, there is Table A with primary key product_id as INT. Table A is referenced to Table B which defines foreign key as STRING. When you join tables A and B, some time T will be lost for converting STRING to INT. Defining different types for primary key and foreign keys is causing implicit conversions which may produce some bugs, so try to avoid this approach.
When you are defining your primary keys, then try to think about the number of rows the table will hold. If you have table containing all countries, or table containing all internet providers in your country, than the primary key doesn't need to support million of rows. Always choose the smallest size that can hold your required range of values.
Be careful when you are writing your SQL queries, so that your indexed column is not used in expression! If you have defined primary key on your product_id column, and if you use your product_id in expression, index will not be used. Here is an example:
select * from products where (product_id+1) = 12;//Index is not used
select * from products where product_id = 12;//Index is used
select * from product where substring(product_name,0,5) = "Apple"//Index is not used
Here are some interesting facts for some data types usually used as primary keys:
Integers are the best choice for primary keys, because they are fast. On this data type you can define AUTO_INCREMENT, and computations are also performing fast.
String types (VARCHAR; CHAR)
I will devote a bit more text explaining why is not good to use string types as primary keys. String types are usually requiring more space and are much slower then integer types. It is much easier to compare two numbers, then two strings and this comparison of strings may introduce additional processing time.
If you define your primary key as VARCHAR then for every non UTF-8 character is represented by one byte, meaning that for string "abcdefghi" 9 bytes will be used. The same amount of bytes will be used in index table, and this requires much more space than for integer.
Defining string types as primary keys especially is not good in MyISAM engine where you can experience bad performance.
You should be also very careful when using random strings for your primary keys. Random strings are produced by hashing functions like SHA1 or MD5. Why? Distribution of strings produced by hashing functions is random in a large space which decreases efficiency of your index, which can slow down INSERT and SELECT statements.
Here is one example of two similar strings "aaa" and "aab" and their SHA1 values "7e240de74fb1ed08fa08d38063f6a6a91462a815", "40b904fd8852297daeaeb426b1bca46fd2454aa3" respectively. Although strings "aaa" and "aab" are very similar, their SHA1 results are not and this can have a big impact on INSERT statements, because new row has to go into random location in index. This can cause page splits and random disk access which can slow overall performance.
In SELECT queries, adjacent values "aaa" and "aab" are on totally different locations in index and locality of the index is broken which is affecting performance.
If you have a column containing a large strings, and you want to define a index on it, then read my post about Prefixed index in MySQL database.
Hope my post helped you, and you have learned something new. You can always follow me on twitter @DurainusA to get my new posts and tweets.
I’d love to get your comments below or you can email me at code.epicenter at gmail.com.http://code-epicenter.com/how-to-choose-right-data-type-for-a-primary-key-in-mysql-database/How to choose right data type for a primary key in MySQL databasehttp://code-epicenter.com/wp-content/uploads/2015/11/mysql.jpghttp://code-epicenter.com/wp-content/uploads/2015/11/mysql-150x150.jpgDatabaseProgrammingTutorialsdata types,Index,MySQL