The introduction of the JSON data type in MySQL 5.7, and the subsequent improvements made in 8.0, have opened up a lot of new possibilities in how to store data in your web apps. The traditional approach of database normalization is widely taught in school, but is not always the best approach.
Let’s start by looking at a common business object in most web apps, the User profile. We will see that using a hybrid approach, traditional single-data columns with an additional JSON column, has many advantages.
A simple User profile
Suppose we want to store the following data in our user profiles:
private int id;
private String email;
private String displayName;
private Double height;
private Double weight;
private String nickname;
private String officeFax;
private String officePhone;
private String officePhoneExt;
private String mobilePhone;
private String altEmail;
private List<MailingAddress> addresses;
private List<String> favoriteMovies;
private List<String> favoriteQuotes;
Traditional approach
Using a traditional approach to store this in a relational database, you might create a User table with separate columns for each of the fields. You would then create 3 one-to-many tables to store the user’s addresses, favorite movies, and favorite quotes.
Hybrid-JSON approach
Using the hybrid approach, you will separate out all the “miscellaneous” data into a details object which might look something like this:
public class UserDetails {
private Double height;
private Double weight;
private String nickname;
private String officeFax;
private String officePhone;
private String officePhoneExt;
private String mobilePhone;
private String altEmail;
private List<MailingAddress> addresses;
private List<String> favoriteMovies;
private List<String> favoriteQuotes;
}
Now you only need a single table to store your data and it will be much simplified. It will consist of the core/critical data in traditional columns, and a JSON column to store all of the ‘details’:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(45) DEFAULT NULL,
`display_name` varchar(45) DEFAULT NULL,
`user_details` json DEFAULT NULL,
PRIMARY KEY (`id`)
);
Advantages of the Hybrid approach
- Ease of change – Over the lifetime of your application, things will change frequently. It will be much easier to add fields and collections into UserDetails than it would be to keep updating your database. The ‘user’ database table will rarely change, while the UserDetails might change a lot!
- Performance – To retrieve a user and all of their details requires a single query to the database. With the traditional approach, you would have to query the user table in addition to all the child tables.
- Using an ORM, such as Hibernate, you might define those child tables as collection mappings and be able to retrieve a User entity using a single operation. However, behind the scenes, that will still result in multiple queries.
- You might be thinking that this isn’t a big deal because those queries will hit the cache and not the DB most of the time. This might be true, however most real-world apps use a distributed network cache, which means a “cache-hit” still results in a network-roundtrip and is quite slow.
Indexing
You might rightly be concerned about indexing. Let’s say you want to create an index on the User’s weight for some reports, or to be able to quickly search for users within a weight range. Not to worry, MySQL has a solution for that called virtual columns. Let’s see how easy it is to create an index on the weight:
ALTER TABLE user ADD COLUMN weight DECIMAL GENERATED ALWAYS AS (user_details->>'$.weight');
ALTER TABLE user ADD INDEX weightIX (weight ASC);
The virtual weight column uses MySQL’s JSON path syntax to extract the weight field from UserDetails and store it in a separate column. Our application does not need to worry about managing or updating this new column, MySQL handles that automatically! We than create an index on the virtual column.
Bonus
Because the UserDetails class is essentially a Java bean and is converted to JSON when stored, we can add some very useful methods. For example we can add a method that gets the user’s primary address:
public MailingAddress getPrimaryAddress() {
return getAddresses().stream().filter(MailingAddress::isPrimary).findFirst().orElse(null);
}
When we store a User with multiple addresses into our database, the result looks something like this:
{
"addresses" : [
{
"address1" : "123 test",
"city" : "NY",
"countryCode" : "US",
"id" : 8517527643100,
"primary" : false
},
{
"address1" : "444 primary st",
"city" : "NY",
"countryCode" : "US",
"id" : 8517527664499,
"primary" : true
}
],
"favoriteMovies" : [
"Terminator",
"Star Trek"
],
"favoriteQuotes" : [ ],
"height" : 95,
"nickname" : "micky",
"primaryAddress" : {
"address1" : "444 primary st",
"city" : "NY",
"countryCode" : "US",
"id" : 8517527664499,
"primary" : true
},
"weight" : 220
}
We can see that MySQL took care of populating primaryAddress for us, and it can now be used for querying and indexing!
Conclusion
I hope this at least makes you think more about your database design instead of jumping straight into normalization.
In part 2 we will show the Hibernate/Jackson-JSON code that accomplishes all of this in more detail. But if you can’t wait, as always, you can find the full source code on GitHub.