Java and Wicket web development thoughts, tutorials, and tips

roman@coderdreams.com
Coder Dreams

Adding fuzzy search to Wicket autocomplete components

January 15, 2020 | Posted by Roman Sery | database, wicket

Note: This post builds on part 1, which you should read first; otherwise you might be lost.

Fuzzy search

We talked about how to create Wicket autocomplete dropdowns in the previous post. Now we want to improve the returned suggestions by adding fuzzy search, which means returning the expected results even when the user has made a slight spelling mistake in the search term. For example, if the user enters “Wicked” as the search term, we still want to return “Wicket” as a result if it’s in our data set.

You see this kind of behavior all the time in search engines. The approach we discuss here, however, is much less complex 🙂

Levenshtein distance

One common measure for this is the Levenshtein distance. The “distance” part of the name refers to the distance between the search term and the string you want to match against. It is defined as the minimum number of single-character insertions, deletions, or substitutions needed to turn the search term into the target string.

In our earlier example, the distance between the search string “Wicked” and the target string “Wicket” is 1, because we only need one substitution: replace “d” with “t”. A distance of 1 is a very close match, and the greater the distance, the worse the match.
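To make the definition concrete, here is a minimal Java sketch of the classic dynamic-programming way to compute the distance. It is only an illustration: the class and method names are made up for this post, and it is not the MySQL implementation used later.

```java
final class LevenshteinDistance {

    /**
     * Minimum number of single-character insertions, deletions, or
     * substitutions needed to turn source into target.
     */
    static int distance(String source, String target) {
        // d[i][j] = distance between the first i chars of source
        // and the first j chars of target
        int[][] d = new int[source.length() + 1][target.length() + 1];

        for (int i = 0; i <= source.length(); i++) {
            d[i][0] = i; // delete all i characters
        }
        for (int j = 0; j <= target.length(); j++) {
            d[0][j] = j; // insert all j characters
        }

        for (int i = 1; i <= source.length(); i++) {
            for (int j = 1; j <= target.length(); j++) {
                int cost = source.charAt(i - 1) == target.charAt(j - 1) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d[source.length()][target.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Wicked", "Wicket")); // prints 1
    }
}
```

Note that the comparison is case-sensitive as written, so “wicket” and “Wicket” are also at distance 1.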

MySQL implementation

A big thanks to Jon LaBelle for providing the implementation, which is used here with only minor changes. You can find the script for creating the functions here.

You will find 3 MySQL function definitions:

  • levenshtein_match_all – This will take in our search term, the column to search against, and the maximum distance. If the distance is above the max, the result will not be returned. It will take our search term, split it into words, and for each word, try to match it. This is the function that will actually be called by our app.
  • levenshtein_match – This will take in a single word from the above function and try to match it to each word in the target string.
  • levenshtein – This is the main function which calculates the distance between two words.
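If it helps to see how the three functions fit together, here is a rough Java sketch of the same layering. This is not a port of the SQL script. The method names are made up, and two behaviors are assumptions on my part: that “match all” means every word of the search term must be within the max distance of at least one word in the target, and that matching ignores case.

```java
import java.util.Locale;

final class FuzzyMatchSketch {

    // Plays the role of `levenshtein`: distance between two words,
    // computed with two rolling rows instead of a full table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Plays the role of `levenshtein_match`: does a single search word
    // come close enough to any word of the target string?
    static boolean matchWord(String word, String target, int maxDistance) {
        for (String candidate : target.trim().split("\\s+")) {
            if (levenshtein(word, candidate) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    // Plays the role of `levenshtein_match_all`: split the search term
    // into words and require each one to match (ASSUMPTION, see above).
    static boolean matchAll(String searchTerm, String target, int maxDistance) {
        String normalizedTarget = target.toLowerCase(Locale.ROOT);
        for (String word : searchTerm.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!matchWord(word, normalizedTarget, maxDistance)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, `matchAll("Wicked", "Apache Wicket", 1)` is true, while the same call with a max distance of 0 is false.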

Changes to autocomplete component

The main change to our SearchService is how it retrieves the suggestions. We first search using our normal string matching approach. If there are no results, then we try the fuzzy approach. We do this because fuzzy search can be much slower than a regular LIKE string search.

We also add a max distance parameter to our AutocompleteFilters class. This way we can adjust how strict/loose the fuzzy matching will be.
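The “exact first, fuzzy only as a fallback” flow can be sketched in a few lines of Java. This is not the actual SearchService code: the class, the method, and the two search strategies passed in as functions are hypothetical stand-ins for the real LIKE query and the real levenshtein_match_all query.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

final class FallbackSearchSketch {

    /**
     * Runs the cheap exact search first and only falls back to the
     * slower fuzzy search when the exact search returns nothing.
     *
     * @param exactSearch hypothetical stand-in for the LIKE-based query
     * @param fuzzySearch hypothetical stand-in for the fuzzy query,
     *                    taking the term and the max distance
     */
    static List<String> suggest(String term,
                                int maxDistance,
                                Function<String, List<String>> exactSearch,
                                BiFunction<String, Integer, List<String>> fuzzySearch) {
        List<String> exactResults = exactSearch.apply(term);
        if (!exactResults.isEmpty()) {
            return exactResults; // fuzzy search is never executed
        }
        return fuzzySearch.apply(term, maxDistance);
    }
}
```

A smaller max distance makes the fallback stricter, and a larger one lets more distant (and likely less relevant) suggestions through.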

One thing to note in relation to Hibernate: In order to use our MySQL functions from HQL, we need to define a custom dialect. In the class CustomMysqlDialect, we register a Hibernate function which maps to the MySQL function.

Performance limitations

Now for the bad news: the approach presented here is fairly slow and is best suited to relatively small data sets. The good news is there are ways to improve it:

  • Jozef Jarosciak has written about using the SOUNDEX function. The idea is to first narrow down the potential matches with the fast SOUNDEX function, and then apply Levenshtein distance only to those candidates.
  • A few people have talked about using native compiled user-defined functions written in C++, which perform much faster than SQL functions.
    • https://samjlevy.com/mysql-levenshtein/
    • https://github.com/rljacobson/Levenshtein
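To give a feel for the SOUNDEX idea, here is a rough Java sketch of the prefilter step. The `soundex` method below is a simplified version of the classic four-character American Soundex code, written for illustration only. It is an assumption that this is close enough to show the idea, and it will not always produce the same codes as MySQL's SOUNDEX function. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SoundexPrefilterSketch {

    // Soundex digit for each letter A..Z ('0' means the letter is not coded).
    private static final String CODES = "01230120022455012623010202";

    /** Simplified classic Soundex: first letter plus up to three digits. */
    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // keep the first letter as-is
                lastCode = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate equal codes
            }
            if (code != '0' && code != lastCode) {
                out.append(code);
            }
            lastCode = code; // vowels reset the "previous code"
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    /** Keeps only the candidates that sound like the search term. */
    static List<String> prefilter(String term, List<String> candidates) {
        String termCode = soundex(term);
        List<String> survivors = new ArrayList<>();
        for (String candidate : candidates) {
            if (soundex(candidate).equals(termCode)) {
                survivors.add(candidate);
            }
        }
        return survivors;
    }
}
```

The more expensive Levenshtein check would then run only on the survivors. One trade-off to keep in mind: because the code keeps the first letter, a typo in the first character will cause the prefilter to drop a candidate that Levenshtein alone would have matched.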

Wicket test page

As always, you can find a full working app to play around with on GitHub. Enjoy!

About Roman Sery

I've been a software developer for over 10 years and still loving Java!
