Database-sourced websites provide powerful facilities
for searching according to complex criteria. However, generalised keyword-style
searches – Google-style – are not straightforward to implement.
SQL queries which are used to interrogate a database operate in quite
a different way from website search engines.
These notes set out a method for implementing such general keyword-search
facilities in database-sourced websites.
(Note: to make sense of the following, you will probably
need reasonable familiarity with SQL queries and regular expressions).
The problem
- SQL queries facilitate general substring searches, but not ‘whole
word’ searches
- Multiple-word searches across different fields are difficult
‘Whole word’ searching
SQL select statements allow substring searches along the lines
Select * From tbl Where fld
Like '*salt*'.
However, if users were looking for references to salt and this returned
results referring to, say, psalter, users might be a little surprised,
not to say confused.
To find complete words only, the query term could be surrounded by spaces,
as in
Select * From tbl Where fld
Like '* salt *'.
However, it wouldn’ t find ‘salt’ at the end of a sentence, where it is
followed by a full stop rather than a space - which would not be acceptable.
The same applies to all punctuation. Similarly, if the search term appears
at the beginning or at the end of the field, it will not be matched.
Multiple-word searches
While theorists agonise about boolean ‘and/or’ searches, Google has clearly
established that for straightforward searching, the most useful search
results are those including all search terms.
Within a single database field, this can be achieved by
Select * From tbl Where fld
Like '*salt*' And fld Like '*pepper*'
The trouble comes in covering more than one database field; if ‘salt’
is in one field, and ‘pepper’ in a different field, it would take an impractically
complex SQL query to find the correct results for an arbitrary number
of search terms across an arbitrary number of fields.
Whole-word searching
If punctuation undermines the strategy of surrounding search terms with
spaces, the obvious solution is to convert all punctuation to spaces,
after which surrounding search terms with spaces will work.
The trouble is that the matching against search terms is done within
the SQL query, inside the database. This limits one’s options, as neither
client-side nor server-side scripts can help; JavaScript & VBScript
are out of the picture. Standard SQL offers only the simplest wild-card
matching, and provides no regular-expression matching (otherwise full-words
could be matched by the simple regular expression "([^a-zA-Z]|^)salt([^a-zA-Z]|$)").
(Some databases, such as Oracle and DB2, do have proprietary extensions
for regular expression handling).
Search field
The solution is to have a separate database field which can hold a reformatted
version of the field to be searched, with all punctuation replaced by
spaces. I would sooner have avoided messing with the database, and have
implemented search facilities purely in the website, but I couldn't find
a way. Like this, it’s not enormously elegant, but it works.
Populating this ‘search’ field is done by an update query which replaces
all punctuation by spaces. This is a task beyond SQL queries: a quick
dip into some programming is required.
The query which populates the search field will be along the lines of
Update tbl Set srchfld = "
" & stripPunctuation(fld) & " "
This stripPunctuation function can be implemented in VisualBasic (I was
doing this in Access) as follows:
Public Function stripPunctuation(str)
str = Replace(str, ".", " ")
str = Replace(str, ",", " ")
str = Replace(str, ":", " ")
str = Replace(str, ";", " ")
str = Replace(str, "-", " ") ' hyphen
str = Replace(str, "–", " ") ' n-dash
str = Replace(str, "—", " ") ' m-dash
str = Replace(str, "?", " ")
str = Replace(str, "…", " ")
str = Replace(str, "(", " ")
str = Replace(str, ")", " ")
str = Replace(str, "[", " ")
str = Replace(str, "]", " ")
str = Replace(str, " '", " ") ' strip quotes around words but leave
str = Replace(str, "' ", " ") ' internal quotes eg O'Patrick
str = Replace(str, " ", " ") ' two spaces down to one
stripPunctuation = str
End Function
Or, perhaps more elegantly and flexibly, by implementing
a version of the Cold Fusion ReplaceList function as follows:
Public Function stripPunctuation(str)
str = ReplaceList(str, ".,:,;,-,–,—,?,_,(,),[,], ',' , ", _
" , , , , , , , , , , , , , , ")
str = Replace(str, ",", " ") ' do this separately as
stripPunctuation = str ' comma is list separator
End Function
Public Function replaceList(str As String,
list1 As String,
list2 As String) As String
' within 'str', replace all terms in comma-separated
' list 'list1' with matching term in 'list2'
Dim l1() As String, l2() As String, i As Integer
l1 = Split(list1, ",")
l2 = Split(list2, ",")
For i = 0 To UBound(l1) - 1
str = Replace(str, l1(i), l2(i))
Next i
replaceList = str
End Function
To take an example, if we had a database field containing
the text
‘Form_factor: small, power requirement (at
low consumption): 200w’
the stripPunctuation function would replace it with
‘Form factor small power requirement at low
consumption 200w’.
This would then enable the space-surrounded search term to operate correctly,
where it couldn’t against the original text.
This function could be run as often as required, depending on how often
data changes, using the update query above.
The only remaining problem is to implement the searching within this
transformed search field. Remember, the idea is that users can enter an
arbitrary number of search terms, and have results returned where all
search terms are matched.
Criteria builder
For converting a list of search terms into a SQL query, I used some of
the power of JavaScript to build the SQL search criteria I want. I’ve
switched to JavaScript for this because this is done on the web server,
so I can opt for JavaScript’s more powerful regular expression handling.
You may have to think about this. What we need to do is to transform
keywords such as ‘salt pepper’, or perhaps ‘salt, pepper’ into SQL query
criteria such as ‘Like '* salt *' And like '* pepper
*'’. JavaScript regular expressions can do this very cleverly,
though in the code here I replace the Windows ‘*’ wildcard with the SQL
standard ‘%’ wildcard (a more sophisticated version could consider terms
prefixed by ‘-’ as exclusion from the search, and terms in quotes as a
single phrase).
function buildAndConditions(terms, field)
{
// globally replace white-space and comma with 'And' term
return(field +
" Like '% " +
terms.replace(/[\s,]/g, " %' And " + field + " Like '% ") +
" %'");
}
This will replace any whitespace between individual search
terms with the text ‘ %' And fld Like '% ’.
If a user entered a single search term ‘salt’, the replace method
would do nothing to the search term, and the resulting criteria would
be ‘fld Like '% salt %'’. If the user entered
a pair of search terms, like ‘salt pepper’ (or ‘salt, pepper’), the replace
method would replace the whitespace with ‘ %' And fld
Like '% ’, resulting in criteria of ‘fld Like
'% salt %' And fld Like '% pepper
%'’ (where the underlined section was substituted by the replace
method). With more search terms, this creates appropriate ‘And’ criteria
between them all.
Using this JavaScript function, within my template web page I can build
a SQL query
strSQL = "SELECT ID, fld FROM tbl WHERE " & _
buildAndConditions(Request("srchTerm")),"srchfld")
– in this case, using the ASP Request function to
obtain the search terms from an HTML form or a URL parameter. The JavaScript
has to be run server-side, of course (runat="server"
in IIS).
If this query was invoked with a search term of ‘salt, pepper’, it would
generate the SQL query
SELECT ID, fld
FROM tbl
WHERE srchfld Like '* salt *'
AND srchfld Like '* pepper *'
It may have been a hard journey (at least it was for me),
but this approach gives us Google-style whole-word searching on multiple
search terms, within database-sourced websites.
Multiple search terms across multiple fields
Having incorporated a ‘search’ field into the database, it is now a simple
step to enable searching across multiple fields. All that is required
is to combine all fields which need to be searched across into a single
search field.
To do this, the update query which populates the search field becomes
Update tbl Set srchfld = "
" & stripPunctuation(fld1) & " " & stripPunctuation(fld2)
& " "
Further refinements - accented characters
For this project, I also had to enable searches for accented characters.
Many databases will provide case-insensitive searches, but most will not
match accented against un-accented characters. So for a user to search
for detente or détente, they would have to know how the
word had originally been entered into the database.
The ReplaceList function used above can also be used to convert
accented characters to their un-accented equivalents. In this case, it
has to be used to convert accented characters to their un-accented equivalents
both in the database ‘search’ field, and in the web page construcing the
search query. In this way, it doesn’t matter whether the database contains
accented characters or not, or whether the user’s search terms include
accented caracters or not – all the matching is done against un-accented
equivalents.
|