Improved search
13 years ago
I'd like to request a "better" search engine.
The current one seems to do only a full-text search and display the results ordered by their ranking, i.e. something like this in MySQL:
SELECT * FROM titles WHERE MATCH (title) AGAINST ('dragon')
A test against my local cache confirms this.
That's a start, but it lacks some finishing touches.
What I'd like is:
- a grouping of titles by manga. Italic text only indicates that a title is an alias, but not of what
- results sorted by how closely they match the original search
- optionally with a (user-definable) minimum "match" value, e.g. 70%
- optionally with a small thumbnail on the side. I (and many others) am more inclined to recognise things by sight than by name (e.g. I remember album covers, not so much the title or artist).
How I would do it (or rather, how I am currently doing it in my custom MU search):
SELECT id,
       (CASE WHEN MATCH(title) AGAINST('dragon') THEN 1
             WHEN title LIKE '%dragon%' THEN 0.5
             ELSE 0 END) AS relevance,
       title
FROM titles
WHERE title LIKE '%dragon%'
HAVING relevance > 0
ORDER BY relevance DESC
Add one CASE block per keyword as necessary (optionally limiting the maximum number of keywords for better performance), e.g.:
SELECT id,
       ((CASE WHEN MATCH(title) AGAINST('dragon') THEN 1
              WHEN title LIKE '%dragon%' THEN 0.5
              ELSE 0 END)
      + (CASE WHEN MATCH(title) AGAINST('eye') THEN 1
              WHEN title LIKE '%eye%' THEN 0.5
              ELSE 0 END)) AS relevance,
       title
FROM titles
WHERE title LIKE '%dragon%' AND title LIKE '%eye%'
HAVING relevance > 0
ORDER BY relevance DESC
This gives me results in an understandable format: the ID, the title, and the relevance (the higher the value, the better the match). I do both a MATCH and a LIKE because MATCH only returns whole-word matches, while LIKE also returns partial ones.
Since many titles contain the word "dragon", many titles will have a relevance of 1 (not useful for sorting).
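For an arbitrary number of keywords, the per-keyword CASE blocks above could be generated rather than written by hand. Below is a minimal sketch in Python (not the PHP my search actually uses). The names build_relevance_query, escape_like and max_keywords are made up for illustration, and the %s placeholders assume a MySQL driver that uses the DB-API "format" parameter style:

```python
def escape_like(term):
    """Escape LIKE wildcards so user input is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_relevance_query(keywords, max_keywords=5):
    """Return (sql, params) scoring each title against every keyword.

    Hypothetical helper: one CASE block per keyword, capped at max_keywords.
    """
    terms = [k.strip() for k in keywords if k.strip()][:max_keywords]
    if not terms:
        raise ValueError("at least one non-empty keyword is required")

    # 1 for a full-text hit, 0.5 for a substring-only hit, 0 otherwise.
    score = (
        "(CASE WHEN MATCH(title) AGAINST(%s) THEN 1 "
        "WHEN title LIKE %s THEN 0.5 ELSE 0 END)"
    )
    select_parts, where_parts = [], []
    select_params, where_params = [], []
    for term in terms:
        pattern = "%" + escape_like(term) + "%"
        select_parts.append(score)
        select_params += [term, pattern]
        where_parts.append("title LIKE %s")
        where_params.append(pattern)

    sql = (
        "SELECT id, (" + " + ".join(select_parts) + ") AS relevance, title "
        "FROM titles WHERE " + " AND ".join(where_parts) + " "
        "HAVING relevance > 0 ORDER BY relevance DESC"
    )
    # Parameters are listed in the order their placeholders appear in the SQL.
    return sql, select_params + where_params


sql, params = build_relevance_query(["dragon", "eye"])
print(sql)
print(params)
```

Passing the keywords as parameters instead of pasting them into the string also keeps quotes in a search term from breaking the query.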
I then loop over all found titles and calculate what percentage of each title the search terms cover (e.g. for the search "dragon", the title "dragon" scores 100% and "dragon eye" scores 67%). I've chosen to ignore all spaces, dashes, and punctuation marks for this (don't forget to convert accented characters to their base equivalents).
Example (note: I work with utf8_general_ci collated fields):
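// $str: manga title from DB
// $with: array of search keywords
// Returns how much of the title the keywords cover, as a percentage (0-100).
function getMatchPercentage($str, $with)
{
    global $normalizeChars; // accent => ASCII map for strtr(), defined elsewhere

    $str = strtr($str, $normalizeChars); // convert accents to ascii
    if ( strpos($str, "(") !== false ) $str = substr($str, 0, strpos($str, "(")); // remove suffixes such as (Novel)
    // clean the title of all extraneous characters and convert to lower-case
    $strip = array(" "," ","\t","~","-",":",",","\"","&","?","!");
    $str = str_replace($strip, "", mb_convert_case($str, MB_CASE_LOWER, "utf-8"));
    $chars = mb_strlen($str, "utf-8"); // no. of real characters (kanji/multibyte support)
    if ( $chars == 0 ) return 0; // nothing left to match against; avoids division by zero
    $percentile = 100 / $chars;
    $match = 0;
    foreach ( $with as $term )
    {
        // clean the term the same way as the title, so the comparison is case-insensitive
        $term = strtr($term, $normalizeChars);
        $term = stripslashes($term);
        $term = str_replace($strip, "", mb_convert_case($term, MB_CASE_LOWER, "utf-8"));
        if ( $term === "" ) continue;
        // calculate difference between title with term and without it
        $tmp = str_replace($term, "", $str);
        $match += ( $chars - mb_strlen($tmp, "utf-8") );
    }
    return min(100, round($match * $percentile)); // round value between 0 and 100 (overlapping terms could otherwise exceed 100)
}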
Now all that remains is to order the results by match percentage and group them by ID.
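The ordering and grouping step could look roughly like this. Again this is a Python sketch rather than my actual PHP, with made-up names (normalize, match_percentage, rank_results, min_match) and made-up sample rows. It simplifies two things: it drops every character that is not a letter or digit instead of using a fixed list, and it removes the keywords from the title one after another so the result cannot exceed 100:

```python
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lower-case, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())


def match_percentage(title, keywords):
    """Percentage of the cleaned title that is covered by the keywords."""
    cleaned = normalize(title.split("(", 1)[0])  # drop suffixes such as (Novel)
    if not cleaned:
        return 0
    remaining = cleaned
    for keyword in keywords:
        term = normalize(keyword)
        if term:
            remaining = remaining.replace(term, "")
    return round(100 * (len(cleaned) - len(remaining)) / len(cleaned))


def rank_results(rows, keywords, min_match=0):
    """Group (manga_id, title) rows by manga, best match first.

    Titles scoring below min_match are dropped. Each result is
    (manga_id, best_percentage, [titles ordered by percentage]).
    """
    groups = defaultdict(list)
    for manga_id, title in rows:
        pct = match_percentage(title, keywords)
        if pct >= min_match:
            groups[manga_id].append((pct, title))

    ranked = []
    for manga_id, titles in groups.items():
        titles.sort(key=lambda t: (-t[0], t[1]))
        ranked.append((manga_id, titles[0][0], [t for _, t in titles]))
    ranked.sort(key=lambda g: (-g[1], g[0]))
    return ranked


# Hypothetical rows as they might come back from the relevance query.
rows = [(1, "Dragon Eye"), (2, "Dragon"), (1, "Dragon's Eye"), (3, "Sky Dragon Saga")]
for manga_id, best, titles in rank_results(rows, ["dragon"], min_match=50):
    print(manga_id, best, titles)
```

One caveat of this shortcut: stripping combining marks after NFKD also removes the voicing marks from kana, so a real implementation would probably want an explicit replacement table like $normalizeChars instead.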
Examples:
Custom MangaUpdates Search (ugly style, prototype version), which also displays all titles associated with each returned manga
or
MCD Search (only searches through what's available on MCD)
The SQL has worked without any noticeable hitches and has performed well so far (I haven't stress-tested it, but it should hold up).
I hope I've inspired someone with access to the MangaUpdates source to make a few "small" modifications to improve (in my eyes) the search engine.
I'm willing to help out if necessary.
Until then, everyone is free to use my custom search (unless I receive a take-down request from the MU staff). It will improve with each search made (I don't go deeper than what's on the first search page; aliases are only added when they appear there).