Improved search
Joentjuh (Member, 293 posts)
Post #553662, 5:33 pm, May 30 2012

I'd like to request a "better" search engine.

The current one seems to do only a full-text search and display the results ordered by their ranking, e.g. via MySQL:
SELECT * FROM titles WHERE MATCH (title) AGAINST ('dragon')
A test against a local cache confirms this.

That's a start, but it lacks some finishing touches.

What I'd like is:
- titles grouped by manga. Italic text only indicates that a title is an alias, not which manga it is an alias of
- results sorted by how closely they match the original search
- optionally, a (user-definable) minimum "match" value, e.g. 70%
- optionally, a small thumbnail on the side: I (and many others) recognise things more readily by sight than by name (e.g. I remember album covers, not so much the title or artist).

How I would do it (or rather, how I'm currently doing it in my custom MU search):

SELECT id,
  (CASE WHEN MATCH(title) AGAINST('dragon') THEN 1 WHEN title LIKE '%dragon%' THEN 0.5 ELSE 0 END) AS relevance,
  title
FROM titles
WHERE title LIKE '%dragon%'
HAVING relevance > 0
ORDER BY relevance DESC

For more keywords, add one CASE term and one LIKE condition per keyword (optionally cap the number of keywords for better performance), e.g.:
SELECT id,
  (CASE WHEN MATCH(title) AGAINST('dragon') THEN 1 WHEN title LIKE '%dragon%' THEN 0.5 ELSE 0 END)
  + (CASE WHEN MATCH(title) AGAINST('eye') THEN 1 WHEN title LIKE '%eye%' THEN 0.5 ELSE 0 END) AS relevance,
  title
FROM titles
WHERE title LIKE '%dragon%' AND title LIKE '%eye%'
HAVING relevance > 0
ORDER BY relevance DESC
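Writing that query by hand gets tedious as keywords are added. Below is a minimal PHP sketch that generates it for any number of keywords and caps the keyword count. The function name `buildRelevanceQuery`, the default cap of 5 and the use of `?` placeholders (e.g. for PDO prepared statements) are assumptions for illustration, not MangaUpdates code. Passing the keywords as parameters instead of pasting them into the SQL string also avoids SQL injection.

```php
<?php
// Hypothetical helper: builds the multi-keyword relevance query with
// ? placeholders and returns [sql, params] for a prepared statement.
function buildRelevanceQuery(array $keywords, int $maxKeywords = 5): array
{
    // Drop empty keywords and cap the count to keep the query cheap.
    $keywords = array_slice(array_values(array_filter($keywords, 'strlen')), 0, $maxKeywords);
    if ($keywords === []) {
        throw new InvalidArgumentException('At least one keyword is required');
    }

    $scores = [];
    $filters = [];
    $scoreParams = [];
    $filterParams = [];
    foreach ($keywords as $kw) {
        // Escape LIKE wildcards so a literal '%' or '_' in a keyword matches itself.
        $like = '%' . addcslashes($kw, '%_\\') . '%';
        $scores[] = '(CASE WHEN MATCH(title) AGAINST(?) THEN 1 WHEN title LIKE ? THEN 0.5 ELSE 0 END)';
        $scoreParams[] = $kw;
        $scoreParams[] = $like;
        $filters[] = 'title LIKE ?';
        $filterParams[] = $like;
    }

    $sql = 'SELECT id, ' . implode(' + ', $scores) . ' AS relevance, title'
         . ' FROM titles WHERE ' . implode(' AND ', $filters)
         . ' HAVING relevance > 0 ORDER BY relevance DESC';
    // The SELECT placeholders come before the WHERE placeholders in the SQL,
    // so the score parameters must come first as well.
    return [$sql, array_merge($scoreParams, $filterParams)];
}
```

Usage, assuming an existing PDO connection `$pdo`: `[$sql, $params] = buildRelevanceQuery(['dragon', 'eye']);`, then `$stmt = $pdo->prepare($sql);` and `$stmt->execute($params);`.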

This gives me results in an understandable format: the ID, the title, and the relevance (the higher the value, the better the match). I use both MATCH and LIKE: MATCH only finds whole-word matches, while LIKE also finds partial ones.
Since many titles contain the word "dragon", many of them get a relevance of 1, which isn't useful for sorting.
I then loop over all found titles and calculate a match percentage: the share of the title's characters that is covered by the search keywords. For a search on "dragon", the title "Dragon" scores 100% and "Dragon Eye" 67% (6 of its 9 characters). For this I ignore all spaces, dashes and punctuation marks (and don't forget to convert accented characters to their base equivalents).
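The cleanup step can be sketched as a small PHP helper. The accent map below is a hypothetical, heavily abbreviated stand-in for the `$normalizeChars` array used by `getMatchPercentage()` in the example that follows (the post doesn't show the real one), and `normalizeTitle` is an illustrative name. Instead of listing punctuation characters one by one, this sketch swaps in Unicode character classes, so it strips somewhat more than the original list does.

```php
<?php
// Hypothetical, abbreviated accent map; a real one needs many more entries.
$normalizeChars = [
    'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
    'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
    'í' => 'i', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
    'ú' => 'u', 'ü' => 'u', 'ç' => 'c', 'ñ' => 'n',
];

// Lower-case a title, map accented characters to ASCII, then drop
// whitespace, separators, punctuation and symbols.
function normalizeTitle(string $s, array $map): string
{
    // Lower-case first so the map only needs lower-case entries.
    $s = mb_strtolower($s, 'UTF-8');
    $s = strtr($s, $map);
    // \s and \p{Z}: whitespace incl. non-breaking space; \p{P}: punctuation
    // such as - : , " & ? !; \p{S}: symbols such as ~
    return preg_replace('/[\s\p{Z}\p{P}\p{S}]+/u', '', $s);
}
```

With this, "Dragon Éye!" becomes "dragoneye" (9 characters), of which "dragon" covers 6, giving the 67% mentioned above.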

Example (note: I work with utf8_general_ci-collated fields):
// $str: manga title from DB
// $with: array of search keywords
// $normalizeChars: global strtr() map of accented characters to ASCII, defined elsewhere
function getMatchPercentage($str, $with)
{
    global $normalizeChars;
    $str = strtr($str, $normalizeChars); // convert accents to ASCII
    if ( strpos($str, "(") !== false ) $str = substr($str, 0, strpos($str, "(")); // remove suffixes such as "(Novel)"
    // lower-case the title and strip all extraneous characters ("\xC2\xA0" is a UTF-8 non-breaking space)
    $str = str_replace(array(" ", "\xC2\xA0", "\t", "~", "-", ":", ",", "\"", "&", "?", "!"), "", mb_convert_case($str, MB_CASE_LOWER, "UTF-8"));
    $chars = mb_strlen($str, "UTF-8"); // no. of real characters (kanji/multibyte support)
    if ( $chars === 0 ) return 0; // nothing left to match against; avoids division by zero
    $rest = $str;
    foreach ( $with as $term )
    {
        // normalize the term the same way as the title, then remove it from what's left
        $term = stripslashes(strtr($term, $normalizeChars));
        $term = mb_convert_case($term, MB_CASE_LOWER, "UTF-8");
        if ( $term !== "" ) $rest = str_replace($term, "", $rest);
    }
    // share of the title's characters covered by the terms, as a round value between 0 and 100
    return (int) round(($chars - mb_strlen($rest, "UTF-8")) * 100 / $chars);
}

Now all that's left is to order the results by match percentage and group them by ID.
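That last step might look like the PHP 7.4+ sketch below. It assumes each row from the query has already been given a hypothetical `match` key (for example by calling `getMatchPercentage()` on its title); `groupAndSortResults` is an illustrative name, and `$minMatch` stands in for the optional user-definable minimum from the wish list.

```php
<?php
// Hypothetical post-processing step. $rows: one row per title/alias with
// 'id', 'title' and a precomputed 'match' percentage.
// Groups aliases under their manga ID, keeps each manga's best score,
// drops manga below $minMatch and sorts the best matches first.
function groupAndSortResults(array $rows, int $minMatch = 0): array
{
    $groups = [];
    foreach ($rows as $row) {
        $id = $row['id'];
        if (!isset($groups[$id])) {
            $groups[$id] = ['id' => $id, 'match' => $row['match'], 'titles' => []];
        }
        $groups[$id]['titles'][] = $row['title'];
        $groups[$id]['match'] = max($groups[$id]['match'], $row['match']);
    }
    // Apply the optional minimum match value.
    $groups = array_filter($groups, fn(array $g): bool => $g['match'] >= $minMatch);
    // Highest match first; ties are ordered by ascending ID.
    usort($groups, function (array $a, array $b): int {
        return [$b['match'], $a['id']] <=> [$a['match'], $b['id']];
    });
    return $groups;
}
```

Keeping the best-scoring alias per manga means a series is ranked by whichever of its titles matches the search most closely, while all its titles are still listed together.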

Examples:
- Custom MangaUpdates Search (ugly style, prototype version), which also displays all associated titles for each returned manga
- MCD Search (only searches what's available on MCD)

* The SQL has worked without any noticeable hitches and has performed well so far (I haven't stress-tested it, but it should hold up).


I hope this inspires someone with access to the MangaUpdates source to make a few "small" modifications that would (in my eyes) improve the search engine.
I'm willing to help out if necessary.
Until then, everyone is free to use my custom search (unless I receive a take-down request from the MU staff). It improves with each search made: I don't go deeper than the first page of search results, so aliases are only added once they appear there.

________________
Who they, what are, and why?
- Manga Cover Database -