One of the covert projects I was working on was a Learning Site Search. The concept is to help the search engine learn as it goes, rather than trying to force it to have all the answers at the start.
Let's face it, most site searches are not going to be perfect. They can get somewhat close, but only after months of working on the data, and even then they will still have problems.
1. To create a good site search, you have to know what the customers want and how they'll phrase their searches.
2. You have to know how to identify when the customer is not getting the results they want.
I wanted this site search to have some unusual features that made it more usable and better for marketing.
First, the site search would be based on SQL Server 2000's Full-Text Indexing, which I was very comfortable with.
Then there would be toolbars and configuration options for that search to help a customer modify it, such as going from page to page, highlighting keywords, etc.
Now, my first objective was relevancy. After looking at the research on site search usability, I had decided that most customers really do not look past the 2nd page. I mean, how many of us have that much time to waste looking for something that's hard to find or isn't organized well?
Then I decided to have 20 results per page, so that no matter what the search, the merchandising team could focus on making those two pages the most relevant results.
I had worked with other search firms, such as Celebros, which have some very nice features, such as spell replacement. I wanted to add that feature to this search, but with a product line of 38,400, it would be pretty hard to pre-determine all the spelling corrections, synonyms, etc.
So this is the backbone of the Learning Site Search: the need for two forms of feedback. The first has to come from the customers, to help identify keywords that don't get proper results. The second has to come from internal programming that identifies when we get too many results, or irrelevant results.
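To make the two feedback channels concrete, here is a minimal sketch of how a search log could be scanned for keywords that need attention. Everything in it is an assumption for illustration: the `SearchEvent` record, the `MIN_COMPLAINTS` threshold, and the choice to treat "too many results" as more than the two pages of 20 that customers actually read. It is not the code I ran, just one way the idea could look.

```python
from collections import Counter
from dataclasses import dataclass

RESULTS_PER_PAGE = 20      # page size used for this search
PAGES_CUSTOMERS_READ = 2   # most customers do not look past page 2
MAX_USEFUL_RESULTS = RESULTS_PER_PAGE * PAGES_CUSTOMERS_READ
MIN_COMPLAINTS = 3         # hypothetical: complaints needed before we flag a keyword


@dataclass
class SearchEvent:
    """One logged search (hypothetical record layout)."""
    keyword: str
    result_count: int
    customer_flagged: bool = False  # customer clicked a "bad result" link


def keywords_needing_review(events):
    """Return {keyword: set of reasons} for keywords that look broken."""
    reasons = {}
    complaints = Counter()
    for event in events:
        keyword = event.keyword.strip().lower()
        # Internal feedback: more results than a customer will ever page through.
        if event.result_count > MAX_USEFUL_RESULTS:
            reasons.setdefault(keyword, set()).add("too many results")
        # Customer feedback: count clicks on the feedback links.
        if event.customer_flagged:
            complaints[keyword] += 1
    for keyword, count in complaints.items():
        if count >= MIN_COMPLAINTS:
            reasons.setdefault(keyword, set()).add("customer feedback")
    return reasons
```

The output of something like this is what would feed the merchandising team's list of keywords to customize.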
So we now have a site search that is notified, either by a customer or by internal programming, when a keyword is returning bad results. Then there has to be a way to configure or customize the results for that keyword.
I created several different tables:
Synonyms - What if a different way of saying this keyword gets better results? Show the original word, but use the substitute to get the better results.
Spelling - Show the original word, but use the correction as the substitute when searching.
Rank Drop Off - This is a new feature. I remember reading that once the rank % between each record gets too high, the results just become too irrelevant. So, for example, if for keyword x any result at 30% or below was of no value, then we could stop showing results at 30% or below. It makes it easier for us to show only the most RELEVANT results.
Rank Multipliers - I am searching through 2 different tables in a total of 3 different ways, so to provide another way to customize results, we can dynamically change the priority of each table and each method via these rank multipliers.
Keyword Status - To help identify which keywords have which problems, including a space for comments by the Merchandising Team.
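As a rough illustration of how the Synonyms and Spelling tables described above could be applied, here is a small sketch. The dictionaries stand in for the database tables, the entries in them are made up, and the function name `resolve_keyword` and the order (spelling first, then synonyms) are my assumptions for the example.

```python
# Hypothetical stand-ins for the Spelling and Synonyms tables.
SPELLING = {"sandels": "sandals", "sofe": "sofa"}
SYNONYMS = {"sofa": "couch"}


def resolve_keyword(typed):
    """Return (word to show the customer, word to actually search with)."""
    shown = typed.strip()
    search_term = shown.lower()
    # Fix the spelling first, so a misspelled word can still hit a synonym.
    search_term = SPELLING.get(search_term, search_term)
    # Then swap in the phrasing that is known to get better results.
    search_term = SYNONYMS.get(search_term, search_term)
    return shown, search_term
```

The customer still sees the word they typed, while the full-text query runs on the substitute.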
Now, Full-Text Indexing has a built-in rank field that reflects the density of a keyword in certain fields, and in addition I was searching with 3 different methods. What I needed next was to convert those ranks to percentages.
So after some complex querying, I grabbed the top-ranking result and divided each of the following results by it, giving me results on a 100-0% scale. That prepares the results for my rank drop off needs.
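Here is a minimal sketch of that percentage conversion, with the rank multipliers and rank drop off applied around it. The real work was done in SQL, so treat this as an illustration only: the method names, the sample numbers, and the choice to keep an item's best weighted score when several methods find it are all assumptions.

```python
def to_percent_ranks(hits, multipliers=None):
    """Convert raw full-text ranks into 100-0% ranks.

    hits: list of (item_id, method, raw_rank) tuples, one per search method hit.
    multipliers: optional {method: weight}; methods not listed get 1.0.
    """
    multipliers = multipliers or {}
    weighted = {}
    for item_id, method, raw_rank in hits:
        score = raw_rank * multipliers.get(method, 1.0)
        # Assumption: an item found by several methods keeps its best score.
        weighted[item_id] = max(weighted.get(item_id, 0.0), score)
    if not weighted:
        return []
    top = max(weighted.values())
    if top <= 0:
        return []
    ranked = sorted(weighted.items(), key=lambda pair: pair[1], reverse=True)
    # Divide every score by the top score so the best result is 100%.
    return [(item_id, round(100.0 * score / top, 1)) for item_id, score in ranked]


def apply_drop_off(percent_ranked, drop_off_percent):
    """Stop showing results at or below the drop-off percentage."""
    return [(item_id, pct) for item_id, pct in percent_ranked if pct > drop_off_percent]


if __name__ == "__main__":
    sample_hits = [
        ("A", "title", 200),
        ("B", "title", 100),
        ("C", "description", 120),
        ("D", "title", 60),
    ]
    ranked = to_percent_ranks(sample_hits, {"description": 0.5})
    print(apply_drop_off(ranked, 30))
```

With a drop off of 30 for a keyword, anything that scores 30% or less of the top result is simply not shown.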
Now back to feedback methods. We did not want to ask customers for comments, but just to have them click on one of 1-4 links to help us identify if this result was incorrect in any way...
Now that all of that is set up, it's just a matter of testing it against your previous search.
Now, this will grow in data as it gets customized. But remember, this will learn and grow, and require less effort over time.
P.S. As you may know, I am currently looking for work. I would love to help make your site search more profitable and more relevant. Being a coder does not prevent you from identifying with your customers.