lucene-solr-user mailing list archives

From Wenca <we...@dovolenou.cz>
Subject Re: Search document design problem
Date Tue, 17 Aug 2010 09:52:02 GMT
Oops, it seems that the mailing list does not support attachments. 
Here's a link to the diagram image:

http://dl.dropbox.com/u/10214557/model.png

Wenca

Dne 17.8.2010 11:30, Wenca napsal(a):
> Hi all,
>
> I would like to use Solr to replace our site search based on MySQL but I
> am not sure how to map entities into the search index. The model is
> described by the attached UML class diagram.
>
> I have a Hotel that resides in some City in some Country. The hotel has
> various Rooms. For each Room in a Hotel there are some Packages that can
> be purchased by the client.
>
> The entity returned from a search will mainly be the Hotel, e.g.:
> - all hotels in USA
> - all hotels in New York
> - all hotels with name containing "Hilton"
> - all hotels in Egypt with packages with all inclusive boarding
> and price lower than 400 and startDate between 2010-08-20
> and 2010-08-30
>
> Our application also uses faceting a lot, e.g.:
> - # of hotels per country/city
> - # of hotels based on room size
> (# of beds - 1 bed - 100 hotels, 2 beds - 200 hotels, ...)
> - # of hotels based on all inclusive package prices
> (0-100 EUR, 100-200 EUR, ...)
>
> But there are also use cases where a search should return a Room or
> Package directly.
>
> I'd like to use Data Import Handler to index directly from our database.
> But which approach to mapping entities into the search index should I use? It
> seems to me that there are at least 2 ways.
>
> 1) One index based on Hotel with multivalued fields for Rooms and
> multivalued fields for Packages. In DIH:
> <document>
>   <entity name="hotel" ...>
>     <field name="id" .../>
>     <entity name="room" ...>
>       <field name="room_id" .../>
>       <entity name="package"...>
>         <field .../>
>       </entity>
>     </entity>
>   </entity>
> </document>
>
> But I am not sure whether this will work because of the multivalued fields. The
> queries may span across all the entities: I want only hotels that have a
> room with 2 beds where that same room has a package with all inclusive
> boarding and a price lower than 400.
>
> 2) Denormalize the data, so that there is only one index based on
> Packages, containing (duplicated) all the data from Room and Hotel, and
> then use Field Collapsing on Hotel ID for both search results and faceting.
> This would also enable direct search for Packages or Rooms, but I am not
> sure about Field Collapsing, which is still a kind of beta functionality,
> or about its potential performance costs.
>
> Can anybody give me some advice or share their experiences?
>
> Thanks a lot
> Wenca
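
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not Solr code. It only imitates how flat multivalued fields on a Hotel document lose the link between a room and its packages (approach 1), and how one document per Package grouped by hotel ID keeps that link (approach 2). The sample rows, field names and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows: (hotel_id, room_id, beds, boarding, price)
PACKAGES = [
    ("h1", "r1", 2, "half board", 300),
    ("h1", "r2", 1, "all inclusive", 350),
    ("h2", "r3", 2, "all inclusive", 380),
]


def hotel_documents(rows):
    """Approach 1: one document per hotel with flat multivalued fields."""
    docs = defaultdict(lambda: {"beds": set(), "boarding": set(), "price": set()})
    for hotel_id, _room_id, beds, boarding, price in rows:
        docs[hotel_id]["beds"].add(beds)
        docs[hotel_id]["boarding"].add(boarding)
        docs[hotel_id]["price"].add(price)
    return docs


def search_hotel_documents(docs):
    """Each condition is satisfied by ANY value of its field, independently."""
    return sorted(
        hotel_id
        for hotel_id, doc in docs.items()
        if 2 in doc["beds"]
        and "all inclusive" in doc["boarding"]
        and any(price < 400 for price in doc["price"])
    )


def search_package_documents(rows):
    """Approach 2: one document per package, results grouped by hotel id."""
    groups = defaultdict(list)
    for hotel_id, room_id, beds, boarding, price in rows:
        # All conditions are checked against the same package/room row.
        if beds == 2 and boarding == "all inclusive" and price < 400:
            groups[hotel_id].append(room_id)
    return dict(groups)


flat_hits = search_hotel_documents(hotel_documents(PACKAGES))
grouped_hits = search_package_documents(PACKAGES)
print(flat_hits)
print(grouped_hits)
```

In this toy data, hotel h1 has a 2-bed room and an all inclusive package, but on different rooms. The flat hotel documents still report h1 as a match, while the per-package documents return only h2. With approach 2, the number of hotels for a facet value would be the number of groups rather than the number of matching package documents.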
