lucene-solr-user mailing list archives

From Wenca <we...@dovolenou.cz>
Subject Search document design problem
Date Tue, 17 Aug 2010 09:30:37 GMT
Hi all,

I would like to use Solr to replace our site search based on MySQL but I 
am not sure how to map entities into the search index. The model is 
described by the attached UML class diagram.

I have a Hotel that resides in some City in some Country. The hotel has 
various Rooms. For each Room in a Hotel there are some Packages that can 
be purchased by the client.

The entity returned from the search will be mainly the Hotel. E.g.:
- all hotels in USA
- all hotels in New York
- all hotels with name containing "Hilton"
- all hotels in Egypt with packages with all inclusive boarding
   and price lower than 400 and startDate between 2010-08-20
   and 2010-08-30

Our application also uses faceting a lot, e.g.:
- # of hotels per country/city
- # of hotels based on room size
     (# of beds - 1 bed - 100 hotels, 2 beds - 200 hotels, ...)
- # of hotels based on all inclusive package prices
     (0-100 EUR, 100-200 EUR, ...)

But there are also use cases when a search should return a Room or 
Package directly.

I'd like to use Data Import Handler to index directly from our database. 
But which approach to mapping entities into the search index should I 
use? It seems to me that there are at least 2 ways.

1) One index based on Hotel with multivalued fields for Rooms and 
multivalued fields for Packages. In DIH:
<document>
<entity name="hotel" ...>
    <field name="id" .../>
    <entity name="room" ...>
       <field name="room_id" .../>
       <entity name="package"...>
          <field .../>
       </entity>
    </entity>
</entity>
</document>

But I am not sure whether this will work due to multivalued fields. The 
queries may span across all the entities - I want only hotels that have 
a room with 2 beds where that same room has a package with all inclusive 
boarding and price lower than 400.
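
To illustrate the concern with approach 1, here is a minimal Python sketch 
(not Solr code). It assumes the nested entities end up as independent 
multivalued fields on the hotel document; the field names room_beds, 
package_boarding and package_price and the sample data are hypothetical. 
Once flattened, the link between a room and its packages is lost, so a 
hotel can match even though no single room satisfies all the conditions.

```python
# Hypothetical sample data: one hotel with two rooms, one package each.
hotel = {
    "id": "h1",
    "rooms": [
        {"beds": 2, "packages": [{"boarding": "half board", "price": 300}]},
        {"beds": 4, "packages": [{"boarding": "all inclusive", "price": 350}]},
    ],
}


def flatten(hotel):
    """Flatten rooms and packages into independent multivalued fields."""
    doc = {
        "id": hotel["id"],
        "room_beds": [],
        "package_boarding": [],
        "package_price": [],
    }
    for room in hotel["rooms"]:
        doc["room_beds"].append(room["beds"])
        for pkg in room["packages"]:
            doc["package_boarding"].append(pkg["boarding"])
            doc["package_price"].append(pkg["price"])
    return doc


def matches_flat(doc):
    """Each condition is checked against its own field, independently."""
    return (
        2 in doc["room_beds"]
        and "all inclusive" in doc["package_boarding"]
        and any(price < 400 for price in doc["package_price"])
    )


def matches_nested(hotel):
    """All conditions must hold for the same room and the same package."""
    return any(
        room["beds"] == 2
        and any(
            pkg["boarding"] == "all inclusive" and pkg["price"] < 400
            for pkg in room["packages"]
        )
        for room in hotel["rooms"]
    )


flat_doc = flatten(hotel)
print("flattened match:", matches_flat(flat_doc))
print("intended match:", matches_nested(hotel))
```

In this sample the 2-bed room only has a half board package, yet the 
flattened document still satisfies every condition, because the values 
come from different rooms.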

2) Denormalize data, so that there will be only one index based on 
Packages containing (duplicated) all the data from Room and Hotel and 
then use Field Collapsing on Hotel ID for search results and faceting too.
This would also enable direct search for Packages or Rooms, but I am not 
sure about Field Collapsing, which is still a kind of beta functionality, 
or about the potential performance costs.
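
A rough Python sketch of what approach 2 amounts to, again not Solr code. 
It assumes one document per Package with the Room and Hotel data copied 
onto it; the field names and sample values are hypothetical. Collapsing is 
modelled as grouping hits by hotel_id, and facet counts are taken over 
distinct hotels rather than over package documents.

```python
# Hypothetical denormalized index: one document per package, with the
# room and hotel data duplicated onto every package document.
package_docs = [
    {"id": "p1", "hotel_id": "h1", "country": "Egypt", "beds": 2,
     "boarding": "all inclusive", "price": 350},
    {"id": "p2", "hotel_id": "h1", "country": "Egypt", "beds": 2,
     "boarding": "all inclusive", "price": 380},
    {"id": "p3", "hotel_id": "h1", "country": "Egypt", "beds": 4,
     "boarding": "half board", "price": 250},
    {"id": "p4", "hotel_id": "h2", "country": "Egypt", "beds": 2,
     "boarding": "all inclusive", "price": 450},
    {"id": "p5", "hotel_id": "h3", "country": "USA", "beds": 2,
     "boarding": "all inclusive", "price": 390},
]


def search(docs, predicate):
    """Return the package documents that satisfy the predicate."""
    return [doc for doc in docs if predicate(doc)]


def collapse_by_hotel(docs):
    """Group matching package documents under their hotel ID."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc["hotel_id"], []).append(doc)
    return groups


def facet_hotels_by(docs, field):
    """Count distinct hotels (not package documents) per field value."""
    hotels_per_value = {}
    for doc in docs:
        hotels_per_value.setdefault(doc[field], set()).add(doc["hotel_id"])
    return {value: len(ids) for value, ids in hotels_per_value.items()}


# Hotels in Egypt with an all inclusive package cheaper than 400.
hits = search(
    package_docs,
    lambda d: d["country"] == "Egypt"
    and d["boarding"] == "all inclusive"
    and d["price"] < 400,
)
hotels = collapse_by_hotel(hits)
country_facet = facet_hotels_by(package_docs, "country")

print(sorted(hotels))
print(country_facet)
```

Because every condition is checked against a single package document, the 
cross-entity mismatch of approach 1 cannot occur here. The price is the 
duplicated data and the extra grouping step for both results and facets.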

Can anybody give me some advice or share their experiences?

Thanks a lot
Wenca
