lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jayant Kumar Gandhi <jaya...@gmail.com>
Subject Indexing and searching of sharded/ partitioned databases and tables
Date Wed, 07 Oct 2009 08:31:58 GMT
Hi All,

I am new to Solr. I looking forward for Solr to index data that is
partitioned into multiple databases and tables and have questions
regarding dataconfig.xml. I have given the doubts at the end.

Lets say I have 3 mysql databases each with 3 tables.

Db1 : Tbl1, Tbl2, Tbl3
Db2 : Tbl1, Tbl2, Tbl3
Db3 : Tbl1, Tbl2, Tbl3

All databases have the same number of tables with same table names as
shown above. All tables have exactly the same structure as well. Each
table has three fields:
id, name, category

Since the data is distributed this way, I don't have a way to search
for a particular record using 'name'. I must look for it in all the 9
tables. This is not scalable when lets say I have 20 databases each
with 20 tables, meaning 400 queries needed to find a single record.

Solr seemed like the solution to help.

I followed the wiki tutorials:
http://wiki.apache.org/solr/DataImportHandler
http://wiki.apache.org/solr/DIHQuickStart
http://wiki.apache.org/solr/DataImportHandlerFaq

The following are my config files so far:
================
solrconfig.xml
================
<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
  <str name="config">data-config.xml</str>
</lst>
</requestHandler>

================
dataconfig.xml (so far)
================
<dataConfig>
  <dataSource type="JdbcDataSource" name="ds1"
driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db1"
user="user-name" password="password" />
  <dataSource type="JdbcDataSource" name="ds2"
driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db2"
user="user-name" password="password" />
  <dataSource type="JdbcDataSource" name="ds3"
driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/Db3"
user="user-name" password="password" />
  <document>
    <entity name="record11" dataSource="ds1" query="select
id,name,category from Tbl1"></entity>
    <entity name="record12" dataSource="ds1" query="select
id,name,category from Tbl2"></entity>
    <entity name="record13" dataSource="ds1" query="select
id,name,category from Tbl3"></entity>
    <entity name="record21" dataSource="ds2" query="select
id,name,category from Tbl1"></entity>
    <entity name="record22" dataSource="ds2" query="select
id,name,category from Tbl2"></entity>
    <entity name="record23" dataSource="ds2" query="select
id,name,category from Tbl3"></entity>
    <entity name="record31" dataSource="ds3" query="select
id,name,category from Tbl1"></entity>
    <entity name="record32" dataSource="ds3" query="select
id,name,category from Tbl2"></entity>
    <entity name="record33" dataSource="ds3" query="select
id,name,category from Tbl3"></entity>
  </document>
</dataConfig>

================
Doubts/ Questions:
================

- Is this the right away to achieve indexing this data?
- Is there a better way to achieve this? Imagine 20 databases with 20
tables each translates to 400 lines in the XML. This doesn't scale for
something like 200 databases and 200 tables each. Will solr continue
to work/ index properly if I had 40000 entity rows without going out
of memory?
- I will really want that I can search thru the complete database for
a 'name' and do things like 'category' filtering etc easily
independent of the entity name/ datasource. For me they are all
records of the same type.

Thanks and Best Regards,
Jayant

-- 
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi

Mime
View raw message