lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Can Solr solve this simple problem?
Date Tue, 17 Apr 2012 10:38:03 GMT
1. Just trust that Lucene will perform :)
   Incremental updates are actually stored in separate new index segments with their own caches,
   so all the old existing data is left untouched, with its caches in place.

2. Please explain what you expect from "semantic search", which is an overloaded term.

3. On http://wiki.apache.org/solr/PublicServers, the only site that explicitly says it runs Solr 4.0 is Jeeran.
   I'm sure others can fill in with more examples.
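
A toy model may help make point 1 concrete. The Python sketch below is not Lucene's code: `Segment` and `ToySegmentedIndex` are hypothetical names, and deletes, segment merges and Solr's own top-level caches are all ignored. It only illustrates the idea described above: each commit of new documents creates a new, immutable segment, so filter results already cached for older segments stay valid and are reused.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """An immutable batch of documents, written once and never modified."""
    docs: list  # list of dicts, e.g. {"id": 1, "country": "USA"}
    # Per-segment filter cache: (field, value) -> matching doc positions.
    filter_cache: dict = field(default_factory=dict)

    def matching(self, fld, value):
        key = (fld, value)
        if key not in self.filter_cache:  # computed at most once per segment
            self.filter_cache[key] = [
                i for i, d in enumerate(self.docs) if d.get(fld) == value
            ]
        return self.filter_cache[key]


class ToySegmentedIndex:
    """Hypothetical toy model: each commit adds a segment; old ones are untouched."""

    def __init__(self):
        self.segments = []

    def commit(self, new_docs):
        # An incremental update only appends a new segment; existing segments
        # (and their already-filled caches) are left exactly as they were.
        self.segments.append(Segment(list(new_docs)))

    def filter_ids(self, fld, value):
        # A query runs over every segment and concatenates the per-segment hits.
        return [
            seg.docs[i]["id"]
            for seg in self.segments
            for i in seg.matching(fld, value)
        ]


idx = ToySegmentedIndex()
idx.commit([{"id": 1, "country": "USA"}, {"id": 2, "country": "NOR"}])
idx.filter_ids("country", "USA")         # fills the cache of segment 0
idx.commit([{"id": 3, "country": "USA"}])
print(idx.filter_ids("country", "USA"))  # segment 0 answers from its cache
```

Only the new segment has to compute anything after the second commit, which is why a high update rate does not throw away all cached work for the older data.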

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 17. apr. 2012, at 12:10, Alexandr Bocharov wrote:

> Thanks for your replies, you're a real expert :)
> I've read the basic Solr documentation; I've been familiar with it for about 2
> days.
> The documentation looks huge at first sight :). My company and I are
> deciding whether to use Solr or another solution.
> Maybe you're right about re-implementing our sorting functions as something
> new.
> 
> 1. If the index is stored on disk, how is good performance achieved (the
> index changes frequently: ~50,000-100,000 records are updated every 10
> minutes, so maybe caching won't be effective)?
> 2. What can you say about Solr's semantic search capabilities? Are there any
> examples of it in production?
> 3. Can you please give some examples of projects/sites using Solr 4.0 in
> production?
> 
> 
> 2012/4/17 Jan Høydahl <jan.asf@cominvent.com>
> 
>> Hi,
>> 
>> You have many basic questions about search. Can I recommend one of the
>> books? http://lucene.apache.org/solr/books.html
>> Also, you'll find a lot of answers on the Solr Wiki,
>> http://wiki.apache.org/solr/, if you're not aware of it.
>> 
>> I think Solr may solve your performance problems well.
>> Whether it's the right tool for the job depends on several factors.
>> Also, sometimes it is useful to step back and think fresh. Perhaps you
>> implemented things the way you did for technical reasons driven by your
>> DB's capabilities.
>> When re-implementing on top of Solr, perhaps there are better ways to do
>> what you REALLY wanted instead of limiting yourself to the ORDER BY syntax
>> etc.
>> Two of Solr's strengths are relevancy and FunctionQueries, and they can do
>> amazing things :)
>> 
>> Further answers below.
>> 
>> On 17. apr. 2012, at 07:20, Alexandr Bocharov wrote:
>> 
>>> Thanks for your reply :)
>>> I have some new questions now:
>>> 1. How stable is the trunk version? Has anyone used it in production on
>>> any kind of high-load project?
>> It's stable and used in production in many places. An alpha or beta
>> release is expected soon.
>>> 2. Does version 3.6 support near-real-time index updates?
>> No
>>> 3. How does Solr store its index? Is it all in memory for each shard,
>>> or on disk, with frequently requested queries cached in memory?
>> On disk, but with many caching optimizations.
>>> 4. Is the best practice for index updating to do delta imports every 5
>>> minutes, for example, and a full index rebuild once a day? Does the full
>>> rebuild take a long time for ~100 million users?
>> You can do deltas only, as often as you choose. Solr will handle the
>> backend details.
>>> 5. Do sharding and replication have native support in Solr, so that all
>>> I need to care about is the config file for the nodes? Are there any
>>> limitations on features such as sorting if we use sharding?
>> Yes, sharding and replication are natively supported. See the Wiki.
>>> The reasons we want to move away from our DB search scheme (data is sharded
>>> into small tables on several servers and managed in code) are:
>>> 1. the response time of our search isn't what we need (3-5 s now in
>>> production; we want <1 s)
>>> 2. the growing amount of data
>>> 3. we want to cluster any amount of data automatically and search it,
>>> without having to care about how the data is stored or whether it is durable
>>> 
>>> That's why we are also looking at other solutions that autoshard huge
>>> amounts of data and can run these kinds of queries and sorts (we are
>>> thinking about MySQL Cluster, but it's not stable yet, or Oracle Cluster).
>>> If anyone can give advice on such technology, I'll be glad to hear it.
>> What do you expect from "autosharding"?
>>> 
>>> 2012/4/17 Jan Høydahl <jan.asf@cominvent.com>
>>> 
>>>>> Hi everyone :)
>>>> 
>>>> Hi :)
>>>> 
>>>>> So, these are my 3 questions:
>>>>> 1. Does Solr support searching across several fields of different
>>>>> types, like in a WHERE condition?
>>>> 
>>>> Yes. As long as these are not full-text, you should use filter queries
>>>> for them, e.g.
>>>> &q=*:*
>>>> &fq=country:USA
>>>> &fq=language:SPA
>>>> &fq=age:[30 TO 40]
>>>> &fq=(bool_field1:1 OR bool_field2:1)
>>>> 
>>>> I put multiple "fq" parameters instead of one long one to optimize
>>>> caching of filters: each fq is cached separately and can be reused.
>>>> 
>>>>> 2. Does Solr support sorting that depends on other fields (like sums
>>>>> in ORDER BY)? In other words, does it provide any kind of function
>>>>> that can be used to sort the results of q1?
>>>> 
>>>> Yes. In the trunk version you can sort by a function, which can do sums
>>>> and all kinds of crazy things:
>>>> &sort=sum(product(has_photo,10),if(exists(query($agequery)),50,0)) asc
>>>> &agequery=age:[53 TO *]
>>>> See http://wiki.apache.org/solr/FunctionQuery for more functions.
>>>> 
>>>> But you could also do much of this through boost queries (bq requires
>>>> the dismax or edismax query parser, e.g. &defType=edismax):
>>>> &sort=score desc
>>>> &bq=language:FRA^50
>>>> &bq=age:[53 TO *]^20
>>>> 
>>>>> 3. Does Solr provide real-time index updating, or updating every N
>>>>> minutes?
>>>> 
>>>> Sure, there is Near Real-Time indexing in trunk (coming in 4.0).
>>>> 
>>>> Jan
>> 
>> 
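
Jan's first reply (quoted above) recommends one `fq` parameter per condition rather than one long filter. A minimal sketch of building such a request with Python's standard library follows; the endpoint `http://localhost:8983/solr/select` is an assumption (a default single-core install), and nothing is actually sent to Solr. Because each `fq` is a separate parameter, Solr caches each filter on its own, so a later request that changes only the age range can still reuse the cached `country:USA` filter.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint of a default single-core Solr install; adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_filtered_query(filters, q="*:*"):
    """Build a /select URL with one separate fq parameter per filter."""
    # A list of pairs lets urlencode emit the same key ("fq") several times.
    params = [("q", q)] + [("fq", f) for f in filters]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_filtered_query([
    "country:USA",
    "language:SPA",
    "age:[30 TO 40]",
    "(bool_field1:1 OR bool_field2:1)",
])
print(url)
# Decoding the query string again shows four independent fq values.
print(parse_qs(urlsplit(url).query)["fq"])
```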

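To see what the function sort in Jan's first reply computes, here is a plain-Python re-evaluation on a few made-up documents. It is only a local illustration of the arithmetic, not how Solr evaluates function queries; the documents are hypothetical, and each one is assumed to have a numeric `has_photo` (0 or 1) and an `age` field.

```python
# Hypothetical documents; the field names follow the example in the thread.
docs = [
    {"id": "a", "has_photo": 1, "age": 60},
    {"id": "b", "has_photo": 0, "age": 25},
    {"id": "c", "has_photo": 1, "age": 30},
    {"id": "d", "has_photo": 0, "age": 53},
]


def sort_key(doc):
    """Mirror sum(product(has_photo,10), if(exists(query($agequery)),50,0))
    with agequery=age:[53 TO *] (inclusive lower bound, no upper bound)."""
    photo_part = doc["has_photo"] * 10             # product(has_photo,10)
    age_part = 50 if doc["age"] >= 53 else 0       # if(exists(query(...)),50,0)
    return photo_part + age_part                   # sum(...)


ranked = sorted(docs, key=sort_key)  # "asc", as in the example
for doc in ranked:
    print(doc["id"], sort_key(doc))
```

With `asc`, documents that have a photo and an age of 53 or more end up last; if the intent is to rank them first, `desc` is the direction to use (`reverse=True` in the sketch).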

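On sharding, Jan points to the Wiki. In the Solr 3.x line, distributed search is driven by the `shards` request parameter: a comma-separated list of shard addresses written without the `http://` prefix, which the receiving node queries and merges. The sketch below only builds such a URL; the host names and the `build_distributed_query` helper are hypothetical. In that 3.x setup, spreading documents across the shards is still up to your own indexing code, which is the part the "autosharding" question is really about.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical shard addresses (host:port/path, no http:// prefix).
SHARDS = ["solr1:8983/solr", "solr2:8983/solr", "solr3:8983/solr"]


def build_distributed_query(coordinator, q, shards, **extra):
    """Build a /select URL asking `coordinator` to fan the query out to
    every shard in the comma-separated `shards` parameter."""
    params = {"q": q, "shards": ",".join(shards)}
    params.update(extra)  # any other request parameters, e.g. rows
    return f"http://{coordinator}/select?" + urlencode(params)


url = build_distributed_query("solr1:8983/solr", "country:USA", SHARDS, rows="10")
print(url)
print(parse_qs(urlsplit(url).query)["shards"])
```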