lucene-solr-user mailing list archives

From Peter Spam <ps...@mac.com>
Subject Re: Very basic questions: Indexing text - working, but slow!
Date Wed, 30 Jun 2010 01:11:37 GMT
To follow up, I've found that my queries are very fast (even with &fq=) until I add &hl=true.
What can I do to speed up highlighting?  Should I consider injecting one line at a time, rather
than the entire file as a single field?
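
A rough sketch of the "smaller documents" idea, assuming the same extract handler and the "body" field mapping from the injection script in this thread. It splits a file into fixed-size line chunks rather than single lines; the function names, the chunk size, and the id scheme are made up for illustration, and whether this actually speeds up highlighting would need to be measured.

```shell
#!/bin/sh
# Sketch only: split one large text file into fixed-size line chunks and post
# each chunk as its own Solr document, so no single "body" field is huge.
# The function names, chunk size, and id scheme are illustrative assumptions.

# chunk_file FILE LINES OUTDIR
# Writes FILE to OUTDIR/chunk_aaaa, chunk_aaab, ... with at most LINES lines each.
chunk_file() {
    mkdir -p "$3" && split -a 4 -l "$2" "$1" "$3/chunk_"
}

# post_chunks OUTDIR PREFIX
# Posts every chunk to the extract handler from the injection script.
# Each document id is PREFIX-chunk_xxxx; assumes Solr at localhost:8983.
post_chunks() {
    for f in "$1"/chunk_*; do
        [ -f "$f" ] || continue
        id="$2-$(basename "$f")"
        curl "http://localhost:8983/solr/update/extract?literal.id=$id&fmap.content=body" \
            -F "myfile=@$f"
    done
}
```

For example, `chunk_file big.txt 1000 /tmp/chunks` followed by `post_chunks /tmp/chunks big` would post the pieces, after which the usual commit request still has to be sent.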


-Pete

On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:

> Thanks for everyone's help - I have this working now, but sometimes the queries are incredibly
> slow!!  For example, <int name="QTime">461360</int>.  Also, I had to bump up the
> min/max RAM size to 1GB/3.5GB for things to inject without throwing heap memory errors.  However,
> my data set is very small!  36 text files, for a total of 113MB.  (It will grow to many TB,
> but for now, this is a test).  The largest file is 34MB.
> 
> Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
> 
> -----------------------------------------------------------------------------------------------
> 
> For the schema.xml, <types> is all default.  For fields, here are the only lines
> that aren't commented out:
> 
>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>   <dynamicField name="*" type="ignored" multiValued="true" />
> 
> ... then, for the rest:
> 
> <uniqueKey>id</uniqueKey>
> 
> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
> <defaultSearchField>body</defaultSearchField>
> 
> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
> <solrQueryParser defaultOperator="AND"/>
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Injecting:
> 
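> #!/bin/bash
> # (( J++ )) is a bash construct, so run this under bash rather than plain sh.
> 
> J=0
> # Note: word-splitting the output of find breaks on file names containing spaces.
> for i in `find . -name \*.txt`; do
> 	(( J++ ))
> 	curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfile=@$i";
> done;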
> 
> 
> echo "------------- Committing"
> curl "http://localhost:8983/solr/update/extract?commit=true"
> 
> 
> -----------------------------------------------------------------------------------------------
> 
> 
> Searching:
> 
> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
> 
> 
> 
> 
> 
> -Pete
> 
> On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
> 
>> try adding &hl.fl=text
>> to specify your highlight field. I don't understand why you're only
>> getting the ID field back though. Do note that the highlighting section
>> comes after the docs in the response, and is tied to them by ID.
>> 
>> Try a (non highlighting) query of just * to verify that you're
>> pointing at the index you think you are. It's possible that
>> you've modified a different index with SolrJ than your web
>> server is pointing at.
>> 
>> Also, SOLR has no way of knowing you've modified your index
>> with SolrJ, so it may not be automatically reopening an
>> IndexReader, and your recent changes may not be visible
>> until you force the SOLR reader to reopen.
>> 
>> HTH
>> Erick
>> 
>> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <pspam@mac.com> wrote:
>> 
>>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>>> 
>>>>> 1) I can get my docs in the index, but when I search, it
>>>>> returns the entire document.  I'd love to have it only
>>>>> return the line (or two) around the search term.
>>>> 
>>>> Solr can generate Google-like snippets as you describe.
>>>> http://wiki.apache.org/solr/HighlightingParameters
>>> 
>>> Here's how I commit my documents:
>>> 
>>> J=0;
>>> for i in `find . -name \*.txt`; do
>>>      (( J++ ))
>>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfile=@$i";
>>> done;
>>> 
>>> echo "------------- Committing"
>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>> 
>>> 
>>> Then, I try to query using
>>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>>> but I only get back the document ID rather than the snippet:
>>> 
>>> <doc>
>>> <float name="score">0.05030759</float>
>>> <arr name="content_type">
>>> <str>text/plain</str>
>>> </arr>
>>> <str name="id">doc16</str>
>>> </doc>
>>> 
>>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>>> html files" tutorial.
>>> 
>>> 
>>> 
>>> -Pete
>>> 
> 

