lucene-solr-user mailing list archives

From Peter Spam <ps...@mac.com>
Subject Re: Very basic questions: Indexing text - working, but slow!
Date Tue, 29 Jun 2010 18:07:36 GMT
Thanks for everyone's help - I have this working now, but sometimes the queries are incredibly
slow!!  For example, <int name="QTime">461360</int> (QTime is in milliseconds, so roughly 7.7
minutes).  Also, I had to bump up the min/max RAM size to 1GB/3.5GB for indexing to finish
without throwing heap memory errors.  However, my data set is very small!  36 text files, for
a total of 113MB.  (It will grow to many TB, but for now, this is a test).  The largest file is 34MB.

Therefore, I'm sure I'm doing something wrong :-)  Here's my config:

-----------------------------------------------------------------------------------------------

For the schema.xml, <types> is all default.  For fields, here are the only lines that
aren't commented out:

   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
   <dynamicField name="*" type="ignored" multiValued="true" />

... then, for the rest:

 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is absent -->
 <defaultSearchField>body</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="AND"/>


-----------------------------------------------------------------------------------------------


Invoking:  java -Xmx3584M -Xms1024M -jar start.jar


-----------------------------------------------------------------------------------------------


Injecting:

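#!/bin/sh

J=0
# Read file names line by line so paths with spaces stay intact.
find . -name '*.txt' | while IFS= read -r i; do
	# POSIX arithmetic; (( J++ )) is a bashism and fails under a strict /bin/sh.
	J=$((J + 1))
	# fmap.content=body maps the extracted text onto the "body" field.
	curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" \
		-F "myfile=@$i"
done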


echo "------------- Committing"
curl "http://localhost:8983/solr/update/extract?commit=true"
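
A quick sanity check after the commit, as a minimal sketch: ask for a match-all query with rows=0 and read numFound from the XML response, which should equal the number of files posted (36 here). The helper name num_found and the sample response line are made up for illustration; only the numFound attribute and the select URL come from Solr and this thread.

```shell
#!/bin/sh

# Pull the numFound value out of a Solr XML response read on stdin.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Against the local instance from this thread (not run here):
#   curl -s "http://localhost:8983/solr/select?q=*:*&rows=0" | num_found

# Hand-written sample line (an assumption, not captured output) to show the parsing.
sample='<result name="response" numFound="36" start="0"/>'
printf '%s\n' "$sample" | num_found
```

If the count is lower than the number of files, some posts probably failed (for example with the heap errors mentioned above) and the curl output for those files is worth a look.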


-----------------------------------------------------------------------------------------------


Searching:

http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
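
One variation worth trying, sketched under assumptions: name the highlight field explicitly with hl.fl, as suggested earlier in the thread, using "body" since that is the field the text is mapped to. The function name build_query_url is hypothetical, and whether this changes QTime at all is untested.

```shell
#!/bin/sh

# Build a select URL that highlights only the given field.
# $1 = query term (assumed to be URL-safe already), $2 = field to highlight
build_query_url() {
    printf 'http://localhost:8983/solr/select?q=%s&hl=true&hl.fl=%s&fl=id,score&hl.snippets=5&hl.mergeContiguous=true\n' "$1" "$2"
}

# Print the URL; to actually run the query: curl -s "$(build_query_url testing body)"
build_query_url testing body
```

Comparing the QTime of this query with the same query without hl=true would show how much of the time goes to highlighting rather than to the search itself.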





-Pete

On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:

> try adding &hl.fl=text
> to specify your highlight field. I don't understand why you're only
> getting the ID field back though. Do note that the highlighting
> is after the docs, related by the ID.
> 
> Try a (non highlighting) query of just * to verify that you're
> pointing at the index you think you are. It's possible that
> you've modified a different index with SolrJ than your web
> server is pointing at.
> 
> Also, SOLR has no way of knowing you've modified your index
> with SolrJ, so it may not be automatically reopening an
> IndexReader so your recent changes may not be visible
> until you force the SOLR reader to reopen.
> 
> HTH
> Erick
> 
> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <pspam@mac.com> wrote:
> 
>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>> 
>>>> 1) I can get my docs in the index, but when I search, it
>>>> returns the entire document.  I'd love to have it only
>>>> return the line (or two) around the search term.
>>> 
>>> Solr can generate Google-like snippets as you describe.
>>> http://wiki.apache.org/solr/HighlightingParameters
>> 
>> Here's how I commit my documents:
>> 
>> J=0;
>> for i in `find . -name \*.txt`; do
>>       (( J++ ))
>>       curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfile=@$i";
>> done;
>> 
>> echo "------------- Committing"
>> curl "http://localhost:8983/solr/update/extract?commit=true"
>> 
>> 
>> Then, I try to query using
>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>> but I only get back the document ID rather than the snippet:
>> 
>> <doc>
>> <float name="score">0.05030759</float>
>> <arr name="content_type">
>> <str>text/plain</str>
>> </arr>
>> <str name="id">doc16</str>
>> </doc>
>> 
>> I'm using the schema.xml from the "lucid imagination: Indexing text and
>> html files" tutorial.
>> 
>> 
>> 
>> -Pete
>> 

