lucene-solr-user mailing list archives

From Peter Spam <ps...@mac.com>
Subject Re: Very basic questions: Faceted front-end?
Date Wed, 30 Jun 2010 22:59:48 GMT
Wow, thanks Lance - it's really fast now!

The last piece of the puzzle is setting up a nice front-end.  Are there any pre-built front-ends
available that support facets and mimic Google, for example?


-Peter

On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote:

> To highlight a field, Solr needs some extra Lucene values. If these
> are not configured for the field in the schema, Solr has to re-analyze
> the field to highlight it. If you want faster highlighting, you have
> to add term vectors to the schema. Here is the grand map of such
> things:
> 
> http://wiki.apache.org/solr/FieldOptionsByUseCase
> 
> On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson <erickerickson@gmail.com> wrote:
>> What are your actual highlighting requirements? You could try
>> things like hl.maxAnalyzedChars, hl.requireFieldMatch, etc....
>> 
>> http://wiki.apache.org/solr/HighlightingParameters
>> has a good list, but you've probably already seen that page....
>> 
>> Best
>> Erick
>> 
>> On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam <pspam@mac.com> wrote:
>> 
>>> To follow up, I've found that my queries are very fast (even with &fq=),
>>> until I add &hl=true.  What can I do to speed up highlighting?  Should I
>>> consider injecting a line at a time, rather than the entire file as a field?
>>> 
>>> 
>>> -Pete
>>> 
>>> On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
>>> 
>>>> Thanks for everyone's help - I have this working now, but sometimes the
>>>> queries are incredibly slow!!  For example, <int name="QTime">461360</int>.
>>>> Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
>>>> inject without throwing heap memory errors.  However, my data set is very
>>>> small!  36 text files, for a total of 113MB.  (It will grow to many TB, but
>>>> for now, this is a test).  The largest file is 34MB.
>>>> 
>>>> Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
>>>> 
>>>> 
>>>> -----------------------------------------------------------------------------------------------
>>>> 
>>>> For the schema.xml, <types> is all default.  For fields, here are the
>>>> only lines that aren't commented out:
>>>> 
>>>>   <field name="id" type="string" indexed="true" stored="true" required="true" />
>>>>   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
>>>>   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>>>>   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
>>>>   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
>>>>   <dynamicField name="*" type="ignored" multiValued="true" />
>>>> 
>>>> ... then, for the rest:
>>>> 
>>>> <uniqueKey>id</uniqueKey>
>>>> 
>>>> <!-- field for the QueryParser to use when an explicit fieldname is absent -->
>>>> <defaultSearchField>body</defaultSearchField>
>>>> 
>>>> <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
>>>> <solrQueryParser defaultOperator="AND"/>
>>>> 
>>>> 
>>>> 
>>>> -----------------------------------------------------------------------------------------------
>>>> 
>>>> 
>>>> Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
>>>> 
>>>> 
>>>> 
>>>> -----------------------------------------------------------------------------------------------
>>>> 
>>>> 
>>>> Injecting:
>>>> 
>>>> #!/bin/bash
>>>> 
>>>> J=0
>>>> for i in `find . -name \*.txt`; do
>>>>       (( J++ ))
>>>>       curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F "myfile=@$i";
>>>> done;
>>>> 
>>>> 
>>>> echo "------------- Committing"
>>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>> 
>>>> 
>>>> 
>>>> -----------------------------------------------------------------------------------------------
>>>> 
>>>> 
>>>> Searching:
>>>> 
>>>> 
>>>> http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -Pete
>>>> 
>>>> On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
>>>> 
>>>>> try adding &hl.fl=text
>>>>> to specify your highlight field. I don't understand why you're only
>>>>> getting the ID field back though. Do note that the highlighting
>>>>> is after the docs, related by the ID.
>>>>> 
>>>>> Try a (non highlighting) query of just * to verify that you're
>>>>> pointing at the index you think you are. It's possible that
>>>>> you've modified a different index with SolrJ than your web
>>>>> server is pointing at.
>>>>> 
>>>>> Also, SOLR has no way of knowing you've modified your index
>>>>> with SolrJ, so it may not be automatically reopening an
>>>>> IndexReader so your recent changes may not be visible
>>>>> until you force the SOLR reader to reopen.
>>>>> 
>>>>> HTH
>>>>> Erick
>>>>> 
>>>>> On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam <pspam@mac.com> wrote:
>>>>> 
>>>>>> On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
>>>>>> 
>>>>>>>> 1) I can get my docs in the index, but when I search, it
>>>>>>>> returns the entire document.  I'd love to have it only
>>>>>>>> return the line (or two) around the search term.
>>>>>>> 
>>>>>>> Solr can generate Google-like snippets as you describe.
>>>>>>> http://wiki.apache.org/solr/HighlightingParameters
>>>>>> 
>>>>>> Here's how I commit my documents:
>>>>>> 
>>>>>> J=0;
>>>>>> for i in `find . -name \*.txt`; do
>>>>>>      (( J++ ))
>>>>>>      curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfile=@$i";
>>>>>> done;
>>>>>> 
>>>>>> echo "------------- Committing"
>>>>>> curl "http://localhost:8983/solr/update/extract?commit=true"
>>>>>> 
>>>>>> 
>>>>>> Then, I try to query using
>>>>>> 
>>>>>> http://localhost:8983/solr/select?rows=10&start=0&fl=*,score&hl=true&q=testing
>>>>>> but I only get back the document ID rather than the snippet:
>>>>>> 
>>>>>> <doc>
>>>>>> <float name="score">0.05030759</float>
>>>>>> <arr name="content_type">
>>>>>> <str>text/plain</str>
>>>>>> </arr>
>>>>>> <str name="id">doc16</str>
>>>>>> </doc>
>>>>>> 
>>>>>> I'm using the schema.xml from the "lucid imagination: Indexing text
>>>>>> and html files" tutorial.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -Pete
>>>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
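
For anyone following the term-vector advice quoted above, here is a minimal sketch of how the `body` field from the quoted schema.xml might be changed. The `termVectors`, `termPositions`, and `termOffsets` attributes are standard Solr field options; enabling all three on `body` is an assumption, not something confirmed in the thread, and the documents would need to be re-indexed before the change has any effect.

```xml
<!-- Sketch only: the body field from the thread with term vectors enabled,
     so the highlighter may be able to avoid re-analyzing the stored text.
     Enabling all three options here is an assumption. -->
<field name="body" type="text" indexed="true" stored="true"
       multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Term vectors make the index larger, so it is probably worth enabling them only on fields that are actually highlighted.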


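The indexing loop and the highlighting query from the thread can be sketched as two small POSIX shell functions. This is only an illustration: `SOLR_BASE`, `highlight_url`, and `index_dir` are hypothetical names introduced here, the host and port are the ones used in the thread, and highlighting on `body` via `hl.fl` is an assumption based on the quoted schema. The query string is not URL-encoded, so it only suits simple single-word queries.

```shell
#!/bin/sh
# Base URL of the Solr instance (assumption: same host/port as in the thread).
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"

# highlight_url QUERY FIELD [SNIPPETS]
# Print a select URL that returns only id and score for each document,
# plus up to SNIPPETS highlighted fragments taken from FIELD.
highlight_url() {
    query=$1
    field=$2
    snippets=${3:-5}
    printf '%s/select?q=%s&fl=id,score&hl=true&hl.fl=%s&hl.snippets=%s\n' \
        "$SOLR_BASE" "$query" "$field" "$snippets"
}

# index_dir DIR
# Post every .txt file under DIR to the extract handler, then commit once.
# Reading names line by line keeps file names with spaces intact.
index_dir() {
    n=0
    find "$1" -name '*.txt' | while IFS= read -r f; do
        n=$((n + 1))
        curl "$SOLR_BASE/update/extract?literal.id=doc$n&fmap.content=body" \
            -F "myfile=@$f"
    done
    curl "$SOLR_BASE/update/extract?commit=true"
}
```

Typical use would be `index_dir .` followed by `curl "$(highlight_url testing body)"`. The counter uses `$((n + 1))` rather than `(( J++ ))`, so the functions do not depend on bash.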