lucene-solr-user mailing list archives

From tuedel <sep...@web.de>
Subject Distinct values in multivalued fields
Date Mon, 01 Jul 2013 13:34:54 GMT
Hello everybody,

I have tried to use the UniqFieldsUpdateProcessorFactory to get distinct
values in multivalued fields. Example below:

<updateRequestProcessorChain name="uniq_fields"> 
   <processor 
class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory"> 
     <lst name="fields"> 
       <str>title</str> 
       <str>tag_type</str> 
     </lst> 
   </processor> 
   <processor class="solr.RunUpdateProcessorFactory" /> 
</updateRequestProcessorChain> 

<requestHandler name="/update" class="solr.UpdateRequestHandler">
   <lst name="defaults">
     <str name="update.chain">uniq_fields</str>
   </lst>
</requestHandler>

However, the data is indexed one document at a time, so a document may
receive an additional tag in a later update. To make sure no duplicate tags
end up in the field, I was hoping the UniqFieldsUpdateProcessorFactory would
do what I want. To actually add a tag, I am sending

"tag_type" :{"add":"foo"}, which still adds the tag without checking whether
it is already part of the field. How can I achieve distinct values on the
Solr side?

I suspect that writing my own processor might be a solution, but I am
uncertain how to do it and whether it is the proper way.
Imagine an incoming update, e.g. an update of an existing document that has
several multivalued fields, sent without specifying "add" or "set". Such an
update would cause the corresponding document to be dropped and re-indexed
without keeping any previously added values in the multivalued fields.
So when a field is updated and the incoming value is not yet part of the
indexed field, the value should be added; otherwise it should be ignored.
The processor needs to decide, based on the existing index, whether a value
is added to the index or not. Is that achievable on the Solr side?
Below is my current, pretty empty processor class:

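import java.io.IOException;
import java.util.Collection;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConditionalSolrUniqFieldValuesProcessorFactory
        extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new ConditionalUniqFieldValuesProcessor(next);
    }

    static class ConditionalUniqFieldValuesProcessor extends UpdateRequestProcessor {

        public ConditionalUniqFieldValuesProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();

            Collection<String> incomingFieldNames = doc.getFieldNames();
            for (String fieldName : incomingFieldNames) {
                // TODO (not implemented yet): if the field is multivalued and
                // the incoming value is already part of the indexed document,
                // drop the incoming value; otherwise add it to the
                // multivalued field.
            }

            // Hand the command to the next processor; without this call the
            // rest of the chain never sees the document.
            super.processAdd(cmd);
        }
    }
}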
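
The merge step that the TODO comment describes can be sketched independently
of Solr. This is only a minimal illustration under assumptions: the class
DistinctValueMerge and the method mergeDistinct are hypothetical names, not
part of Solr, and the sketch assumes the values already stored for the field
have been obtained somehow (how to look them up inside a processor is exactly
the open question above).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr: merges field values without duplicates.
final class DistinctValueMerge {

    /**
     * Returns the existing values followed by every incoming value that is
     * not already present, keeping first-seen order.
     */
    static List<Object> mergeDistinct(Collection<?> existing, Collection<?> incoming) {
        // LinkedHashSet drops duplicates while preserving insertion order.
        Set<Object> merged = new LinkedHashSet<>();
        if (existing != null) {
            merged.addAll(existing);
        }
        if (incoming != null) {
            merged.addAll(incoming);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Assumed stored values of the multivalued field "tag_type".
        List<String> stored = Arrays.asList("foo", "bar");

        // Adding a tag that is already stored leaves the field unchanged.
        System.out.println(mergeDistinct(stored, Arrays.asList("foo"))); // [foo, bar]

        // Adding a new tag appends it once.
        System.out.println(mergeDistinct(stored, Arrays.asList("baz"))); // [foo, bar, baz]
    }
}
```

Inside processAdd, the result of such a merge could be written back with
doc.setField(fieldName, mergedValues), assuming the stored values are
available at that point in the chain.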

--
View this message in context: http://lucene.472066.n3.nabble.com/Distinct-values-in-multivalued-fields-tp4074337.html
Sent from the Solr - User mailing list archive at Nabble.com.
