lucene-solr-user mailing list archives

From Wendy <w...@rcsb.rutgers.edu>
Subject Re: help with DIH transformer to add a suffix to column names
Date Thu, 25 Aug 2016 15:32:22 GMT
Hi Alex,

Thank you for your response.
It worked, and I am very happy with the results. I report the steps below. The
purpose is to use a dynamic field to simplify the field definitions in the
managed-schema file and the field ranking in the solrconfig.xml file.

STEPS:

1. Creation of the db-data-config.xml file:
<dataConfig>

  <dataSource name="data_source_?????"
              type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://machineName:3306/databaseName?zeroDateTimeBehavior=convertToNull"
              user="?????"
              password="?????" />

  <document name="db-fulltext-index">

    <entity name="pdb_entry" pk="pdb_id_stem"
            transformer="my.solr.transformer.FieldTransformer"
            query="select * from pdb_entry where status_code = 'REL'">

      <entity name="citation" onError="continue"
              transformer="my.solr.transformer.FieldTransformer"
              query="select title from citation where Structure_ID='${pdb_entry.pdb_id_stem}' and id = 'primary'">
      </entity>

      <entity name="citation_author" onError="continue"
              transformer="my.solr.transformer.FieldTransformer"
              query="select name from citation_author where Structure_ID='${pdb_entry.pdb_id_stem}' and citation_id = 'primary'">
      </entity>

    </entity>
  </document>
</dataConfig>

2. Modification of the solrconfig.xml file (note the field ranking in the /search handler):

 

 <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />
 <lib dir="${solr.install.dir:../../../..}/dist/" regex="mysql-connector-java-5.0.7-bin.jar" />
 <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-rcsb-plugin.jar" />

 <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
   <lst name="defaults">
     <str name="config">db-data-config.xml</str>
   </lst>
 </requestHandler>

 <requestHandler name="/search" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="indent">true</str>
     <str name="echoParams">explicit</str>
     <str name="defType">edismax</str>
     <str name="qf">pdb_id_stem^20.0</str>
     <str name="qf">title_stem^20.0</str>
     <str name="qf">keywords_stem^10.0</str>
     <str name="qf">*_stem^0.3</str>
     <str name="qf">rest_fields_stem^0.3</str>

     <str name="mm">7</str>
     <int name="rows">1000</int>
     <str name="df">text</str>
   </lst>
 </requestHandler>
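
To try the /search handler defined above, a request URL can be built as in the following minimal sketch. The base URL http://localhost:8983/solr, the core name "pdb", and the query text are assumptions for illustration only; substitute your own values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SearchUrlDemo {

    // Builds a request URL for the "/search" handler.
    // baseUrl and core are placeholders supplied by the caller.
    static String searchUrl(String baseUrl, String core, String userQuery) {
        try {
            // URL-encode the user query so spaces and special characters are safe.
            return baseUrl + "/" + core + "/search?q=" + URLEncoder.encode(userQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed values: local Solr on the default port, a core named "pdb".
        System.out.println(searchUrl("http://localhost:8983/solr", "pdb", "hemoglobin oxygen"));
    }
}
```

Because defType, qf, mm and rows are set as defaults in the handler, the request only needs to carry the q parameter.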

3. Modification of the managed-schema file: note the changed <uniqueKey> and
the new dynamic field "*_stem":

 <field name="pdb_id_stem" type="string" indexed="true" stored="true" required="true" multiValued="false" />

 <field name="rest_fields_stem" type="pdb_text_stem" indexed="true" stored="true" multiValued="true"/>
 <copyField source="*_stem" dest="rest_fields_stem"/>

 <dynamicField name="*_stem" type="pdb_text_stem" indexed="true" stored="true"/>

 <fieldtype name="pdb_text_stem" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
             catenateWords="0" catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
             catenateWords="0" catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.PorterStemFilterFactory"/>
   </analyzer>
 </fieldtype>

 <uniqueKey>pdb_id_stem</uniqueKey>

4. Creation of a custom transformer:

package my.solr.transformer;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

/**
 * Renames every column of a row to columnName + "_stem" so that all values
 * match the "*_stem" dynamic field. Null and blank values are dropped.
 */
public class FieldTransformer extends Transformer {

    @Override
    public Map<String, Object> transformRow(Map<String, Object> row, Context context) {

        // Copy the keys into a list first, because the row is modified inside the loop.
        List<String> keyList = new ArrayList<String>(row.keySet());

        for (String columnName : keyList) {
            Object value = row.get(columnName);
            if (value != null && !value.toString().trim().equals("")) {
                // Store the trimmed value under the suffixed name.
                row.put(columnName + "_stem", value.toString().trim());
            }
            // Remove the original column so that only the *_stem fields are indexed.
            row.remove(columnName);
        }
        System.out.println("row size ended = " + row.size());

        return row;
    }
}
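
As a rough illustration of what the transformer does to a single row, here is a minimal standalone sketch that applies the same renaming rule to a plain Map, without any Solr classes. The class name SuffixRenameDemo, the helper addSuffix, and the sample column names and values are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

public class SuffixRenameDemo {

    // Hypothetical helper mirroring the rule in FieldTransformer:
    // rename each column to columnName + suffix, dropping null or blank values.
    static Map<String, Object> addSuffix(Map<String, Object> row, String suffix) {
        // Copy the keys first so the map can be modified while looping.
        for (String columnName : new ArrayList<>(row.keySet())) {
            Object value = row.remove(columnName);
            if (value != null && !value.toString().trim().isEmpty()) {
                row.put(columnName + suffix, value.toString().trim());
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Made-up sample row, standing in for one JDBC result row.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("pdb_id", "4HHB");
        row.put("title", "  some title  ");
        row.put("keywords", "   ");  // blank, so it is dropped
        row.put("revision", null);   // null, so it is dropped
        System.out.println(addSuffix(row, "_stem"));
    }
}
```

Only the pdb_id and title columns survive, under the names pdb_id_stem and title_stem, which is why every indexed field matches the "*_stem" dynamic field.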

5. NOTE: when using a custom transformer, the following two jar files need to
be copied to this destination:
cp solr-dataimporthandler-6.1.0.jar /opt/solr-6.1.0/server/solr-webapp/webapp/WEB-INF/lib

cp solr-dataimporthandler-extras-6.1.0.jar /opt/solr-6.1.0/server/solr-webapp/webapp/WEB-INF/lib
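
Once the jars are in place and Solr has been restarted, an import can be started through the /dataimport handler. The following is only a sketch: the base URL http://localhost:8983/solr and the core name "pdb" are assumptions, and the clean and commit parameters are optional.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class DihFullImport {

    // Builds the DIH full-import URL. baseUrl and core are placeholders supplied by the caller.
    static String fullImportUrl(String baseUrl, String core) {
        return baseUrl + "/" + core + "/dataimport?command=full-import&clean=true&commit=true";
    }

    public static void main(String[] args) throws IOException {
        // Assumed values: local Solr on the default port, a core named "pdb".
        String url = fullImportUrl("http://localhost:8983/solr", "pdb");
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // A 200 status only means the import was accepted; it runs in the background.
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Progress can then be checked by requesting the same handler with command=status.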

6. Screenshot:
<http://lucene.472066.n3.nabble.com/file/n4293261/Screenshot-20.png> 




