lucene-solr-user mailing list archives

From Koji Sekiguchi <k...@r.email.ne.jp>
Subject uima fieldMappings and solr dynamicField
Date Fri, 06 May 2011 11:15:32 GMT
Hello,

I'd like to use a dynamicField in the feature-to-field mapping of the UIMA
update processor. This doesn't seem to be supported currently. Is it a bad
idea from a UIMA point of view? If not, I'd like to try writing a patch.

Background:

Because my UIMA annotator can generate many types of named entities from
a text, I don't want to implement a separate type for each of them. Instead,
I define a single type, "NamedEntity":

<typeSystemDescription>
  <types>
    <typeDescription>
      <name>com.rondhuit.uima.next.NamedEntity</name>
      <description/>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>name</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <featureDescription>
          <name>entity</name>
          <description/>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>

sample extracted named entities:

name="PERSON", entity="Barack Obama"
name="TITLE", entity="the President"

Now, I'd like to map these named entities to Solr fields like this:

PERSON_s:"Barack Obama"
TITLE_s:"the President"

Because the name feature can take so many values (PERSON, TITLE, etc.),
I'd like to use the dynamicField *_s, where * is replaced by the value of
the name feature of NamedEntity.
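
To make the idea concrete, here is a minimal sketch of the substitution I have
in mind. It is not existing code in the UIMA update processor: the class
DynamicFieldNameSketch and the method resolve() are hypothetical names used only
for illustration, and I assume the pattern contains exactly one '*'. The
resulting field names follow the lowercase *_s pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Solr or the UIMA update processor.
public final class DynamicFieldNameSketch {

    // Replace the single '*' in a dynamicField pattern with the value of
    // the "name" feature of a NamedEntity annotation.
    static String resolve(String pattern, String nameFeature) {
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException(
                "pattern must contain exactly one '*': " + pattern);
        }
        return pattern.substring(0, star) + nameFeature + pattern.substring(star + 1);
    }

    public static void main(String[] args) {
        // Sample (name, entity) pairs taken from the example above.
        Map<String, String> entities = new LinkedHashMap<>();
        entities.put("PERSON", "Barack Obama");
        entities.put("TITLE", "the President");

        // Print each entity as <resolved field name>:"<value>".
        for (Map.Entry<String, String> e : entities.entrySet()) {
            System.out.println(resolve("*_s", e.getKey()) + ":\"" + e.getValue() + "\"");
        }
    }
}
```

With this approach a single mapping entry would cover every value of the name
feature, instead of one entry per entity type.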

I think this is a natural requirement from the Solr point of view, but I'm
not sure whether my UIMA annotator design is correct. In other words,
should I implement a separate type for each entity type
(e.g. PersonEntity, TitleEntity, ... instead of NamedEntity)?

Thank you!

Koji
-- 
http://www.rondhuit.com/en/
