uima-dev mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: UIMA shared external resources with config parameters
Date Sun, 23 Oct 2016 10:43:22 GMT
> On 19.10.2016, at 20:10, Marshall Schor <msa@schor.com> wrote:
> 
> 1) Specifying "object A" to a component.  My thinking did not go beyond what is
> done today for external shared resources.  UIMA provides an
> ExternalResourceDescription as part of a component; this is eventually fed to
> UIMA's "produceResource" methods to produce an instance of the resource.
> 
> So, I was thinking that you would specify "object A to a component" by just
> including the external resource description as part of the component's metadata.
> 
> 2) How to specify an "object B" as a parameter to "object A":
> 2a) Object A gets to define a key (or keys) for its parameters.  Let's say uses
> "myObjB" as the key.
> Object A gets to decide how to interpret the value for this key coming from the
> external settings file.  (Not architected by UIMA).
> 2b) At a time chosen by Object A, when object A is "running", it reads the value
> of the key "myObjB" from the external settings file, and then interprets this in
> any way it chooses, and then uses that to define Object B (again, this would be
> arbitrary, not architected by UIMA)

I would still like to see an example of how these parameters that are *not*
"external resource parameters" but "overrides" are specified in code.

For external resources, based on the current "external resource parameters"
mechanism, uimaFIT defines a convenient way of composing resources and 
components, specifically providing parameter values *locally* to each
single declared resource, i.e. there is *no chance for conflict* between
e.g. multiple instances of the same resource type being used in a pipeline.
Below is an example of how external resources (even nested ones) can be bound
to an analysis engine. The parameter values of the external resources are
provided locally for each resource. Note that calling createExternalResourceDescription
twice for the same class creates two distinct external resource instances of that
class, which can be bound independently.

----
createEngineDescription(ExtractFeaturesConnector.class,
        ExtractFeaturesConnector.PARAM_OUTPUT_DIRECTORY, outputPath,
        ExtractFeaturesConnector.PARAM_DATA_WRITER_CLASS, WekaDataWriter.class,
        ExtractFeaturesConnector.PARAM_LEARNING_MODE, Constants.LM_SINGLE_LABEL,
        ExtractFeaturesConnector.PARAM_FEATURE_MODE, Constants.FM_DOCUMENT,
        ExtractFeaturesConnector.PARAM_ADD_INSTANCE_ID, true,
        ExtractFeaturesConnector.PARAM_FEATURE_FILTERS, new String[] {},
        ExtractFeaturesConnector.PARAM_IS_TESTING, false,
        ExtractFeaturesConnector.PARAM_FEATURE_EXTRACTORS,
        // The two nested external resources, each with its own local parameters:
        asList(createExternalResourceDescription(EmoticonRatio.class,
                        EmoticonRatio.PARAM_UNIQUE_EXTRACTOR_NAME, "123"),
                createExternalResourceDescription(NumberOfHashTags.class,
                        NumberOfHashTags.PARAM_UNIQUE_EXTRACTOR_NAME, "1234")));
----

My understanding is that you want to deprecate the existing "parameters" mechanism
in external resources in favor of the "external override" mechanism. Hence,
I would like to know how creating resource descriptions, binding them, and
setting their parameters would be done programmatically, relying just on the
"override" mechanism and not on the existing "parameters" mechanism.

More on this in the last section below.

> 3) how to set non-String parameters?  Both the external settings and the normal
> UIMA configuration parameter settings (I'm thinking of the XML descriptor)
> represent these as strings.  So the number 1.0 is represented as the string
> "1.0", and the code that gets configuration parameter settings is responsible
> for type conversions, for instance, converting the string to the declared
> configuration parameter type.
> 
> For accessing directly external settings, there is no architected place for
> specifying the "type" of the parameter, other than the configuration
> declarations (which could be used for simple UIMA types only);  the external
> settings API returns just the string (or an array of strings, which is
> supported) to the caller, and it's up to the caller to then do whatever
> interpretation of this string value is desired (not architected by UIMA).

The ConfigurableDataResourceSpecifier uses a ResourceMetaData object for
parameters. ResourceMetaData supports non-String parameter values via
ConfigurationParameterDeclarations and ConfigurationParameterSettings.
Types of parameters are declared in ConfigurationParameterDeclarations 
and the framework handles the conversion between external String form and
internal parameter values. It is not up to the component or resource to
implement a conversion mechanism for each parameter.
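
To illustrate the point, here is a minimal sketch in plain Java of declaration-driven
conversion. It is not the UIMA API: the class DeclaredParams, its Type enum, and its
methods are hypothetical stand-ins for what ConfigurationParameterDeclarations and
ConfigurationParameterSettings provide. The idea is only that the declared type, not
the component, decides how the external String form is converted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the UIMA API: the declared type of a parameter
// drives the conversion from its external String form to a typed value.
final class DeclaredParams {
    enum Type { STRING, INTEGER, FLOAT, BOOLEAN }

    private final Map<String, Type> declarations = new LinkedHashMap<>();
    private final Map<String, Object> settings = new LinkedHashMap<>();

    // Declare a parameter and its type once, up front.
    void declare(String name, Type type) {
        declarations.put(name, type);
    }

    // Accept the external String form and store the converted value.
    void setFromString(String name, String raw) {
        Type type = declarations.get(name);
        if (type == null) {
            throw new IllegalArgumentException("Undeclared parameter: " + name);
        }
        settings.put(name, convert(type, raw));
    }

    // Returns the typed value, or null if the parameter was never set.
    Object get(String name) {
        return settings.get(name);
    }

    private static Object convert(Type type, String raw) {
        switch (type) {
            case INTEGER: return Integer.valueOf(raw.trim());
            case FLOAT:   return Float.valueOf(raw.trim());
            case BOOLEAN: return Boolean.valueOf(raw.trim());
            default:      return raw;
        }
    }

    public static void main(String[] args) {
        DeclaredParams params = new DeclaredParams();
        params.declare("size", Type.INTEGER);
        params.declare("threshold", Type.FLOAT);
        params.setFromString("size", "2");
        params.setFromString("threshold", "1.0");
        // The consumer reads typed values and never parses strings itself.
        int size = (Integer) params.get("size");
        float threshold = (Float) params.get("threshold");
        System.out.println(size + " " + threshold);
    }
}
```

With a pure string-based override lookup, the parsing in convert() would have to be
repeated inside every component or resource that reads a non-String value.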

> 4) re: disambiguating parameters for multiple instances of a shared resource. 
> UIMA today has the ability to have multiple instances of a shared resource, e.g.
> a "dictionary" that is parameterized by "language";
> multiple instances of these can be loaded.  The "get resource" api for this
> includes specifying the parameter(s) to select the proper one, and each instance
> that is created gets a initial "load" call whose argument can identify the instance.
> 
> So, (not architected by UIMA) the implementation could, for example, define a
> set of "keys": e.g.
> my_thesaurus_en, my_thesaurus_de, ...  for some parameters that are dependent on
> a language code. 
> 
> Beyond this, External Resources doesn't support multiple instances, and I had
> not considered extending this (as part of this discussion, which was about how
> to read configuration parameters).

If I understand you correctly, you want the implementer of a resource to
define some naming convention so that override names can be manually
associated with resources configured in specific ways, e.g. (pseudocode)

----
setOverride "de_dictionary" = "german.lexicon"
setOverride "en_dictionary" = "english.lexicon"

class DictionaryResource {
  def initialize(UimaContext ctx) {
    def lang = ctx.getParameter("lang");
    def lexicon = ctx.getOverride("${lang}_dictionary");
    loadLexicon(lexicon);
  }
}
----

If I have understood it correctly, that looks like a nice option
for working with, e.g., multi-language scenarios.
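
The same convention can be sketched in plain Java, with java.util.Properties
standing in for the external settings file. This is only an illustration under that
assumption: the class DictionaryKeyDemo and the method lexiconFor are hypothetical
names, and no UIMA API is involved.

```java
import java.util.Properties;

// Hypothetical sketch, NOT UIMA API: a resource derives its override key
// from its own "lang" parameter using a "<lang>_dictionary" convention.
final class DictionaryKeyDemo {

    // Builds the key from the language code and looks it up in the overrides.
    static String lexiconFor(Properties overrides, String lang) {
        String key = lang + "_dictionary";
        String value = overrides.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("No override for key: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of an external settings file.
        Properties overrides = new Properties();
        overrides.setProperty("de_dictionary", "german.lexicon");
        overrides.setProperty("en_dictionary", "english.lexicon");

        System.out.println(lexiconFor(overrides, "de"));
        System.out.println(lexiconFor(overrides, "en"));
    }
}
```

The convention works here because the language code makes the key unique per
instance. Two instances configured with the same language would resolve to the
same key.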

We have used external resources more in the context of machine
learning, specifically to model feature extractors. Here, we
define multiple instances of external resources, e.g. to
obtain n-grams of different sizes.

----

// Defining two instances of the NGramExtractorResource with
// different parameters.
def unigrams = createResource(NGramExtractorResource.class, 
  NGramExtractorResource.PARAM_SIZE, 1);
def bigrams = createResource(NGramExtractorResource.class, 
  NGramExtractorResource.PARAM_SIZE, 2);

def analysisEngine = createEngine(Analyzer.class,
  Analyzer.KEY_EXTRACTORS, asList(unigrams, bigrams));
----

To that end, uimaFIT introduces a custom external resource type
"ResourceList" (extends Resource_ImplBase) [1] which is implicitly
created in the call above. So the "unigrams" and "bigrams" bind to the
implicitly created "resource list" and the "resource list" binds to the
analysis engine.

Can/should the setting of PARAM_SIZE in the example above be substituted
using the "external override" mechanism?

Cheers,

-- Richard

[1] https://svn.apache.org/repos/asf/uima/uimafit/trunk/uimafit-core/src/main/java/org/apache/uima/fit/internal/ResourceList.java