tika-dev mailing list archives

From Ray Gauss II <ray.ga...@alfresco.com>
Subject Re: Sharing metadata logic between parsers
Date Mon, 30 Jan 2012 17:12:19 GMT
I personally like Nick's 3rd idea: Extending the Property class to support a converter.

Even in the case described here the Metadata property setters could be modified to something
like:

    public void set(Property property, int value) {
        if (property.getPropertyType() != Property.PropertyType.SIMPLE) {
            throw new PropertyTypeException(Property.PropertyType.SIMPLE, property.getPropertyType());
        }
        if (property.getValueType() != Property.ValueType.INTEGER) {
            throw new PropertyTypeException(Property.ValueType.INTEGER, property.getValueType());
        }
        if (property.getConverter() != null) {
            set(property.getName(), property.getConverter().convert(this, value));
        } else {
            set(property.getName(), Integer.toString(value));
        }
    }

then the specific converter implementation could still look at other properties.

Obviously the ordering of calling those set methods would be critical since the dependency
properties need to be set first, but it seems like a pretty flexible implementation where
some powerful converters could be developed when needed.
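
To make the ordering point concrete, here is a rough, self-contained sketch. It does not use the real Tika classes: SimpleProperty, SimpleMetadata, IntConverter and the "example:" keys are all made-up stand-ins, and the converter hook itself is only the proposal above, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;

class ConverterSketch {

    /** Hypothetical converter hook: may consult metadata that is already set. */
    interface IntConverter {
        String convert(SimpleMetadata metadata, int value);
    }

    /** Simplified stand-in for Tika's Property: a name plus an optional converter. */
    static final class SimpleProperty {
        final String name;
        final IntConverter converter; // null means "store the plain integer"

        SimpleProperty(String name, IntConverter converter) {
            this.name = name;
            this.converter = converter;
        }
    }

    /** Simplified stand-in for Tika's Metadata, backed by a plain map. */
    static final class SimpleMetadata {
        private final Map<String, String> values = new HashMap<>();

        String get(String name) {
            return values.get(name);
        }

        void set(String name, String value) {
            values.put(name, value);
        }

        // Same branching as the proposed setter, minus the type checks.
        void set(SimpleProperty property, int value) {
            if (property.converter != null) {
                set(property.name, property.converter.convert(this, value));
            } else {
                set(property.name, Integer.toString(value));
            }
        }
    }

    /** Made-up property whose stored form depends on the made-up "example:unit" key. */
    static final SimpleProperty DURATION = new SimpleProperty("example:duration",
            (metadata, value) -> {
                String unit = metadata.get("example:unit");
                return unit == null ? Integer.toString(value) : value + " " + unit;
            });

    public static void main(String[] args) {
        SimpleMetadata inOrder = new SimpleMetadata();
        inOrder.set("example:unit", "ms");    // dependency is set first
        inOrder.set(DURATION, 1500);
        System.out.println(inOrder.get("example:duration"));    // prints "1500 ms"

        SimpleMetadata outOfOrder = new SimpleMetadata();
        outOfOrder.set(DURATION, 1500);       // converter runs before the unit exists
        outOfOrder.set("example:unit", "ms");
        System.out.println(outOfOrder.get("example:duration")); // prints "1500"
    }
}
```

In the second case the converter silently falls back to the plain integer, which is the kind of ordering dependency a real implementation would need to document or guard against.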

This is also somewhat similar to the concept of a mapper that I had to use for the tika-exiftool
parser, converting from the properties provided by the command-line tool to proper Tika properties.


On Jan 30, 2012, at 10:52 AM, Jukka Zitting wrote:

> Hi,
> 
> On Mon, Jan 30, 2012 at 4:20 PM, Nick Burch <nick.burch@alfresco.com> wrote:
>> On Mon, 30 Jan 2012, Jukka Zitting wrote:
>>> What we might also consider as an extra convenience, are Metadata methods
>>> like: [...]
>> 
>> If we're doing that sort of thing, then I'd rather we put the logic onto the
>> Property for that. The Property already has a type and the closed list of
>> allowed values (as Strings, from the XMPDM specification). It would seem to
>> me that the logic for going to/from channel numbers would best live with the
>> strings themselves, on the property, rather than outside?
> 
> I'm thinking of cases where such convenience methods could rely on
> more than just a single property. For example, a getNumberOfPages()
> convenience method could look something like this (with an extra
> getInt(String) helper method):
> 
>    int getNumberOfPages() {
>        Integer pages = metadata.getInt(PagedText.N_PAGES);
>        if (pages == null) {
>            pages = metadata.getInt(MSOffice.PAGE_COUNT);
>        }
>        if (pages == null) {
>            pages = metadata.getInt(MSOffice.SLIDE_COUNT);
>        }
>        if (pages != null) {
>            return pages;
>        } else {
>            return 0;
>        }
>    }
> 
> BR,
> 
> Jukka Zitting

