tika-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: How should video files with audio be handled by parsers?
Date Wed, 20 Aug 2014 10:24:39 GMT
OK, almost all of it looks fine to me now!

Taking just one bit:

> so in your example, setting a value of Audio on two essence tracks would currently look like:
>
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
>
> That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
>
>    company[1]/contact[0]/phoneNumber[2]=555-1234
>
> that might prove difficult to support.

Could you nest the property definitions to solve this?

eg
Property contact = Contact.COMPANY_CONTACT(1);
Property phone = Contact.PHONE(contact, 2); 
System.out.println(phone.getName());
// -> company[1]/contact[2]/phone
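
To make the nesting idea concrete, here is a minimal sketch of how it might work. It assumes each property keeps a reference to its parent and builds its name by walking up the chain. PathProperty, the Contact factory methods and the demo class are hypothetical stand-ins written for illustration; none of them is an existing Tika class, and a real version would presumably wrap Tika's own Property type.

```java
import java.util.Objects;

// Hypothetical demo, not Tika code: builds company[1]/contact[0]/phoneNumber[2].
final class NestedPropertyDemo {
    public static void main(String[] args) {
        PathProperty company = Contact.COMPANY(1);
        PathProperty contact = Contact.COMPANY_CONTACT(company, 0);
        PathProperty phone = Contact.PHONE_NUMBER(contact, 2);
        System.out.println(phone.getName());
        // prints: company[1]/contact[0]/phoneNumber[2]
    }
}

// Hypothetical stand-in for a property; it only models the hierarchical name.
final class PathProperty {
    private final PathProperty parent; // null for a top-level property
    private final String element;
    private final int index;

    private PathProperty(PathProperty parent, String element, int index) {
        this.parent = parent;
        this.element = Objects.requireNonNull(element, "element");
        this.index = index;
    }

    static PathProperty root(String element, int index) {
        return new PathProperty(null, element, checkIndex(index));
    }

    PathProperty child(String element, int index) {
        return new PathProperty(this, element, checkIndex(index));
    }

    // Joins every level from the root down, e.g. a[0]/b[1]/c[2].
    String getName() {
        String segment = element + "[" + index + "]";
        return parent == null ? segment : parent.getName() + "/" + segment;
    }

    private static int checkIndex(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0: " + index);
        }
        return index;
    }
}

// Hypothetical factory methods in the style of PBCore.ESSENCE_TRACK_TYPE(int).
final class Contact {
    private Contact() {}

    static PathProperty COMPANY(int index) {
        return PathProperty.root("company", index);
    }

    static PathProperty COMPANY_CONTACT(PathProperty company, int index) {
        return company.child("contact", index);
    }

    static PathProperty PHONE_NUMBER(PathProperty contact, int index) {
        return contact.child("phoneNumber", index);
    }
}
```

Because each level carries its own index, the Metadata class would not need to know how deep the structure goes; it would only ever see the final name.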


Otherwise, if you promise to help me update the vorbis parsers with 
support for this, I'll vote +1 on adding it in to tika core in this 
form... ;-)

Nick

On Tue, 19 Aug 2014, Ray Gauss wrote:
> The PBCore metadata class [1] has the indexed essence track properties defined as:
>
>     public static Property ESSENCE_TRACK_TYPE(int index)
>     {
>         return getIndexedEssenceTrackProperty(index, "essenceTrackType");
>     }
>
> which resolve via:
>
>     protected static Property getIndexedEssenceTrackProperty(int index, String elementName)
>     {
>         return Property.internalText(
>                 MessageFormat.format(ELEMENT_INSTANTIATION_ESSENCE_TRACK_FORMAT, index) +
>                 PREFIX_PBCORE + Metadata.NAMESPACE_PREFIX_DELIMITER + elementName);
>     }
>
> so in your example, setting a value of Audio on two essence tracks would currently look like:
>
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(0), "Audio");
>    metadata.set(PBCore.ESSENCE_TRACK_TYPE(1), "Audio");
>
> That index related component could potentially live in the Metadata class itself but if we choose to support multiple levels of structured properties, i.e.:
>
>    company[1]/contact[0]/phoneNumber[2]=555-1234
>
> that might prove difficult to support.
>
> Regards,
>
> Ray
>
>
> [1] https://github.com/AlfrescoLabs/tika-ffmpeg/blob/master/src/main/java/org/apache/tika/metadata/PBCore.java
>
>
>
> On August 7, 2014 at 6:21:37 AM, Nick Burch (apache@gagravarr.org) wrote:
>> On Wed, 6 Aug 2014, Ray Gauss wrote:
>>> I've updated tika-ffmpeg with a new file with 2 audio tracks and a
>>> subtitle track and added a test. The metadata looks as follows:
>>>
>>> pbcore:instantiationDataRate=3511 kb/s
>>> pbcore:instantiationDuration=00:00:01.03
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackType=Video
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameSize=480x270
> >>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackFrameRate=29.97 fps
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackDataRate=360 kb/s
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackEncoding=h264
>>> pbcore:instantiationEssenceTrack[0]/pbcore:essenceTrackLanguage=eng
>>> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackType=Audio
> >>> pbcore:instantiationEssenceTrack[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>>
> >> This actually looks better than I'd expected, so I have fewer reservations
> >> now than before
>>
>>> A much more concise representation would be:
>>>
>>> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:City
>>> Iptc4xmpExt:LocationCreated/Iptc4xmpExt:CountryName
>>> ...
>>> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City
>>> Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName
>>
>> This looks ok-ish to me too
>>
>>
>> One thing that I am wondering about though:
>>
>>> pbcore:instantiationDataRate=3511 kb/s
>>> pbcore:instantiationDuration=00:00:01.03
>>> stream[0]/pbcore:essenceTrackType=Video
>>> stream[0]/pbcore:essenceTrackFrameSize=480x270
>>> stream[0]/pbcore:essenceTrackFrameRate=29.97 fps
>>> stream[1]/pbcore:essenceTrackType=Audio
>>> stream[1]/pbcore:essenceTrackSamplingRate=48000 Hz
>>
> >> I can see how we can fairly easily modify Metadata to accept an optional
>> stream number when setting key/values, which would automatically prefix
>> them with stream[number]/
>>
> >> For a property like pbcore:essenceTrackType, and your alternate scheme,
> >> what would you expect the method on Metadata to look like for setting
> >> pbcore:essenceTrackType to a value of Audio on two different tracks?
>>
>> Nick
>
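
For reference, the stream-number idea from the quoted 7 August message (an optional stream number on set, automatically prefixing the key with stream[number]/) might be sketched roughly as follows. StreamMetadata is a hypothetical single-valued stand-in written for illustration, not Tika's Metadata class, and the three-argument set and get overloads are assumptions about how such an API could look.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo, not Tika code: sets the same key on two different streams.
final class StreamMetadataDemo {
    public static void main(String[] args) {
        StreamMetadata metadata = new StreamMetadata();
        metadata.set("pbcore:instantiationDuration", "00:00:01.03");
        metadata.set(0, "pbcore:essenceTrackType", "Audio");
        metadata.set(1, "pbcore:essenceTrackType", "Audio");
        System.out.println(metadata.asMap().keySet());
    }
}

// Hypothetical stand-in for a metadata store with an optional stream number.
final class StreamMetadata {
    private final Map<String, String> values = new LinkedHashMap<>();

    // Document-level value, stored under the key unchanged.
    void set(String key, String value) {
        values.put(key, value);
    }

    // Per-stream value, stored under stream[n]/key.
    void set(int stream, String key, String value) {
        values.put(prefixed(stream, key), value);
    }

    String get(String key) {
        return values.get(key);
    }

    String get(int stream, String key) {
        return values.get(prefixed(stream, key));
    }

    Map<String, String> asMap() {
        return Collections.unmodifiableMap(values);
    }

    private static String prefixed(int stream, String key) {
        if (stream < 0) {
            throw new IllegalArgumentException("stream must be >= 0: " + stream);
        }
        return "stream[" + stream + "]/" + key;
    }
}
```

This keeps the index handling inside the store, which is simple for one level of structure but, as noted in the thread, does not obviously extend to several nested levels.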