lucene-solr-user mailing list archives

From Where is Where <whis...@gmail.com>
Subject Re: problem indexing GPS metadata for video upload
Date Fri, 03 May 2019 01:28:47 GMT
Thank you very much, Tim. How do I make the Tika change apply to Solr? I
see the Tika core, parsers, and XML jar files (tika-core.jar,
tika-parsers.jar, tika-xml.jar) in Solr's contrib/extraction/lib folder.
Do we just replace these files? Thanks!

On Thu, May 2, 2019 at 12:16 PM Where is Where <whisere@gmail.com> wrote:

> Thank you Alex and Tim.
> I have looked at the solrconfig.xml file (I am trying the techproducts
> demo config), and the only related place I can find is the extract handler:
>
> <requestHandler name="/update/extract"
>                   startup="lazy"
>                   class="solr.extraction.ExtractingRequestHandler" >
>     <lst name="defaults">
>       <str name="lowernames">true</str>
>       <!--<str name="uprefix">ignored_</str>-->
>
>       <!-- capture link hrefs but ignore div attributes -->
>       <str name="captureAttr">true</str>
>       <str name="fmap.a">links</str>
>       <str name="fmap.div">ignored_</str>
>     </lst>
>   </requestHandler>
>
> I am using this command bin/post -c techproducts
> example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_"
>
> I have tried commenting out <str name="uprefix">ignored_</str> and
> changing to <str name="fmap.div">div</str>,
> but it still does not work. I don't understand why images get GPS and
> other metadata while video behaves differently, even though both use the
> same solrconfig and the GPS metadata is stored in the same fields. The
> solrconfig settings do not differentiate between image and video.
>
> Tim, yes, this is related to the TIKA link. Thank you!
>
> Here is the output in solr for mp4.
>
> {
>         "attr_meta":["stream_size",
>           "5721559",
>           "date",
>           "2019-03-29T04:36:39Z",
>           "X-Parsed-By",
>           "org.apache.tika.parser.DefaultParser",
>           "X-Parsed-By",
>           "org.apache.tika.parser.mp4.MP4Parser",
>           "stream_content_type",
>           "application/octet-stream",
>           "meta:creation-date",
>           "2019-03-29T04:36:39Z",
>           "Creation-Date",
>           "2019-03-29T04:36:39Z",
>           "tiff:ImageLength",
>           "1080",
>           "resourceName",
>           "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
>           "dcterms:created",
>           "2019-03-29T04:36:39Z",
>           "dcterms:modified",
>           "2019-03-29T04:36:39Z",
>           "Last-Modified",
>           "2019-03-29T04:36:39Z",
>           "Last-Save-Date",
>           "2019-03-29T04:36:39Z",
>           "xmpDM:audioSampleRate",
>           "1000",
>           "meta:save-date",
>           "2019-03-29T04:36:39Z",
>           "modified",
>           "2019-03-29T04:36:39Z",
>           "tiff:ImageWidth",
>           "1920",
>           "xmpDM:duration",
>           "2.64",
>           "Content-Type",
>           "video/mp4"],
>         "id":"mp4_4",
>         "attr_stream_size":["5721559"],
>         "attr_date":["2019-03-29T04:36:39Z"],
>         "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
>           "org.apache.tika.parser.mp4.MP4Parser"],
>         "attr_stream_content_type":["application/octet-stream"],
>         "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
>         "attr_creation_date":["2019-03-29T04:36:39Z"],
>         "attr_tiff_imagelength":["1080"],
>         "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
>         "attr_dcterms_created":["2019-03-29T04:36:39Z"],
>         "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
>         "last_modified":"2019-03-29T04:36:39Z",
>         "attr_last_save_date":["2019-03-29T04:36:39Z"],
>         "attr_xmpdm_audiosamplerate":["1000"],
>         "attr_meta_save_date":["2019-03-29T04:36:39Z"],
>         "attr_modified":["2019-03-29T04:36:39Z"],
>         "attr_tiff_imagewidth":["1920"],
>         "attr_xmpdm_duration":["2.64"],
>         "content_type":["video/mp4"],
>         "content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n  \n  \n  \n  \n \n   "],
>         "_version_":1632383499325407232}]
>   }}
>
> JPEG is getting these:
> "attr_meta":[....
> "GPS Latitude",
>           "37° 47' 41.99\"",
> ....
> "attr_gps_latitude":["37° 47' 41.99\""],
>
>
> On Wed, May 1, 2019 at 2:57 PM Where is Where <whisere@gmail.com> wrote:
>
>> I am uploading video to Solr via Tika, following
>> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
>> The index has no GPS metadata for video, although it is extracted and
>> indexed for images such as JPEG. I have checked both MP4 and MOV files;
>> all of them have GPS Exif data embedded in the same fields as the images.
>> Any idea? Thanks!
>>
>
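
The `attr_meta` field in the Solr output quoted above is a flat list of alternating metadata names and values, which makes it hard to see at a glance whether any GPS entries came back. Below is a minimal Python sketch for checking that. The helper names (`pair_meta`, `gps_keys`, `dms_to_decimal`) are made up for illustration and are not part of Solr or Tika. The sample lists are short excerpts of the MP4 and JPEG output above. The degrees/minutes/seconds pattern, including the leading minus sign for southern or western coordinates, is an assumption based on the single JPEG value shown.

```python
import re

def pair_meta(flat):
    """Group a flat [name, value, name, value, ...] list into {name: [values]}."""
    pairs = {}
    for name, value in zip(flat[0::2], flat[1::2]):
        pairs.setdefault(name, []).append(value)
    return pairs

def gps_keys(meta):
    """Return the metadata names that start with 'GPS' (case-insensitive)."""
    return sorted(k for k in meta if k.lower().startswith("gps"))

# Assumed format: degrees, minutes, seconds, e.g. 37° 47' 41.99"
_DMS = re.compile(r"""\s*(-?\d+)°\s*(\d+)'\s*(\d+(?:\.\d+)?)"\s*""")

def dms_to_decimal(dms):
    """Convert a degrees/minutes/seconds string to decimal degrees."""
    m = _DMS.fullmatch(dms)
    if m is None:
        raise ValueError(f"unrecognized coordinate: {dms!r}")
    degrees, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
    # Take the sign from the text so that "-0° ..." is handled correctly.
    sign = -1 if m.group(1).startswith("-") else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)

# Excerpt of the MP4 attr_meta list from the output above.
mp4_meta = [
    "X-Parsed-By", "org.apache.tika.parser.DefaultParser",
    "X-Parsed-By", "org.apache.tika.parser.mp4.MP4Parser",
    "tiff:ImageWidth", "1920",
    "Content-Type", "video/mp4",
]

# The only JPEG pair shown above; the rest of that list was elided.
jpeg_meta = ["GPS Latitude", "37° 47' 41.99\""]

if __name__ == "__main__":
    for label, flat in (("mp4", mp4_meta), ("jpeg", jpeg_meta)):
        meta = pair_meta(flat)
        print(label, "GPS keys:", gps_keys(meta))
        for key in gps_keys(meta):
            print("  ", key, "=", round(dms_to_decimal(meta[key][0]), 6))
```

Run against the full `attr_meta` list of each document, this shows quickly whether the parser emitted any GPS entries before Solr's `uprefix` and `fmap` mappings come into play.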
