lucene-solr-user mailing list archives

From Tim Allison <talli...@apache.org>
Subject Re: problem indexing GPS metadata for video upload
Date Fri, 10 May 2019 17:00:02 GMT
Unfortunately, It Depends(TM)*...these are the steps I take:
https://wiki.apache.org/tika/UpgradingTikaInSolr

There can be version conflicts and other awful, unforeseen things if
you don't get it right.

We're on the cusp of the release for 1.21 (I mean it this time)...I'll
upgrade Solr as soon as Tika is out (I also mean it this time).


*TM by Erick Erickson
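
The version-conflict risk mentioned above can be sanity-checked before restarting Solr. The following is a minimal sketch, not the procedure from the wiki page: it assumes the Tika jars in contrib/extraction/lib carry their version in the file name (for example `tika-core-<version>.jar`), and the file names and version numbers in the sample list are hypothetical. The helper names `tika_versions` and `is_consistent` are made up for illustration.

```python
import re

# Assumption: jar names look like "tika-<module>-<version>.jar".
JAR_PATTERN = re.compile(r"^(tika-[a-z0-9-]+?)-(\d[\w.\-]*)\.jar$")


def tika_versions(filenames):
    """Map each Tika module name to the version found in its jar file name."""
    versions = {}
    for name in filenames:
        match = JAR_PATTERN.match(name)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def is_consistent(versions):
    """True when all Tika jars found share a single version."""
    return len(set(versions.values())) <= 1


# Hypothetical directory listing: one jar was replaced, the others were not.
jars = [
    "tika-core-1.19.1.jar",
    "tika-parsers-1.21.jar",
    "tika-xml-1.19.1.jar",
    "commons-io-2.5.jar",
]
found = tika_versions(jars)
mixed = not is_consistent(found)
```

In practice the list would come from something like `os.listdir` on the lib folder. A mixed result only flags mismatched Tika jars; it says nothing about Tika's own dependencies, which may also need upgrading.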

On Fri, May 3, 2019 at 3:44 AM Where is Where <whisere@gmail.com> wrote:
>
> Thank you very much, Tim. How do I make the Tika change apply to
> Solr? I saw the Tika core, parsers and xml jar files (tika-core.jar,
> tika-parsers.jar, tika-xml.jar) in Solr's contrib/extraction/lib folder. Do we
> just replace these files? Thanks!
>
> On Thu, May 2, 2019 at 12:16 PM Where is Where <whisere@gmail.com> wrote:
>
> > Thank you Alex and Tim.
> > I have looked at the solrconfig.xml file (I am trying the techproducts
> > demo config); the only related place I can find is the extract handler:
> >
> > <requestHandler name="/update/extract"
> >                   startup="lazy"
> >                   class="solr.extraction.ExtractingRequestHandler" >
> >     <lst name="defaults">
> >       <str name="lowernames">true</str>
> >       <!--<str name="uprefix">ignored_</str>-->
> >
> >       <!-- capture link hrefs but ignore div attributes -->
> >       <str name="captureAttr">true</str>
> >       <str name="fmap.a">links</str>
> >       <str name="fmap.div">ignored_</str>
> >     </lst>
> >   </requestHandler>
> >
> > I am using this command bin/post -c techproducts
> > example/exampledocs/1.mp4 -params "literal.id=mp4_1&uprefix=attr_"
> >
> > I have tried commenting out <str name="uprefix">ignored_</str> and
> > changing to <str name="fmap.div">div</str>,
> > but it is still not working. I don't understand why images get GPS and other
> > metadata while video behaves differently, even though both use the same
> > solrconfig and the GPS metadata are in the same fields. The solrconfig
> > settings do not differentiate between image and video.
> >
> > Tim yes this is related to the TIKA link. Thank you!
> >
> > Here is the output in solr for mp4.
> >
> > {
> >         "attr_meta":["stream_size",
> >           "5721559",
> >           "date",
> >           "2019-03-29T04:36:39Z",
> >           "X-Parsed-By",
> >           "org.apache.tika.parser.DefaultParser",
> >           "X-Parsed-By",
> >           "org.apache.tika.parser.mp4.MP4Parser",
> >           "stream_content_type",
> >           "application/octet-stream",
> >           "meta:creation-date",
> >           "2019-03-29T04:36:39Z",
> >           "Creation-Date",
> >           "2019-03-29T04:36:39Z",
> >           "tiff:ImageLength",
> >           "1080",
> >           "resourceName",
> >           "/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
> >           "dcterms:created",
> >           "2019-03-29T04:36:39Z",
> >           "dcterms:modified",
> >           "2019-03-29T04:36:39Z",
> >           "Last-Modified",
> >           "2019-03-29T04:36:39Z",
> >           "Last-Save-Date",
> >           "2019-03-29T04:36:39Z",
> >           "xmpDM:audioSampleRate",
> >           "1000",
> >           "meta:save-date",
> >           "2019-03-29T04:36:39Z",
> >           "modified",
> >           "2019-03-29T04:36:39Z",
> >           "tiff:ImageWidth",
> >           "1920",
> >           "xmpDM:duration",
> >           "2.64",
> >           "Content-Type",
> >           "video/mp4"],
> >         "id":"mp4_4",
> >         "attr_stream_size":["5721559"],
> >         "attr_date":["2019-03-29T04:36:39Z"],
> >         "attr_x_parsed_by":["org.apache.tika.parser.DefaultParser",
> >           "org.apache.tika.parser.mp4.MP4Parser"],
> >         "attr_stream_content_type":["application/octet-stream"],
> >         "attr_meta_creation_date":["2019-03-29T04:36:39Z"],
> >         "attr_creation_date":["2019-03-29T04:36:39Z"],
> >         "attr_tiff_imagelength":["1080"],
> >         "resourcename":"/Volumes/Data/inData/App/solr/example/exampledocs/1.mp4",
> >         "attr_dcterms_created":["2019-03-29T04:36:39Z"],
> >         "attr_dcterms_modified":["2019-03-29T04:36:39Z"],
> >         "last_modified":"2019-03-29T04:36:39Z",
> >         "attr_last_save_date":["2019-03-29T04:36:39Z"],
> >         "attr_xmpdm_audiosamplerate":["1000"],
> >         "attr_meta_save_date":["2019-03-29T04:36:39Z"],
> >         "attr_modified":["2019-03-29T04:36:39Z"],
> >         "attr_tiff_imagewidth":["1920"],
> >         "attr_xmpdm_duration":["2.64"],
> >         "content_type":["video/mp4"],
> >         "content":[" \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n  \n  \n  \n  \n  \n  \n \n   "],
> >         "_version_":1632383499325407232}]
> >   }}
> >
> > JPEG is getting these:
> > "attr_meta":[....
> > "GPS Latitude",
> >           "37° 47' 41.99\"",
> > ....
> > "attr_gps_latitude":["37° 47' 41.99\""],
> >
> >
> > On Wed, May 1, 2019 at 2:57 PM Where is Where <whisere@gmail.com> wrote:
> >
> > >> I am uploading video to Solr via Tika:
> > >> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> > >> The index has no video GPS metadata, although GPS metadata is extracted and
> > >> indexed for images such as JPEG. I have checked both MP4 and MOV files; the
> > >> files I checked all have GPS Exif data embedded in the same fields as the
> > >> images. Any idea? Thanks!
> >>
> >
