tika-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Gdal Integration (TIKA 605)
Date Sun, 26 Feb 2012 19:13:10 GMT
Hi Joe,

On Feb 26, 2012, at 11:06 AM, Joe White wrote:

> Hi, Chris,
> I would agree that we probably should come up with a more comprehensive solution for
> this wrt the metadata object and the resulting XHTML. That would make the geospatial
> data feel more like a first-class citizen in the metadata hierarchy.

+1.

> 
> We will probably need to support more coordinate systems than just WGS 84, as there are
> a number of systems that have no transformation to WGS 84.

+1, agreed, WGS84 was just the first one that came to mind.


> The encoding of the WKT is also pretty important. Would you rather break it down into
> its component parts (probably datum and projection, for starters) or leave it whole?
> Obviously, the more metadata we have, the more powerful Tika becomes, but there is a
> point where you have so much data that it is no longer useful.

Let's start out with its component parts, datum and projection, and encode those as metadata
fields. So we'd likely update the existing Geographic metadata interface
with these new keys as a starter.
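As a rough illustration of "datum and projection as metadata fields", here is a minimal sketch that pulls the two names out of a WKT string with regular expressions. It assumes the older WKT1 form (nodes like `DATUM["..."]` and `PROJECTION["..."]`); the key names `geo:datum` and `geo:projection` are hypothetical, not existing Tika keys, and a plain `Map<String, List<String>>` stands in for Tika's key-multi-value Metadata object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: extract datum and projection names from a WKT1 string.
 * The "geo:" keys below are hypothetical, not part of Tika.
 */
public class WktComponents {

    // Hypothetical metadata keys for the new Geographic fields.
    public static final String DATUM = "geo:datum";
    public static final String PROJECTION = "geo:projection";

    // Capture the quoted name right after DATUM[ and PROJECTION[.
    private static final Pattern DATUM_PATTERN =
            Pattern.compile("DATUM\\[\"([^\"]+)\"");
    private static final Pattern PROJECTION_PATTERN =
            Pattern.compile("PROJECTION\\[\"([^\"]+)\"");

    /** Adds every captured name under the given key (multi-valued). */
    private static void addMatches(Pattern p, String wkt, String key,
                                   Map<String, List<String>> met) {
        Matcher m = p.matcher(wkt);
        while (m.find()) {
            met.computeIfAbsent(key, k -> new ArrayList<>()).add(m.group(1));
        }
    }

    /** Returns a key-multi-value map standing in for Tika's Metadata. */
    public static Map<String, List<String>> extract(String wkt) {
        Map<String, List<String>> met = new LinkedHashMap<>();
        addMatches(DATUM_PATTERN, wkt, DATUM, met);
        addMatches(PROJECTION_PATTERN, wkt, PROJECTION, met);
        return met;
    }

    public static void main(String[] args) {
        String wkt = "PROJCS[\"WGS 84 / UTM zone 33N\","
                + "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\","
                + "SPHEROID[\"WGS 84\",6378137,298.257223563]],"
                + "PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433]],"
                + "PROJECTION[\"Transverse_Mercator\"],"
                + "PARAMETER[\"central_meridian\",15],UNIT[\"metre\",1]]";
        // prints {geo:datum=[WGS_1984], geo:projection=[Transverse_Mercator]}
        System.out.println(extract(wkt));
    }
}
```

A purely geographic WKT (a bare `GEOGCS[...]`) yields only a datum and no projection, which is one reason to keep them as separate keys rather than a single WKT blob. A real implementation would more likely use a proper WKT parser (for example GDAL's own spatial reference classes) than regexes.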

> 
> On another note, I took a look at the code for your 605 patch, and I have a suggestion.
> Reading the notes on the check-ins for the patch, I noticed that no one had suggested
> using the in-memory Dataset as the default type. There is no reason the stream passed to
> the Tika parser could not be used to fill a buffer with the file data, which could then
> be used to create a dataset.

So your suggestion is to use GDAL's in-memory Dataset API, which would make this streamable
via Tika? That would be great. I just wasn't familiar enough with GDAL to know how to do
that, so a coding example in Java, if you have one, would help me wrap my head around it.
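One way to read Joe's suggestion, sketched under assumptions: buffer the parser's InputStream into a byte array and hand it to GDAL's `/vsimem/` in-memory file system instead of writing a temp file. Only the buffering and path naming below actually run; the GDAL steps are left in comments because they assume the GDAL Java (SWIG) bindings are on the classpath and have not been tested here. The class name and the `/vsimem/tika-...` naming scheme are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/**
 * Sketch: turn the stream a Tika parser receives into an in-memory buffer
 * suitable for GDAL's /vsimem/ virtual file system. GDAL calls are in
 * comments only (assumed API, not exercised here).
 */
public class VsiMemBuffer {

    /** Reads the whole stream into memory (Java 8 friendly, no readAllBytes). */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Builds a unique virtual path so concurrent parses do not collide. */
    public static String vsimemPath(String suffix) {
        return "/vsimem/tika-" + UUID.randomUUID() + suffix;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream handed to Parser.parse(...).
        InputStream stream = new ByteArrayInputStream(
                "not really a GeoTIFF".getBytes(StandardCharsets.US_ASCII));
        byte[] data = readFully(stream);
        String path = vsimemPath(".tif");

        // With the GDAL Java bindings available, the next steps would be
        // roughly (assumed, not run here):
        //   gdal.AllRegister();
        //   gdal.FileFromMemBuffer(path, data);
        //   Dataset ds = gdal.Open(path);
        //   ... read ds.GetProjectionRef(), ds.GetGeoTransform() ...
        //   ds.delete();
        //   gdal.Unlink(path);   // free the in-memory file
        System.out.println(path + " <- " + data.length + " bytes");
    }
}
```

The trade-off is that the whole file sits in memory for the duration of the parse, which may matter for very large rasters.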

> 
> As it is, I'm trying to get GDAL to cooperate with me on my Mac. Being a newcomer to
> Mac seems to be a drawback when trying to be productive. It just takes a little more
> fight to get the bits to do what I really want.
> 

Heh, yeah, I was trying to do this too. At one point I had it running, but a few OS upgrades
have nixed that. Let's see if I can get it up and running again too, so we can co-develop this.

> In any case, once I get GDAL whipped into shape, I'll see if I can get Tika to recognize
> the geospatial data in a test file, and then we will be off and running.

Great!

Cheers,
Chris

> On Feb 26, 2012, at 1:10 PM, Mattmann, Chris A (388J) wrote:
> 
>> Hi Joe,
>> 
>> Awesome! Thanks for picking this up and getting interested in this work. So far, our only
>> use case has been representing lats and lons (WGS84). It would be great to extract more
>> information and come up with a policy for representing more WKTs and so forth. We should
>> probably start by coming up with a scheme for encoding the extracted information in the
>> Tika metadata object and in its output XHTML. Do you have any ideas about how to do that?
>> Right now, in the existing patch on TIKA-605, I simply intended to use the met object and
>> its key-multi-value structure to represent the extracted information, but to take
>> advantage of streaming and of content handlers, we ought to encode this information in
>> the output XHTML.
>> 
>> Thoughts?
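To make "encode this information in the output XHTML" concrete, here is one possible scheme, sketched as SAX events: the fields go into `<meta>` elements in the head (similar in spirit to how Tika's XHTML output carries metadata) and are also streamed into a `<div class="geo">` in the body. It uses only the JDK's SAX and TrAX classes in place of Tika's own XHTML content handler; the `geo:` names and the `geo` class are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Sketch: emit hypothetical geospatial fields as XHTML via SAX events. */
public class GeoXhtml {

    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    /** Writes <meta name="..." content="..."/> to the handler. */
    static void meta(ContentHandler h, String name, String content)
            throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "name", "name", "CDATA", name);
        atts.addAttribute("", "content", "content", "CDATA", content);
        h.startElement(XHTML, "meta", "meta", atts);
        h.endElement(XHTML, "meta", "meta");
    }

    public static String render(String datum, String projection) throws Exception {
        // Identity transformer that serializes incoming SAX events to text.
        SAXTransformerFactory f =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler h = f.newTransformerHandler();
        h.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        h.setResult(new StreamResult(out));

        h.startDocument();
        h.startPrefixMapping("", XHTML);
        h.startElement(XHTML, "html", "html", new AttributesImpl());
        h.startElement(XHTML, "head", "head", new AttributesImpl());
        meta(h, "geo:datum", datum);             // hypothetical key
        meta(h, "geo:projection", projection);   // hypothetical key
        h.endElement(XHTML, "head", "head");

        h.startElement(XHTML, "body", "body", new AttributesImpl());
        AttributesImpl divAtts = new AttributesImpl();
        divAtts.addAttribute("", "class", "class", "CDATA", "geo");
        h.startElement(XHTML, "div", "div", divAtts);
        String text = datum + " / " + projection;
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement(XHTML, "div", "div");
        h.endElement(XHTML, "body", "body");

        h.endElement(XHTML, "html", "html");
        h.endPrefixMapping("");
        h.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("WGS_1984", "Transverse_Mercator"));
    }
}
```

Putting the values in both places keeps them available to consumers that only read the metadata and to those that only consume the streamed XHTML.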
>> 
>> Cheers,
>> Chris
>> 
>> On Feb 26, 2012, at 9:39 AM, Joe White wrote:
>> 
>>> Hi,
>>> I'm looking into implementing a bridge/link between Tika and GDAL so that geospatial
information can be saved from georeferenced images and vector types.  One thing that I have
noticed while going through the code is that the code only defines geographic coordinate types,
using latitudes and longitudes.  Is this by design?  If GDAL is wrapped into Tika, and a projected
image is imported, are the geospatial extents meant to be held in the metadata as geographic
points, possibly as WGS 84?  
>>> 
>>> Thanks
>>> 
>>> Joe White
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
> 
