tika-dev mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Tika discussions in Amsterdam
Date Thu, 03 May 2007 08:24:33 GMT
Jukka Zitting wrote:
> Hi,
> Quick summary of the Tika discussions from yesterday's text analysis
> BOF at the ApacheCon EU. It's the next morning now, so I'm probably
> missing a lot of stuff...

One other thing we discussed: for some input formats (such as HTML), it
would make sense for Tika to produce output that can be mapped back to
the input.  In other words, it should be possible (optionally) to know,
for each character in the output text, where that character originated
in the input.  This is useful, for example, for result highlighting.
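
To make the idea concrete, here is a rough sketch of what such a mapping
could look like. This is not Tika API: the class OffsetMappedText and its
methods are hypothetical names invented for illustration, and the tag
stripping is deliberately naive (no entities, comments, or scripts). It
only shows the principle of recording one input offset per output
character.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Tika.
final class OffsetMappedText {
    private final String text;          // extracted plain text
    private final int[] sourceOffsets;  // sourceOffsets[i] = input index of text.charAt(i)

    private OffsetMappedText(String text, int[] sourceOffsets) {
        this.text = text;
        this.sourceOffsets = sourceOffsets;
    }

    String text() {
        return text;
    }

    // Input offset at which the output character at outputIndex originated.
    int sourceOffset(int outputIndex) {
        return sourceOffsets[outputIndex];
    }

    // Naive extraction: drops everything between '<' and '>' and records,
    // for every character that is kept, its index in the input.
    static OffsetMappedText fromMarkup(String input) {
        StringBuilder out = new StringBuilder();
        int[] offsets = new int[input.length()];
        boolean inTag = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                offsets[out.length()] = i;
                out.append(c);
            }
        }
        return new OffsetMappedText(out.toString(), Arrays.copyOf(offsets, out.length()));
    }
}
```

For highlighting, a hit at output positions [start, end) would then map
to the input range from sourceOffset(start) to sourceOffset(end - 1) + 1.
Note that this range may span markup if the hit crosses a tag boundary.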

This may not be something for the early releases, but it would be good 
if we could keep this option in the back of our heads when designing the 

