tika-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: Tika OneNote Support
Date Sun, 25 Nov 2012 19:33:14 GMT
On Wed, 14 Nov 2012, 122jxgcn wrote:
> Is there anyone who worked on extracting contents from MS OneNote file? 
> (*.one) It will be great if someone can tell me how to work with parsing 
> OneNote files programmatically.

I'm not aware of anything. The good news is that the file format is fully 
documented:
http://msdn.microsoft.com/en-us/library/dd924743%28v=office.12%29.aspx
http://msdn.microsoft.com/en-us/library/dd951288%28v=office.12%29.aspx

You'll need to use the specification to write some code to read the 
format, then you can feed it to Tika. My hunch is you're looking at 5-15 
days of work.
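
As a rough idea of what a first step might look like, here is a minimal
sketch that only checks whether a stream starts with the .one file type
GUID. The byte values are my recollection of the header's guidFileType
field ({7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} in its on-disk byte order),
so please verify them against the specification before relying on them.
The class and method names are hypothetical, and none of this is existing
Tika or POI code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika or POI.
public class OneNoteSniffer {

    // Assumed on-disk bytes of the .one guidFileType header field.
    // Verify against the specification before relying on these values.
    static final byte[] ONE_FILE_TYPE_GUID = {
        (byte) 0xE4, (byte) 0x52, (byte) 0x5C, (byte) 0x7B,
        (byte) 0x8C, (byte) 0xD8, (byte) 0xA7, (byte) 0x4D,
        (byte) 0xAE, (byte) 0xB1, (byte) 0x53, (byte) 0x78,
        (byte) 0xD0, (byte) 0x29, (byte) 0x96, (byte) 0xD3
    };

    // Reads the first 16 bytes and compares them with the expected GUID.
    // Returns false if the stream ends before 16 bytes are available.
    static boolean looksLikeOneNoteSection(InputStream in) throws IOException {
        byte[] header = new byte[ONE_FILE_TYPE_GUID.length];
        int filled = 0;
        while (filled < header.length) {
            int n = in.read(header, filled, header.length - filled);
            if (n < 0) {
                return false; // stream too short to hold the header GUID
            }
            filled += n;
        }
        return Arrays.equals(header, ONE_FILE_TYPE_GUID);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OneNoteSniffer <file.one>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(looksLikeOneNoteSection(in)
                    ? "looks like a OneNote section file"
                    : "does not look like a OneNote section file");
        }
    }
}
```

Everything after the header (the structures that hold the actual page
content) is where the real work lies, and it is not covered by this
sketch.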

Apache POI would probably be a good home for most of the OneNote code if 
you do get it working. Please consider contributing it there if you make 
progress!

Nick
