tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Fwd: [Aperture-devel] announcement: Aperture 1.2.0 release
Date Tue, 28 Oct 2008 08:57:27 GMT

Major new release from the Aperture guys. The OSL license used by the
Aperture implementation has been cleared as a category B license (see
[1]), so we could now reuse some of their work.

[1] https://issues.apache.org/jira/browse/LEGAL-32


Jukka Zitting

---------- Forwarded message ----------
From: Antoni Mylka <antoni.mylka@gmail.com>
Date: Tue, Oct 28, 2008 at 4:02 AM
Subject: [Aperture-devel] announcement: Aperture 1.2.0 release
To: Aperture Devel <aperture-devel@lists.sourceforge.net>,
rdf2go-devel@ontoware.org, "Sauermann, Leo"
<sauermann@dfki.uni-kl.de>, "Fluit, Christiaan"
<christiaan.fluit@aduna-software.com>, Aduna - Alltech


Aperture is a Java framework for extracting full-text content and
metadata from various information systems (e.g. file systems, web sites,
mail boxes) and the file formats (e.g. documents, images) occurring in
these systems.

Download URL:

After three years of development, Aperture is stable enough
to drop the .beta suffix from the release. 1.2.0 leverages
architectural improvements made in 1.1.0.beta to bring
support for compressed archives and to streamline
email processing. A completely new service, the
DataSourceDetector, allows applications to provide
suggestions to users about the data sources on their
desktops. A host of bugfixes and minor improvements round
out the picture of the leanest and meanest version of Aperture
ever made. Enjoy.

What's new?
- a completely new Aperture service, the
 DataSourceDetector, can be used to provide advice to
 the user about the data sources on the desktop
- new subcrawlers for .zip, .gzip, tar and bzip2 compressed archives
- unification of the email handling - now the ImapCrawler,
 MboxCrawler and the MimeSubCrawler use the same code in
 the DataObjectFactory to convert emails to RDF. The
 MimeExtractor has been deprecated, switch to
- some bugfixes in the email handling code: plain text and
 XML attachments are treated correctly, and threads are
 reflected in the resulting RDF
- the pdf extractor has some basic support for XMP metadata
 (thanks to JempBox)
- a completely new XmlSafetyUtil class that helps to deal
 with characters that are valid in RDF but invalid in XML,
 and thus break the serialization
- the uris of subcrawled resources follow the pattern
 established by the Apache Commons VFS project.
- the new Sesame 2.2.1 bundled with Aperture features dramatic
 performance optimizations, e.g. the Aperture test suite
 is twice as fast; this may also be a boost for your
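
To illustrate the kind of problem the XmlSafetyUtil item above refers to: RDF literals may carry code points (for example NUL or other control characters) that the XML 1.0 Char production forbids, so serializing them as RDF/XML fails. The following is only a minimal sketch of one possible approach, assuming that replacing each forbidden code point with a space is acceptable. The class name XmlCharFilter and its methods are hypothetical and are not Aperture's XmlSafetyUtil API.

```java
/** Hypothetical helper, not part of Aperture: filters text for XML 1.0 output. */
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every code point that XML 1.0 forbids with a space (an assumed policy). */
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : ' ');
            // Advance by one or two chars depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

A caller would pass extracted text through XmlCharFilter.sanitize(...) before handing it to an XML serializer; dropping the character or escaping it in some application-specific way would be equally valid policies.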

Best regards
Leo Sauermann
Christiaan Fluit
Antoni Mylka

