From: Chris Mattmann
Date: Wed, 09 May 2007 10:49:23 -0700
Subject: Re: Second Tika report
Reply-To: tika-dev@incubator.apache.org
In-Reply-To: <4641F14C.90507@apache.org>

+1, thanks for putting this together, Jukka. I plan on moving over the parse
plugins stuff and the metadata container sometime this month into the Tika
codebase, where it can be maintained.

Cheers,
Chris

On 5/9/07 9:05 AM, "Doug Cutting" wrote:

> +1 This jibes with the activity I've seen. Thanks for writing this!
>
> Doug
>
> Jukka Zitting wrote:
>> Hi,
>>
>> I've prepared the following as the Tika report for this month.
>>
>>
>> Tika is a toolkit for detecting and extracting metadata and structured
>> text content from various documents using existing parser libraries.
>> Tika entered incubation on March 22nd, 2007.
>>
>> Community
>>
>> We had a good project bootstrap meeting as a part of the text analysis
>> BOF at the ApacheCon EU in Amsterdam. The resulting ideas were
>> summarized on the project mailing list, and the first design threads
>> have started.
>>
>> Development
>>
>> We've started discussing the design of the Tika toolkit. It seems like
>> we will select one of the existing codebases listed in the project
>> proposal as the basis of an early 0.1 release, and start refactoring
>> the code into a more generic toolkit. The Tika svn tree is still
>> empty, but I expect us to see the first code commits before the next
>> report.
>>
>> Infrastructure
>>
>> All the initial infrastructure is now in place. There is still some
>> activity on the temporary Tika wiki on the Google Project hosting
>> service, so we may end up requesting a Tika wiki to be set up on the
>> ASF infrastructure.
>>
>> Issues before graduation
>>
>> The Tika project is still at an early stage of incubation. The most
>> important tasks before graduation are to develop and release the Tika
>> codebase and to grow a diverse and sustainable project community.
>>
>>
>> BR,
>>
>> Jukka Zitting

______________________________________________
Chris A. Mattmann
Chris.Mattmann@jpl.nasa.gov
Key Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory
Pasadena, CA
Office: 171-266B
Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.