nutch-dev mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: VOTE Apache Nutch 2.0 RC1
Date Wed, 13 Jun 2012 09:06:44 GMT
Hmm please ignore "the parse text limited to 100 chars", this is actually
not the case. (Only in our branch that has a fix for limiting anchor texts;
not yet present in the nutchgora branch because it still needs
polishing). So no need to wait for commits on my part.

On Wed, Jun 13, 2012 at 11:00 AM, Ferdy Galema <ferdy.galema@kalooga.com> wrote:

> Findings about Nutch-2.0 RC 1.
>
> The Nutch job jar is not present in the binary archive. This means
> distributed running of jobs is not supported. I'm not sure if this is a
> problem (since users can always build one themselves), merely pointing it
> out. The recently released 1.5 also lacks this job jar, so at least no
> difference there.
>
> Parse text is limited to 100 characters for html. We noticed this when our
> index wasn't showing enough terms for some documents. This is a pretty
> severe bug that I will commit a fix for right away.
>
> Building runtime with the default SqlStore and HBaseStore works fine. Will
> perform some more functionality tests when there is a new RC.
>
> Ferdy.
>
> On Wed, Jun 13, 2012 at 4:24 AM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey Guys,
>>
>> #2 is probably reason enough for a respin.
>>
>> Lewis if you don't have time to do it before Thursday, I could probably
>> give it a whack. Let me know.
>>
>> Cheers,
>> Chris
>>
>> On Jun 12, 2012, at 3:33 PM, Sebastian Nagel wrote:
>>
>> > Hi Lewis,
>> >
>> > my first steps with 2.0 (to be continued, still struggling).
>> >
>> > Two points (I'll try to give a final vote tomorrow):
>> >
>> > 1 some guidance would be nice. README.txt points
>> > to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
>> > (I'm using
>> > http://sujitpal.blogspot.de/2012/01/exploring-nutch-gora-with-cassandra.html)
>> >
>> > 2 the package contains your nutch-site.xml:
>> >    <name>http.agent.email</name>
>> >    <value>lewismc@apache.org</value>
>> > I guess that's not intended :)
>> >
>> > Cheers,
>> > Sebastian
>> >
>> > On 06/12/2012 10:16 PM, Lewis John Mcgibbney wrote:
>> >> Hi Everyone,
>> >>
>> >> I appreciate that most of the core devs are using trunk, however I
>> >> would appeal to you guys to at least check out the artifacts and check
>> >> sigs, tests, license headers if possible. Although this does not fully
>> >> satisfy the requirements of a thoroughly reviewed RC, hopefully the
>> >> thorough stuff can be undertaken by those directly using the artifacts
>> >> and code in development/production.
>> >>
>> >> Thanks very much in advance
>> >>
>> >> Best
>> >>
>> >> Lewis
>> >>
>> >> On Fri, Jun 8, 2012 at 3:49 PM, lewis john mcgibbney <lewismc@apache.org> wrote:
>> >>> Good Evening Everyone,
>> >>>
>> >>> A candidate for the Apache Nutch 2.0 RC1 is available at:
>> >>>
>> >>> http://people.apache.org/~lewismc/nutch-2.0
>> >>>
>> >>> The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
>> >>> archive of the sources in:
>> >>>
>> >>> http://svn.apache.org/repos/asf/nutch/tags/release-2.0rc1
>> >>>
>> >>> Further, a staged Maven repository of the 2.0 jar, sources.jar and
>> >>> javadoc.jar is available here:
>> >>>
>> >>> https://repository.apache.org/content/repositories/orgapachenutch-215
>> >>>
>> >>> Please vote on releasing this package as Apache Nutch 2.0.
>> >>> The vote is open for the next 72 hours and passes if a majority of at
>> >>> least three +1 Nutch PMC votes are cast.
>> >>>
>> >>> [ ] +1 Release this package as Apache Nutch 2.0
>> >>> [ ] -1 Do not release this package because...
>> >>>
>> >>> Many Thanks and here's to plenty more.
>> >>>
>> >>> Have a great weekend, Kind Regards,
>> >>> Lewis
>> >>>
>> >>> P.S. Here's my +1.
>> >>
>> >>
>> >>
>> >
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>
