gora-dev mailing list archives

From lewis john mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Future of Nutch 2.0 [Was: Unresolved dependencies org.apache.gora#gora-hbase;0.1: not found in Nutch trunk]
Date Wed, 10 Aug 2011 10:25:02 GMT
Hi,

Without changing the flow of conversation and the points which have already
been touched upon, I would like to add:

I am really split here between a couple of decisions. I like the abstraction
that Gora provides, even though it is somewhat of a pain to configure, which
also presents a barrier to adoption for devs. That being said, Gora is a
fundamental component of Nutch 2.0, and once you get to grips with the
config and the flexibility it offers, you are presented with an
excellent setup for Nutch 2.0. I understand people's concerns and why they
would wish to hardwire to HBase; however, I would like to point to a (rather
lengthy) thread I found last night as I was thinking about my position in
this whole affair [1]. In essence, it reflects exactly what Julien has
mentioned below, as well as adding a hellish lot more! I am also with Markus
on this one. However, there is no point in me being anything other than
totally honest: some of the bugs in trunk 2.0 we are talking about are
pretty substantial (I don't even know them all), especially when the API
changes are taken into account, so I would be learning as I chipped
in my part. This would inevitably lead to slower progress on Nutch 2.0
than we would all hope for, bearing in mind several devs' other commitments
both in and out of the ASF. Is this something which can be tolerated, or are
we to put suggestions in place which adhere to the "release early, release
often" ethos and try to get something out of the door? If we could get an
official release for Nutch 2.0, then community testing could
commence, and instead of improvement suggestions resulting in JIRA
tickets we would be getting bugs specifically for 2.0 as independent issues.
This would inevitably lead to a better trunk development environment for us
all. One downside of veering towards option A) is that we had a small
amount of resistance when Nutch 1.3 was released. Would making Nutch 2.0
mainstream, the de facto version for Nutch users, be a step too far for some
of them?

I am a firm believer that we should do whatever is necessary to get trunk
building under Hudson. It seems like a waste of resources that we have the
potential for a stable build environment but are not taking
advantage of it. Obviously I am unaware of exactly what is preventing this,
hence my keenness to get it sorted out, but surely we must all agree that
this would be beneficial, for morale as well. If we see
that trunk is building successfully then there might be a better feeling
about people developing not only on trunk 2.0 but also on Gora and other
components upon which trunk 2.0 depends.

Further to this, is there any consensus on getting a Jenkins build established
for branch 1.X? It is quite clear that this is our working development
strand, so would this not make sense? I have been looking through the
wiki [2], and any committer can get it set up once the PMC chair makes some
minor requests on people.apache.org.

Finally, with regard to the ant/ivy configuration, I am quite happy with
the current setup. If someone puts forward a reasonable argument for
changing to ant/maven or any other configuration, then I will certainly be
interested if it adds value to the project. I must agree that changing
something which is not broken is far from the direction I had envisaged we
were moving... quite the opposite, in fact.

[1] http://www.mail-archive.com/dev@nutch.apache.org/msg00216.html
[2] http://wiki.apache.org/general/Hudson


On Wed, Aug 10, 2011 at 10:20 AM, Markus Jelsma
<markus.jelsma@openindex.io> wrote:

> Julien, devs, users,
>
> I'd like to see bugs fixed in 2.0 but some of them are way out of my league
> or would cost me an absurd amount of time. I'd also really like to use Gora
> but Gora must be maintained. Gora will play a fundamental role in 2.0 and if
> something is broken there it is not trivial to fix it for us Nutch devs as
> it is yet another component to worry about.
>
> Tika goes well, it's worked on and there is good enough progress to rely on
> from our perspective. If this is not going to be the case with Gora we
> should maybe decide to drop it and hardwire HBASE in it.
>
> Maintaining 1.x and 2.x is a pain indeed. I'd prefer option A) but i'm not
> sure the currently active Nutch devs are going to fix it just like that.
>
> Cheers,
>
>
> >
> > a) put some effort into it, fix the bugs and make so that it can be used
> > instead of 1.x
> > b) shelve it and leave it for enthusiasts to play with + make 1.x the
> > trunk again
> > c) do nothing : keep 2.0 and 1.x in parallel  (but having to maintain two
> > branches is quite a pain)
> > d) abandon the idea of a neutral storage layer with Gora and hardwire it
> > to e.g. HBase
> >
> > Option (a) has not happened in the last 12 months and I am not very
> > hopeful about it.
> >
> > What do you guys think?
> >
> > Julien
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>



-- 
*Lewis*
