tika-dev mailing list archives

From Chris Mattmann <chris.mattm...@jpl.nasa.gov>
Subject Re: [general discussion, moved from TIKA-7]
Date Wed, 13 Jun 2007 22:13:38 GMT

> See my other message on this. I'm a bit concerned about our ability to
> have a productive "pure" design discussion without at least some code
> to base it on. We've already had a few design threads, but each seems
> to have died with no real conclusions. I believe that having some
> concrete code that people can play with will have a positive impact
> also on higher level discussions.

I completely disagree with this. You're saying, "we've tried to have design
discussions, and no one replied, so rather than attacking that issue, we're
going to just move ahead to prototyping." Screeech. I don't think that's the
right approach at all. We need to revisit the design discussions, otherwise
Rida (BTW, I'm not picking on you, just using you as an example) will start
checking in Lius code, Chris will start checking in Nutch code, Bertrand will
start checking in code for Apache project XX, and Doug will jump in and
commit some code he wrote to handle parsing from 10 years ago, and what
will we be left with? One huge mess.

The solution to not getting a response on the design discussion is to
properly vet it on the mailing list again: track down the people who were
interested in the project and the folks who should care, throw darts at them,
get them back to the mailing lists, and discuss discuss discuss :).
Admittedly I haven't really been participating in the discussions on the
mailing list until recently, but I'm here now, and judging from the
response today, so are a lot of people. So, I don't agree with your
strategy. No disrespect, just don't agree.

>> I'm fine with having code for Tika, however, we at least need to have:
>> 1. use cases for Tika (how does a user interact with it?)
>> 2. generic interfaces and extension points that will support these use cases
>> 3. implementations of those interfaces and concrete classes
>> We have a few cases for item #1, however, there are no specs for #2 and #3,
>> which must come at least during this time when new code is getting attached,
>> no?
> No. :-) Having a shared area where we can prototype and discuss
> alternatives (I regard code as another means of communication) is
> quite valuable when coming up with answers to the open design issues.
> We can also always refactor, rewrite, or simply dump existing code if
> and when needed since we aren't yet making any backwards compatibility
> promises.

Code committed to the trunk should be reasonably high quality, and should
conform to standard interfaces and exchange standard data structures that we
come up with. It shouldn't be a hodgepodge holding area where code gets
dumped, thrown away, dumped back, etc. In my (admittedly) short experience
as an Apache developer, and (admittedly) *long* experience as a developer
for a large organization, CM shouldn't be treated like a file system. It
certainly shouldn't have immeasurably strict rules on it, but also, it
shouldn't just be our internet-based zip drive either.

>> From the Tika proposal:
>> "No existing codebase is selected as "the" starting point of Tika to avoid
>> inheriting the world view and design limitations of any single project. "
>> Am I off base here?
> I very much agree with that statement, and I don't think we are
> breaking it here. I think it's quite clear to everyone that the code
> we have now (and will have for the months to come) is an early draft
> that can and will be dropped if needed. I also quite like the way Rida
> has started merging code from both Lius and Nutch.

I guess I need to review the patch for TIKA-7 more and see what's there. I
will do that, and then comment on this further. Again, I'm not trying to be
argumentative, just trying to get my point across. I don't want a bad
precedent to be started here, because I don't think that the project will
live long if we adopt that strategy.


> BR,
> Jukka Zitting
