nutch-dev mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Update schema to get solrdedup working again
Date Thu, 05 May 2011 13:58:00 GMT
Don't worry, the sun is shining! The change is committed. We still need to do 
something about the moreindexing filter.

https://issues.apache.org/jira/browse/NUTCH-985

On Thursday 05 May 2011 15:34:56 Julien Nioche wrote:
> Hi Markus,
> 
> Sorry for the late reply. Definitely +1 to changing the field to Date in the
> schema; it is the right thing to do, and it's also the right time to do it.
> 
> Thanks
> 
> Julien
> 
> On 28 April 2011 12:43, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> > Hi devs,
> > 
> > The Solr schema must be updated as well to get dedup to work in 1.3. This is
> > because in December last year index-basic seems to have been updated to write
> > properly formatted dates to Solr, but the schema field was still a long.
> > 
> > Somehow Solr accepted the input (this is a bug) but cannot cope with the
> > output, nor could Nutch convert the date to the internally used long (which
> > it now can). The remaining issue is to update the field to use date instead
> > of long. But this will break existing Solr setups for sure because of field
> > incompatibility.
> > 
> > I propose to update the field regardless of current Solr setups, on the
> > assumption that 1) an index can always be recreated from segments and 2) the
> > current indexer assumes the Solr bug remains in 3.1 and higher as well.
> > 
> > I haven't tested it with 3.1 but the bug is in 1.4.1 for sure.
> > 
> > Thoughts?
> > 
> > Cheers,
> > --
> > Markus Jelsma - CTO - Openindex
> > http://www.linkedin.com/in/markus17
> > 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
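
For illustration, the date-to-long conversion mentioned in the quoted message
can be sketched in Java as below. This is a minimal sketch, not the code
committed to Nutch: the class and method names (SolrDateSketch, toEpochMillis,
toSolrDate) are hypothetical, and it assumes the dates use the ISO 8601 UTC
form (for example 2011-05-05T13:58:00Z) that Solr date fields use. It also
relies on java.time, which is newer than this thread.

```java
import java.time.Instant;

// Hypothetical helper for illustration only; not taken from the Nutch source.
final class SolrDateSketch {

    // Parse an ISO 8601 UTC date string such as "2011-05-05T13:58:00Z"
    // into milliseconds since the Unix epoch (the "long" representation).
    static long toEpochMillis(String solrDate) {
        return Instant.parse(solrDate).toEpochMilli();
    }

    // Format epoch milliseconds back into the ISO 8601 UTC string form.
    static String toSolrDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        long millis = toEpochMillis("2011-05-05T13:58:00Z");
        System.out.println(millis);             // the long value
        System.out.println(toSolrDate(millis)); // back to the date string
    }
}
```

A round trip through both methods should return the original string for
whole-second timestamps, which is one way to check that a date field and a
long-based consumer agree on the same instant.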
