lucene-solr-user mailing list archives

From "Daniel Pitts" <Daniel.Pi...@cnet.com>
Subject RE: Computing an md5 of a text field.
Date Tue, 24 Jul 2007 00:39:01 GMT
XML escaping is probably the best approach. Either surround the whole
thing with "<![CDATA[" and "]]>", or use one of the many libraries out
there that will escape the string for you.
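
As a rough illustration of the library route, here is a minimal Python
sketch using the standard library's xml.sax.saxutils. The helper name
solr_field and the example URL are made up for illustration; only the
<field name="..."> element shape comes from this thread.

```python
from xml.sax.saxutils import escape, quoteattr


def solr_field(name: str, value: str) -> str:
    """Build a <field> element with the name and value XML-escaped.

    Hypothetical helper, not part of Solr or any Solr client library.
    """
    # quoteattr() escapes the attribute value and wraps it in quotes;
    # escape() replaces &, < and > in the element text.
    return "<field name={}>{}</field>".format(quoteattr(name), escape(value))


if __name__ == "__main__":
    # A URL with a query string contains '&', which is not valid raw XML text.
    print(solr_field("url", "http://example.com/?a=1&b=2"))
```

With escaping in place the URL itself can stay as the unique ID, and an
XML parser or XSLT processor downstream sees the original string once
the document is parsed.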

While MD5 is designed to be a cryptographically secure one-way
function, it is NOT guaranteed to be a one-to-one (injective) function.
You could theoretically have two distinct URLs that have the same MD5.
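
If you do go the hashing route anyway, computing the digest on the
client before building the document could look like the Python sketch
below. The function name url_md5 and the choice of UTF-8 encoding are
assumptions for illustration, not something specified in this thread.

```python
import hashlib


def url_md5(url: str) -> str:
    """Return the MD5 digest of a URL as 32 lowercase hex characters.

    Hypothetical helper; assumes the URL is encoded as UTF-8 before hashing.
    """
    return hashlib.md5(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = url_md5("http://wiki.apache.org")
    # The digest contains only [0-9a-f], so it needs no XML escaping.
    print(digest)
```

Because every input of any length maps to one of 2^128 values, two
different URLs can share a digest, so a hash used as a unique ID is
unique only with high probability, not by construction.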

> -----Original Message-----
> From: Nuno Leitao [mailto:nuno@scaletrix.com] 
> Sent: Monday, July 23, 2007 5:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Computing an md5 of a text field.
> 
> Thanks Yonik,
> 
> Basically, I am indexing a number of items where the unique
> ID is a URL. Because URLs can contain invalid XML
> characters, and I will be doing some XSLT postprocessing, I
> was thinking that a good way to solve the problem would be to
> store these unique IDs as MD5 hashes instead.
> 
> I think I found another alternative - it follows the 
> pre-processing avenue you suggested.
> 
> Best Regards.
> 
> --Nuno
> 
> On 23 Jul 2007, at 18:25, Yonik Seeley wrote:
> 
> > On 7/23/07, Nuno Leitao <nuno@scaletrix.com> wrote:
> >> I would like to be able to compute and store the MD5 sum for a given
> >> text in a field (in my case, I am talking about a URL string). For
> >> example, if I have a field called 'url' the following would happen:
> >>
> >> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
> >
> > First, what are you trying to achieve by this?  If you give people the
> > higher level problem, they might be able to suggest a better way.
> >
> > Since you construct the XML document to send to Solr, simply compute
> > the MD5 and add that also:
> >
> > <field name="url">http://wiki.apache.org</field>
> > <field name="urlMD5">cb4f7e6ca1a0c00b146894b75d9f98dc</field>
> >
> > Or did you want to store the MD5 instead of the URL?  Did you want it
> > searchable somehow?
> >
> > -Yonik
> 
