lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Extracting URLs while indexing
Date Wed, 20 Jan 2010 17:03:42 GMT
I guess it depends on what you mean by "extract". There's
nothing that I know of that, say, stores them to a file or
separate field, or even does anything special with them.

I think StandardTokenizerFactory tries to keep URLs
together as a single token in the field, but it's still just
another token. You should check that against your version, though.
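
For what "custom code" might look like, here is a minimal sketch, assuming
the URLs are pulled out with a regular expression before each document is
sent to Solr and stored in a separate multi-valued field. The pattern, the
function names and the field names ("text", "urls") are hypothetical and not
part of Solr; the same logic could presumably be ported into custom
server-side code if it has to run inside the indexing chain.

```python
import re

# Hypothetical pattern: http://, https:// and file:// URLs, plus scheme-less
# //host/path forms that are not preceded by a scheme, slash or word character.
URL_PATTERN = re.compile(
    r"(?:https?|file)://[^\s<>\"']+"
    r"|(?<![:/\w])//[^\s<>\"']+"
)


def extract_urls(text):
    """Return the URLs found in text, in order of appearance, without duplicates."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing punctuation that usually belongs to the sentence.
        url = match.group(0).rstrip(".,;:!?)")
        if url and url not in urls:
            urls.append(url)
    return urls


def with_url_field(doc, source_field="text", target_field="urls"):
    """Return a copy of doc with the extracted URLs stored under target_field.

    Both field names are assumptions; they would have to match the schema.
    """
    enriched = dict(doc)
    enriched[target_field] = extract_urls(doc.get(source_field, ""))
    return enriched
```

Each document would go through with_url_field() just before it is posted,
so the extracted URLs end up in their own field instead of being only
tokens inside the body text.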

FWIW
Erick

On Wed, Jan 20, 2010 at 9:52 AM, Bogdan Vatkov <bogdan.vatkov@gmail.com> wrote:

> Sorry, I meant completely server-side. What's more, I want it at indexing
> time (I do not care about query time, as I read the whole index later
> anyway).
>
> On Wed, Jan 20, 2010 at 2:40 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > Do you mean you want the URLs to be extracted on the client?
> > If so, no. Filters/analyzers reside on the server, not the client.
> > You'll have to do it with custom code....
> >
> > Erick
> >
> > On Tue, Jan 19, 2010 at 5:48 PM, Bogdan Vatkov <bogdan.vatkov@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I want to extract URLs (http://..., as well as file://... or even //.....)
> > > while pushing documents into Solr.
> > > Is it possible with the Filters/Analyzers available nowadays?
> > > I looked into the doc but could not find anything related to it.
> > >
> > > Best regards,
> > > Bogdan
> > >
> >
>
>
>
> --
> Best regards,
> Bogdan
>
