lucene-solr-user mailing list archives

From Aurélien MAZOYER <aurelien.mazo...@francelabs.com>
Subject Re: Query regarding URL Analysers
Date Thu, 21 Aug 2014 12:46:16 GMT
Hi,

Maybe I am wrong, but I am not sure that you can find such a tokenizer in 
Solr out of the box.
I suggest having a look at PatternTokenizer and PathHierarchyTokenizer. 
Note that you can also implement your own tokenizer and add it to Solr as 
a plugin.
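
To make the custom route more concrete, here is a minimal sketch of the token-generation logic such a plugin would need. It is plain Python using only the standard library, not Solr or Lucene code, and the function name `url_tokens` is made up for illustration. A real Solr tokenizer would have to be written in Java against the Lucene Tokenizer API; this only shows how the requested token categories could be derived from a URL.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Hypothetical helper: split a URL into the token categories from the question."""
    parts = urlsplit(url)
    tokens = []

    # Hierarchical prefixes: scheme, then scheme + host, then one per path level.
    prefix = parts.scheme + "://"
    tokens.append(prefix)
    prefix += parts.netloc + "/"
    tokens.append(prefix)

    segments = [s for s in parts.path.split("/") if s]
    for i, segment in enumerate(segments):
        is_last = i == len(segments) - 1
        # Directories keep a trailing slash; the final resource does not,
        # unless the original path itself ended with a slash.
        if is_last and not parts.path.endswith("/"):
            prefix += segment
        else:
            prefix += segment + "/"
        tokens.append(prefix)

    # Individual path elements.
    tokens.extend(segments)

    # Query terms, kept as raw key=value strings.
    tokens.extend(term for term in parts.query.split("&") if term)

    # Separate token for the port, only when the URL states one explicitly.
    if parts.port is not None:
        tokens.append(str(parts.port))

    # Fragment, if present.
    if parts.fragment:
        tokens.append(parts.fragment)

    return tokens


if __name__ == "__main__":
    example = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    for number, token in enumerate(url_tokens(example), start=1):
        print(number, token)
```

For the example URL from the question, this should give the 14 tokens listed there, in the same order. The position of the port token (after the query terms) is an arbitrary choice in this sketch.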

Regards,

Aurélien MAZOYER

On 21/08/2014 14:35, Sathyam wrote:
> Hi,
>
> I needed to generate tokens out of a URL such that I am able to get
> hierarchical units of the URL as well as each individual entity as tokens.
> For example:
> *Given a URL:*
>
> http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz
>
> The tokens that I need are :
>
> *Hierarchical subsets of the URL*
>
> 1 http://
>
> 2 http://www.google.com/
>
> 3 http://www.google.com/abcd/
>
> 4 http://www.google.com/abcd/efgh/
>
> 5 http://www.google.com/abcd/efgh/ijkl/
>
> 6 http://www.google.com/abcd/efgh/ijkl/mnop.php
>
> *Individual elements in the path to the resource*
>
> 7 abcd
>
> 8 efgh
>
> 9 ijkl
>
> 10 mnop.php
>
> *Query Terms*
>
> 11 a=10
>
> 12 b=20
>
> 13 c=30
>
> *Fragment*
> 14 xyz
>
> This comes to a total of 14 tokens for the given URL.
> Basically, I need a URL analyzer that creates tokens based on the categories
> mentioned in bold, plus a separate token for the port (if mentioned).
>
> I would like to know how this can be achieved by using a single analyzer
> that uses a combination of the tokenizers and filters provided by Solr.
> I am also curious to know why an analyzer is restricted to only *one*
> tokenizer.
> I look forward to your response on the best way to get as close as
> possible to what I need.
>
> Thanks.

