lucene-solr-user mailing list archives

From Sathyam <sathyam.dorasw...@gmail.com>
Subject Query regarding URL Analysers
Date Thu, 21 Aug 2014 12:35:29 GMT
Hi,

I need to generate tokens from a URL so that I get both the hierarchical
units of the URL and each individual entity as tokens.
For example:
*Given a URL:*

http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz

The tokens that I need are :

*Hierarchical subsets of the URL*

1 http://

2 http://www.google.com/

3 http://www.google.com/abcd/

4 http://www.google.com/abcd/efgh/

5 http://www.google.com/abcd/efgh/ijkl/

6 http://www.google.com/abcd/efgh/ijkl/mnop.php

*Individual elements in the path to the resource*

7 abcd

8 efgh

9 ijkl

10 mnop.php

*Query Terms*

11 a=10

12 b=20

13 c=30

*Fragment*

14 xyz

This comes to a total of 14 tokens for the given URL.
In short, I need a URL analyzer that creates tokens for each of the
categories shown in bold above, plus a separate token for the port (if one
is present).
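
To make the expected output concrete, here is a minimal sketch in plain
Python of the tokenization described above. It is not a Solr analyzer, only
an illustration of the tokens I am after. The function name url_tokens, the
position of the port token in the output, and the handling of a trailing
slash are my own assumptions.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return hierarchical, path, query and fragment tokens for a URL.

    Illustrative only: the ordering and the port handling are assumptions.
    """
    parts = urlsplit(url)
    tokens = []

    # Hierarchical prefixes: scheme first, then scheme plus host.
    prefix = parts.scheme + "://"
    tokens.append(prefix)
    prefix += parts.netloc + "/"
    tokens.append(prefix)

    # One prefix per path level. Directories keep a trailing slash;
    # the final resource does not (unless the path itself ends in "/").
    segments = [s for s in parts.path.split("/") if s]
    for i, segment in enumerate(segments):
        is_resource = i == len(segments) - 1 and not parts.path.endswith("/")
        prefix += segment if is_resource else segment + "/"
        tokens.append(prefix)

    # Separate token for the port, only when the URL states one.
    if parts.port is not None:
        tokens.append(str(parts.port))

    # Individual path elements, query terms, and the fragment.
    tokens.extend(segments)
    tokens.extend(term for term in parts.query.split("&") if term)
    if parts.fragment:
        tokens.append(parts.fragment)
    return tokens


if __name__ == "__main__":
    example = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    for number, token in enumerate(url_tokens(example), start=1):
        print(number, token)
```

For the example URL this yields the 14 tokens listed above, in the same
order. What I am looking for is the equivalent behaviour inside a Solr
analysis chain.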

I would like to know how this can be achieved with a single analyzer that
combines the tokenizers and filters provided by Solr. I am also curious why
an analyzer is restricted to only *one* tokenizer.
I look forward to your suggestions on the best way to get as close as
possible to what I need.

Thanks.
-- 
Sathyam Doraswamy
