lucene-solr-user mailing list archives

From Sathyam <>
Subject Query regarding URL Analysers
Date Thu, 21 Aug 2014 12:35:29 GMT

I need to generate tokens from a URL so that I get both the hierarchical
units of the URL and each individual entity as separate tokens.
For example:
*Given a URL:*

The tokens that I need are :

*Hierarchical subsets of the URL*

1 http://





6 http://

*Individual elements in the path to the resource*

7 abcd

8 efgh

9 ijkl

10 mnop.php

*Query Terms*

11 a=10

12 b=20

13 c=30

14 xyz

This comes to a total of 14 tokens for the given URL.
In short, I need a URL analyzer that creates tokens for each of the
categories marked in bold above, plus a separate token for the port (if one
is given).
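
To make the requirement concrete, here is a minimal Python sketch of the
token set I am after, written outside Solr. It is only an illustration of the
expected output, not an analyzer: the function name url_tokens and the
example.com URL are made up for this sketch, and it assumes that the last
hierarchical unit is the full URL and that "xyz" is a valueless query term.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return hierarchical prefixes, port, path elements, and query terms."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    # Hierarchical subsets: scheme, scheme + host, one prefix per
    # directory, and finally the full URL (assumed breakdown).
    prefix = parts.scheme + "://"
    hierarchy = [prefix]
    prefix += parts.netloc
    hierarchy.append(prefix)
    for segment in segments[:-1]:
        prefix += "/" + segment
        hierarchy.append(prefix)
    if url != hierarchy[-1]:
        hierarchy.append(url)

    # Separate token for the port, only when it is given explicitly.
    port = [str(parts.port)] if parts.port is not None else []

    # Query terms are split on "&" and kept as whole "key=value" units.
    query_terms = [t for t in parts.query.split("&") if t]

    return hierarchy + port + segments + query_terms


# Hypothetical URL with the same path elements and query terms as above.
example = "http://www.example.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30&xyz"
for number, token in enumerate(url_tokens(example), start=1):
    print(number, token)
```

For the hypothetical URL this yields 6 hierarchical tokens, 4 path elements,
and 4 query terms, i.e. 14 tokens; a URL with an explicit port would add one
more.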

I would like to know how this can be achieved with a single analyzer
that combines the tokenizers and filters provided by Solr.
I am also curious why an analyzer is restricted to only *one* tokenizer.
I look forward to your suggestions on the best way to get as close as
possible to what I need.

Sathyam Doraswamy
