lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: URL search and indexing
Date Tue, 25 Jun 2013 12:47:52 GMT
Sure, you can query the url field directly. Or, if you prefer, you can split it into multiple components,
e.g. using http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/URLClassifyProcessor.html
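
To illustrate the "split it up" option, here is a minimal Python sketch of what a client could do before sending documents to Solr. It is not the URLClassifyProcessor and not a Solr API: the function name host_suffixes and the field name host_suffix are hypothetical, chosen only for this example. It derives the host and every parent domain from a URL, so the values can be stored in a separate multi-valued string field.

```python
from urllib.parse import urlsplit


def host_suffixes(url):
    """Return the host of `url` and each parent domain, most specific first.

    Hypothetical helper for illustration; the result is meant to populate
    an assumed multi-valued string field such as `host_suffix`.
    """
    # urlsplit().hostname is lower-cased and has any port removed;
    # it is None when the URL carries no scheme/authority part.
    host = urlsplit(url).hostname or ""
    labels = host.split(".") if host else []
    # Drop leading labels one at a time: a.b.c -> [a.b.c, b.c, c]
    return [".".join(labels[i:]) for i in range(len(labels))]


if __name__ == "__main__":
    print(host_suffixes("http://www.okkam.it/index.html"))
```

With such a field in place, a plain term query like host_suffix:it or host_suffix:somesite.com would cover the *.it and *.somesite.com cases without running a regular expression over the full URL. IP-address hosts are not handled specially in this sketch.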

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 25 June 2013 at 14:10, Flavio Pompermaier <pompermaier@okkam.it> wrote:

> Sorry, but maybe I am missing something here. Could I declare url as the key
> field and query it too?
> At the moment, my schema.xml looks like:
> 
> <fields>
>   <field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
> 
>   <field name="category" type="string" indexed="true" stored="true"/>
>   <field name="language" type="string" indexed="true" stored="true"/>
>  ...
>   <field name="_version_" type="long" indexed="true" stored="true"/>
> 
> </fields>
> <uniqueKey>url</uniqueKey>
> 
> Is that OK, or should I add a "baseurl" field of some kind so that I can
> query all URLs coming from a certain domain (first or second level as well)?
> 
> Best,
> Flavio
> 
> 
> On Tue, Jun 25, 2013 at 12:28 PM, Jan Høydahl <jan.asf@cominvent.com> wrote:
> 
>> Probably a good match for the RegExp feature of Solr (given that your url
>> is not tokenized)
>> e.g. q=url:/.*\.it$/
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 25 June 2013 at 12:17, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>> 
>>> Hi everybody,
>>> I'm quite new to Solr, so my question may be trivial for you.
>>> In my use case I have to index content found at some URLs, so I use the
>>> URL as the key of my document and treat it as a string.
>>> 
>>> However, I'd like to be able to query by domain name, like *.it or
>>> *.somesite.com. What's the best strategy? I thought of transforming the
>>> URL into a path and indexing it with solr.PathHierarchyTokenizerFactory,
>>> but maybe there's a simpler solution?
>>> 
>>> Best,
>>> Flavio
>>> 
>>> --
>>> 
>>> Flavio Pompermaier
>>> *Development Department
>>> *_______________________________________________
>>> *OKKAM**Srl **- www.okkam.it*
>>> 
>>> *Phone:* +(39) 0461 283 702
>>> *Fax:* + (39) 0461 186 6433
>>> *Email:* f.pompermaier@okkam.it
>>> *Headquarters:* Trento (Italy), fraz. Villazzano, Salita dei Molini 2
>>> *Registered office:* Trento (Italy), via Segantini 23
>>> 
>>> Confidentiality notice. This e-mail transmission may contain legally
>>> privileged and/or confidential information. Please do not read it if you
>>> are not the intended recipient(s). Any use, distribution, reproduction or
>>> disclosure by any other person is strictly prohibited. If you have
>>> received this e-mail in error, please notify the sender and destroy the
>>> original transmission and its attachments without reading or saving it in
>>> any manner.
>> 
>> 
> 
> 

