lucene-java-user mailing list archives

From Iker Huerga <iker.hue...@gmail.com>
Subject Re: Issue while searching text with special characters like @,#
Date Tue, 06 Sep 2016 17:29:00 GMT
Here is the thing: you are probably using the StandardAnalyzer, so those
special characters are removed at indexing time.

If you don't want that to happen, you can try KeywordAnalyzer or create
your own Analyzer.
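
To make the effect concrete, here is a minimal plain-Java sketch of the two
behaviours. It is only a simplified model and does not use or reproduce
Lucene's real tokenization rules; the class and method names are made up
for illustration. One thing worth keeping in mind: KeywordAnalyzer emits the
entire field value as a single token, so for a multi-word body field a
whitespace-based analyzer (Lucene ships a WhitespaceAnalyzer) may be the
closer fit.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalyzerSketch {
    // Simplified stand-in for a punctuation-stripping analyzer: lowercases
    // the text and keeps only runs of letters and digits.
    // This is an illustration, NOT Lucene's actual algorithm.
    static List<String> stripPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Simplified stand-in for a whitespace-only tokenizer: symbols such as
    // '#' and '@' stay attached to their terms.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String body = "Happy birthday #johndaly #baby @chaitu";
        // The '#' and '@' are gone, so a term like "#johndaly" cannot match
        System.out.println(stripPunctuation(body));
        // The symbols survive, so "#johndaly" is an indexed term
        System.out.println(splitOnWhitespace(body));
    }
}
```

Whichever analyzer is chosen, the same one has to be applied at query time,
otherwise the query terms and the indexed terms will not line up.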

You can test this with the following sample code:

Hope that helps

String PATH = "src/main/resources";
String FIELD_NAME = "text";
String FIELD_CONTENT = "iker#";
try {
    Directory dir = FSDirectory.open(Paths.get(PATH));
    // KeywordAnalyzer keeps the whole field value, '#' included, as one token
    Analyzer analyzer = new KeywordAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(dir, iwc);
    Document doc = new Document();
    doc.add(new TextField(FIELD_NAME, new StringReader(FIELD_CONTENT)));
    writer.addDocument(doc);
    writer.commit();
    writer.close();
    // Search with the same analyzer that was used for indexing
    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);
    QueryParser parser = new QueryParser(FIELD_NAME, analyzer);
    Query query = parser.parse("+text:iker#");
    ScoreDoc[] docs = searcher.search(query, 2).scoreDocs;
    for (ScoreDoc d : docs) {
        System.out.println(d.doc);
    }
    reader.close();
} catch (IOException | ParseException e) {
    // ParseException here is org.apache.lucene.queryparser.classic.ParseException
    e.printStackTrace();
}
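
On the escaping question raised earlier in this thread: a rough plain-Java
sketch of backslash-escaping for the classic query parser is below. The
character list is my reading of the query parser syntax documentation for
recent Lucene versions (older versions such as 2.9.4 do not list '/'), and
the class name is made up for illustration. Lucene itself offers a static
QueryParser.escape(String) that should be preferred in real code. The point
of the sketch is only that '#' and '@' are not query syntax, so escaping
leaves them untouched and cannot be the fix here.

```java
public class EscapeSketch {
    // Characters assumed to be query syntax for the classic query parser:
    // \ + - ! ( ) : ^ [ ] " { } ~ * ? | & /
    // Note that '#' and '@' are not in this list.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefix every special character with a backslash; copy the rest as-is.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("#johndaly")); // unchanged
        System.out.println(escape("@chaitu"));   // unchanged
        System.out.println(escape("a+b:c"));     // '+' and ':' get a backslash
    }
}
```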

2016-09-06 10:42 GMT-04:00 Chaitanya Kumar Ch <chaitu381923@gmail.com>:

> Are you suggesting that I pass the matching string URL-encoded?
> Ex:
> .onField("body").ignoreFieldBridge().ignoreAnalyzer().matching(
> URLEncoder.encode("#chaitu"))
>
> On Tue, Sep 6, 2016 at 7:58 PM, Iker Huerga <iker.huerga@gmail.com> wrote:
>
> > # and @ are reserved characters as per RFC 3986, section 2.2
> > (https://tools.ietf.org/html/rfc3986), so you would have to
> > URL-encode them.
> >
> > My 2 cents
> >
> > 2016-09-06 10:20 GMT-04:00 Chaitanya Kumar Ch <chaitu381923@gmail.com>:
> >
> > > Thanks for the reply.
> > > I have tried that, but it didn't work.
> > > Also, please note that *@ and # are not part of the current special
> > > characters list*.
> > >
> > > On Tue, Sep 6, 2016 at 7:47 PM, Iker Huerga <iker.huerga@gmail.com> wrote:
> > >
> > > > I'd try escaping the characters as in
> > > > https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters
> > > >
> > > > 2016-09-06 10:02 GMT-04:00 Chaitanya Kumar Ch <chaitu381923@gmail.com>:
> > > >
> > > > > Hi All!
> > > > >
> > > > > I am facing an issue while trying to match a field's content
> > > > > against keywords that contain symbols like @ and #.
> > > > >
> > > > > I have an annotated field "body", configured as below:
> > > > >
> > > > > @Field(analyze = Analyze.YES)
> > > > > private String body;
> > > > >
> > > > > One value of the body column is as follows:
> > > > >
> > > > > Thursday PM Clicks: Jessica Alba; Happy birthday...
> > > > > https://t.co/VlZkSF0IUb #johndaly #baby @chaitu @chai @hey
> > > > >
> > > > > I am trying to search the text of the body field with the query
> > > > > below, but it's not giving any results:
> > > > >
> > > > >  +(+body:#johndaly +body:#baby)
> > > > >
> > > > > The "#" symbol appears in the query only if I add
> > > > > ignoreFieldBridge() to the field, but I still get no results.
> > > > >
> > > > > The query below is generated if I remove ignoreFieldBridge():
> > > > >
> > > > > +(+body:johndaly +body:baby)
> > > > >
> > > > >
> > > > > Stack Overflow link:
> > > > > <http://stackoverflow.com/questions/39350676/hibernate-search-lucene-search-text-with-special-characters-like>
> > > > > --
> > > > > Thank You,
> > > > > Chaitanya Kumar Ch,
> > > > > +91 9550837582
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Iker Huerga
> > > > http://www.ikerhuerga.com/



-- 
Iker Huerga
http://www.ikerhuerga.com/
