lucene-java-user mailing list archives

From Chaitanya Kumar Ch <chaitu381...@gmail.com>
Subject Re: Issue while searching text with special characters like @,#
Date Wed, 07 Sep 2016 09:12:42 GMT
Thank you.
Could you please share a code snippet that handles these characters?
I tried, but couldn't get it to work.

On Tue, Sep 6, 2016 at 10:59 PM, Iker Huerga <iker.huerga@gmail.com> wrote:

> The thing is, you are probably using the StandardAnalyzer, so those
> special characters are removed at indexing time.
>
> If you don't want that to happen, you can try KeywordAnalyzer or
> create your own Analyzer.
>
> You can test with the following sample code.
>
> Hope that helps.
>
> String PATH = "src/main/resources";
> String FIELD_NAME = "text";
> String FIELD_CONTENT = "iker#";
> // KeywordAnalyzer keeps the whole field value as a single token, '#' included
> Analyzer analyzer = new KeywordAnalyzer();
> IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
> iwc.setOpenMode(OpenMode.CREATE);
> // try-with-resources closes the writer and directory even if something fails
> try (Directory dir = FSDirectory.open(Paths.get(PATH));
>      IndexWriter writer = new IndexWriter(dir, iwc)) {
>   Document doc = new Document();
>   doc.add(new TextField(FIELD_NAME, new StringReader(FIELD_CONTENT)));
>   writer.addDocument(doc);
>   writer.commit();
>   // search with the same analyzer that was used for indexing
>   try (IndexReader reader = DirectoryReader.open(dir)) {
>     IndexSearcher searcher = new IndexSearcher(reader);
>     QueryParser parser = new QueryParser(FIELD_NAME, analyzer);
>     Query query = parser.parse("+text:iker#");
>     ScoreDoc[] docs = searcher.search(query, 2).scoreDocs;
>     for (ScoreDoc d : docs) {
>       System.out.println(d.doc);
>     }
>   }
> } catch (IOException | ParseException e) {
>   e.printStackTrace();
> }
>
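To make the analyzer point above concrete, here is a minimal sketch in plain Java with no Lucene dependency. The class and method names (TokenizerSketch, letterDigitTokens, whitespaceTokens) are made up for illustration. The first method is only a rough stand-in for how a StandardAnalyzer-style tokenizer treats this input, not Lucene's actual algorithm; the second splits on whitespace only, which is one way a custom Analyzer could keep # and @.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TokenizerSketch {

    // Splits on anything that is not a letter or digit, so '#' and '@' are dropped.
    // Assumption: a simplified approximation, not StandardAnalyzer's real rules.
    static List<String> letterDigitTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Splits on whitespace only, so '#' and '@' stay attached to their word.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String body = "#johndaly #baby @chaitu";
        System.out.println(letterDigitTokens(body)); // [johndaly, baby, chaitu]
        System.out.println(whitespaceTokens(body));  // [#johndaly, #baby, @chaitu]
    }
}
```

If the index only holds tokens like those in the first list, a query term such as #johndaly has nothing to match, which would be consistent with the empty result described in the original question.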
> 2016-09-06 10:42 GMT-04:00 Chaitanya Kumar Ch <chaitu381923@gmail.com>:
>
> > Are you suggesting that I pass the matching string URL-encoded?
> > Ex:
> > .onField("body").ignoreFieldBridge().ignoreAnalyzer().matching(
> > URLEncoder.encode("#chaitu"))
> >
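For reference, a small sketch of what URL-encoding does to these keywords, using only java.net.URLEncoder from the JDK. The class name UrlEncodeSketch and the helper encode are invented for this example. It uses the two-argument encode(String, String) overload, since the one-argument form is deprecated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncodeSketch {

    // Percent-encodes a string as UTF-8 form data.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("#chaitu")); // %23chaitu
        System.out.println(encode("@chaitu")); // %40chaitu
    }
}
```

Note that the encoded term would only match if the indexed text had been encoded the same way; a body indexed as plain text presumably contains no token like %23chaitu.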
> > On Tue, Sep 6, 2016 at 7:58 PM, Iker Huerga <iker.huerga@gmail.com> wrote:
> >
> > > # and @ are reserved characters as per RFC 3986
> > > (https://tools.ietf.org/html/rfc3986, see section 2.2), so you would
> > > have to URL-encode them.
> > >
> > > My 2 cents
> > >
> > > 2016-09-06 10:20 GMT-04:00 Chaitanya Kumar Ch <chaitu381923@gmail.com>:
> > >
> > > > Thanks for the reply.
> > > > I tried that, but it didn't work.
> > > > Also, please note that *@ and # are not in the current list of
> > > > special characters*.
> > > >
> > > > On Tue, Sep 6, 2016 at 7:47 PM, Iker Huerga <iker.huerga@gmail.com> wrote:
> > > >
> > > > > I'd try escaping the characters as in
> > > > > https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters
> > > > >
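As an illustration of the escaping rule on that page, here is a hand-rolled sketch in plain Java. EscapeSketch and its escape method are hypothetical names for this example; for real queries Lucene ships its own QueryParser.escape(String). The character set below is taken from the special characters listed on the linked 2.9.4 syntax page, and neither # nor @ is among them, so both pass through unchanged.

```java
public class EscapeSketch {

    // Special characters from the linked query syntax page:
    // + - && || ! ( ) { } [ ] ^ " ~ * ? : \
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Prefixes every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("(1+1):2"));   // \(1\+1\)\:2
        System.out.println(escape("#johndaly")); // #johndaly (nothing to escape)
    }
}
```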
> > > > > 2016-09-06 10:02 GMT-04:00 Chaitanya Kumar Ch <chaitu381923@gmail.com>:
> > > > >
> > > > > > Hi All!
> > > > > >
> > > > > > I am having trouble matching a field's content against keywords
> > > > > > that contain symbols such as @ and #.
> > > > > >
> > > > > > I have annotated the field "body", which is configured as below:
> > > > > >
> > > > > > @Field(analyze = Analyze.YES)
> > > > > > private String body;
> > > > > >
> > > > > > One of the body column values is as follows:
> > > > > >
> > > > > > Thursday PM Clicks: Jessica Alba; Happy birthday...
> > > > > > https://t.co/VlZkSF0IUb #johndaly #baby @chaitu @chai @hey
> > > > > >
> > > > > > I am trying to search the text of the body field with the query
> > > > > > below, but it returns no results:
> > > > > >
> > > > > >  +(+body:#johndaly +body:#baby)
> > > > > >
> > > > > > The "#" symbol appears in the query only if I add
> > > > > > ignoreFieldBridge() to the field, but I still get no results.
> > > > > >
> > > > > > The query below is generated if I remove ignoreFieldBridge():
> > > > > >
> > > > > > +(+body:johndaly +body:baby)
> > > > > >
> > > > > >
> > > > > > Stack Overflow link:
> > > > > > <http://stackoverflow.com/questions/39350676/hibernate-search-lucene-search-text-with-special-characters-like>
> > > > > > --
> > > > > > Thank You,
> > > > > > Chaitanya Kumar Ch,
> > > > > > +91 9550837582
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Iker Huerga
> > > > > http://www.ikerhuerga.com/



-- 
Thank You,
Chaitanya Kumar Ch,
+91 9550837582
