lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Injecting synonymns into Solr
Date Tue, 05 May 2015 00:42:33 GMT
Yes, the underlying mechanism uses Java. But the collection isn't able to
load when Solr starts up, so it didn't return anything even when I used a
URL.
Is it just because my machine doesn't have enough memory?

Regards,
Edwin
On 4 May 2015 20:12, "Roman Chyla" <roman.chyla@gmail.com> wrote:

> It shouldn't matter. By the way, try a URL instead of a file path. I think the
> underlying loading mechanism uses Java File, so it could work.
> On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo" <edwinyeozl@gmail.com>
> wrote:
>
> > I would like to check: will this method of splitting the synonyms into
> > multiple files use up a lot of memory?
> >
> > I'm trying it with about 10 files, and that collection cannot be
> > loaded due to insufficient memory.
> >
> > My machine currently has only 4GB of memory, but I only have
> > 500,000 records indexed, so I'm not sure if there will be a significant
> > impact in the future (even with more memory) when my index grows and
> > other things like faceting, highlighting, and carrot tools are implemented.
> >
> > Regards,
> > Edwin
> >
> >
> >
> > On 1 May 2015 at 11:08, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >
> > > Thank you for the info. Yup, this works. I found out that we can't load
> > > files larger than 1MB into ZooKeeper; this applies to any file
> > > larger than 1MB, not just the synonym files.
> > > But I'm not sure if there will be an impact on the system, as the number
> > > of synonym text files can potentially grow to more than 20, since my
> > > sample synonym file is more than 20MB in size.
> > >
> > > Currently I have fewer than 500,000 records indexed in Solr, so I'm not
> > > sure if there will be a significant impact compared to an index with
> > > millions of records.
> > > I will try to get more records indexed and will update here again.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 1 May 2015 at 08:17, Philippe Soares <soares@genomequest.com> wrote:
> > >
> > >> Split your synonyms into multiple files and configure the SynonymFilterFactory
> > >> with a comma-separated list of files, e.g.:
> > >> synonyms="syn1.txt,syn2.txt,syn3.txt"
> > >>
> > >> On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo <
> > >> edwinyeozl@gmail.com>
> > >> wrote:
> > >>
> > >> > Just to populate it with general synonym words. I've managed to
> > >> > populate it from some online sources, but is there a limit to what it
> > >> > can contain?
> > >> >
> > >> > I can't load the configuration into ZooKeeper if the synonyms.txt file
> > >> > contains more than 2100 lines.
> > >> >
> > >> > Regards,
> > >> > Edwin
> > >> > On 1 May 2015 05:44, "Chris Hostetter" <hossman_lucene@fucit.org> wrote:
> > >> >
> > >> > >
> > >> > > : There is a possible solution here:
> > >> > > : https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to
> > >> > > : SOLR Synonym format).
> > >> > >
> > >> > > If you have WordNet synonyms you don't need any special code/tools to
> > >> > > convert them -- the current solr.SynonymFilterFactory supports wordnet
> > >> > > files (just specify format="wordnet")
> > >> > >
> > >> > >
> > >> > > : > > Does anyone know a faster way of populating the synonyms.txt
> > >> > > : > > file than manually typing the words into the file, given that
> > >> > > : > > there could be thousands of synonyms?
> > >> > >
> > >> > > populate from what?  what is the source of your data?
> > >> > >
> > >> > > the default solr synonym file format is about as simple as it could
> > >> > > possibly be -- pretty trivial to generate it from scripts -- the hard
> > >> > > part is usually selecting the synonym data you want to use and parsing
> > >> > > whatever format it is already in.
> > >> > >
> > >> > >
> > >> > >
> > >> > > -Hoss
> > >> > > http://www.lucidworks.com/
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>
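
The two suggestions in the thread (generate the synonym file from a script, then split it into several files small enough for ZooKeeper) could be sketched roughly as below. This is only an illustration, not something posted by the participants: the function names, the "syn" file prefix and the 1,000,000-byte limit are assumptions chosen to stay under the roughly 1MB limit reported above, and the code assumes the terms themselves contain no commas or "=>" sequences.

```python
from pathlib import Path

# Assumed limit: a little below the ~1MB ZooKeeper file limit reported in the thread.
MAX_BYTES = 1_000_000


def format_synonym_lines(groups):
    """Turn groups of equivalent terms into Solr synonym lines such as "a, b, c"."""
    lines = []
    for group in groups:
        terms = [t.strip() for t in group if t.strip()]
        if len(terms) > 1:  # a single term has no synonyms to declare
            lines.append(", ".join(terms))
    return lines


def split_into_files(lines, out_dir, prefix="syn", max_bytes=MAX_BYTES):
    """Write lines to prefix1.txt, prefix2.txt, ... with each file <= max_bytes.

    Returns the comma-separated file list for the `synonyms` attribute.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunks, current, size = [], [], 0
    for line in lines:
        n = len((line + "\n").encode("utf-8"))
        if n > max_bytes:
            raise ValueError("a single synonym line exceeds max_bytes")
        if current and size + n > max_bytes:
            # Current file is full: start a new one.
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)

    names = []
    for i, chunk in enumerate(chunks, start=1):
        name = f"{prefix}{i}.txt"
        # Write bytes so the size on disk matches the size counted above.
        (out_dir / name).write_bytes(("\n".join(chunk) + "\n").encode("utf-8"))
        names.append(name)
    return ",".join(names)
```

The returned string (for example "syn1.txt,syn2.txt") is meant to be pasted into the synonyms="..." attribute shown in Philippe's message. Note that this only addresses the per-file size limit; it does not reduce the memory Solr needs to load the synonyms.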
