lucene-solr-user mailing list archives

From Roman Chyla <roman.ch...@gmail.com>
Subject Re: Injecting synonyms into Solr
Date Mon, 04 May 2015 12:11:02 GMT
It shouldn't matter. By the way, try a URL instead of a file path. I think the
underlying loading mechanism uses Java's File; it could work.
On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo" <edwinyeozl@gmail.com> wrote:

> I'd like to check: will this method of splitting the synonyms into
> multiple files use up a lot of memory?
>
> I'm trying it with about 10 files, and the collection cannot be
> loaded due to insufficient memory.
>
> My machine currently has only 4GB of memory, and I have only
> 500,000 records indexed, so I'm not sure whether there will be a significant
> impact in the future (even with more memory) when my index grows and other
> features like faceting, highlighting, and Carrot2 clustering are implemented.
>
> Regards,
> Edwin
>
>
>
> On 1 May 2015 at 11:08, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
> > Thank you for the info. Yup, this works. I found out that we can't load
> > files larger than 1MB into ZooKeeper; this applies to any file larger
> > than 1MB, not just the synonym files.
> > But I'm not sure whether there will be an impact on the system, as the
> > number of synonym text files could grow to more than 20, since my
> > sample synonym file is more than 20MB.
> >
> > Currently I have fewer than 500,000 records indexed in Solr, so I'm not
> > sure whether the impact will be significant compared to an index with
> > millions of records.
> > I will try to get more records indexed and will update here again.
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 May 2015 at 08:17, Philippe Soares <soares@genomequest.com> wrote:
> >
> >> Split your synonyms into multiple files and set the SynonymFilterFactory
> >> with a comma-separated list of files, e.g.:
> >> synonyms="syn1.txt,syn2.txt,syn3.txt"
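As a rough sketch of the splitting approach suggested above (not code from this thread), the Python below groups the lines of one large synonyms file into chunks that each stay under a byte limit, writes them as syn1.txt, syn2.txt and so on, and returns the comma-separated list for the `synonyms` attribute. The 900 KiB default is an assumed safety margin below the roughly 1MB ZooKeeper limit reported in this thread, and the helper names `split_synonyms` and `write_chunks` are hypothetical.

```python
import os

# Assumption: stay safely below the ~1MB per-file ZooKeeper limit seen in this thread.
DEFAULT_MAX_BYTES = 900 * 1024


def split_synonyms(lines, max_bytes=DEFAULT_MAX_BYTES):
    """Group synonym rule lines into chunks whose UTF-8 size stays under max_bytes.

    Each rule line is kept whole, so no mapping is split across two files.
    """
    chunks = []
    current = []
    current_size = 0
    for line in lines:
        line = line.rstrip("\n") + "\n"  # normalise: exactly one trailing newline
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"single rule exceeds {max_bytes} bytes: {line[:40]!r}")
        if current and current_size + size > max_bytes:
            chunks.append(current)  # current chunk is full; start a new one
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


def write_chunks(chunks, out_dir, prefix="syn"):
    """Write chunks as syn1.txt, syn2.txt, ... (hypothetical naming) and return
    the comma-separated file list for the SynonymFilterFactory 'synonyms' attribute."""
    names = []
    for i, chunk in enumerate(chunks, start=1):
        name = f"{prefix}{i}.txt"
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.writelines(chunk)
        names.append(name)
    return ",".join(names)
```

Typical use would be to open the original synonyms.txt, pass the file object to `split_synonyms`, and paste the string returned by `write_chunks` into the filter definition. Note that this only works around the per-file size limit; the total amount of synonym data is unchanged.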
> >>
> >> On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> >>
> >> > Just to populate it with general synonym words. I've managed to
> >> > populate it from an online source, but is there a limit to what it can
> >> > contain?
> >> >
> >> > I can't load the configuration into ZooKeeper if the synonyms.txt file
> >> > contains more than 2100 lines.
> >> >
> >> > Regards,
> >> > Edwin
> >> > On 1 May 2015 05:44, "Chris Hostetter" <hossman_lucene@fucit.org> wrote:
> >> >
> >> > >
> >> > > : There is a possible solution here:
> >> > > : https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to
> >> > > : SOLR Synonym format).
> >> > >
> >> > > If you have WordNet synonyms, you don't need any special code or tools
> >> > > to convert them: the current solr.SynonymFilterFactory supports WordNet
> >> > > files (just specify format="wordnet").
> >> > >
> >> > >
> >> > > : > > Does anyone know a faster method of populating the synonyms.txt
> >> > > : > > file than manually typing the words into it, given that there
> >> > > : > > could be thousands of synonyms?
> >> > >
> >> > > Populate from what? What is the source of your data?
> >> > >
> >> > > The default Solr synonym file format is about as simple as it could
> >> > > possibly be, so it is pretty trivial to generate from scripts. The hard
> >> > > part is usually selecting the synonym data you want to use and parsing
> >> > > whatever format it is already in.
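To illustrate how small such a generator script can be, here is a minimal Python sketch that assumes the source data has already been parsed into groups of equivalent terms (the hard part, as noted above). It emits the two rule types of the Solr synonym format, comma-separated equivalents and explicit `=>` mappings, plus an optional `#` comment header. The function names are hypothetical, and terms containing commas or `=>` are rejected rather than escaped to keep the sketch short.

```python
from typing import Iterable, Sequence


def _check_term(term: str) -> str:
    """Strip whitespace and reject terms that would be ambiguous in the file format."""
    term = term.strip()
    # Simplification: refuse characters with special meaning instead of escaping them.
    if not term or "," in term or "=>" in term or term.startswith("#"):
        raise ValueError(f"unsupported synonym term: {term!r}")
    return term


def equivalence_line(terms: Sequence[str]) -> str:
    """A line of mutually equivalent terms, e.g. 'tv,television'."""
    return ",".join(_check_term(t) for t in terms)


def mapping_line(sources: Sequence[str], targets: Sequence[str]) -> str:
    """An explicit one-way mapping, e.g. 'i-pod,i pod => ipod'."""
    lhs = ",".join(_check_term(t) for t in sources)
    rhs = ",".join(_check_term(t) for t in targets)
    return f"{lhs} => {rhs}"


def render_synonyms(groups: Iterable[Sequence[str]], header: str = "") -> str:
    """Render groups of equivalent terms as the text of a synonyms file."""
    out = []
    if header:
        out.append(f"# {header}")
    for group in groups:
        if len(group) >= 2:  # a lone term has nothing to be a synonym of
            out.append(equivalence_line(group))
    return "\n".join(out) + "\n" if out else ""
```

Writing the string returned by `render_synonyms` to a file gives a synonyms.txt that can then be split with any chunking step if it grows past the ZooKeeper size limit.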
> >> > >
> >> > >
> >> > >
> >> > > -Hoss
> >> > > http://www.lucidworks.com/
> >> > >
> >> >
> >>
> >
> >
>
