lucy-user mailing list archives

From Anil Pachuri <>
Subject Re: [lucy-user] Input format to Lucy
Date Sun, 21 Apr 2013 04:50:19 GMT
Finally, I have been able to run Lucy. :) Thanks a lot, Peter, for your help.
Is there a way in Lucy to generate a tag map/cloud of specific types of terms/phrases that
might be present in the search results/documents returned by Lucy for a particular query?
For example, I want to generate a tag map showing all gene-names and also cell-tissue names with
their document frequency (from their respective name-lists) that might be co-mentioned in
the search results/documents returned by Lucy for a query gene (e.g. nuclear factor 1)?
One other question: how can I change the default size of the text excerpt reported in the search
results?
Thank you very much.
--- On Thu, 2/21/13, Peter Karman <> wrote:

From: Peter Karman <>
Subject: Re: [lucy-user] Input format to Lucy
Date: Thursday, February 21, 2013, 2:55 PM

Anil Pachuri wrote on 2/21/13 3:22 PM:
> Hi,
> Does Lucy have a utility to accept raw XML files as input? I have 50 XML files and I
> need to index selected fields in them using Lucy.

If you install SWISH::Prog::Lucy from CPAN, you get the swish3 tool installed
which will index XML (and HTML et al) files for Lucy. You can specify which XML
elements you want treated as Lucy fields with a configuration file. For example:

# a document like

# a config file like
MetaNames foo
PropertyNames foo

# and then index the file like:

% swish3 -F lucy -c configfile -i doc.xml

# and search like:

% swish3 -q foo:bar
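
As a rough sketch of how the pieces above fit together, the following writes a small XML document and a matching config file, then runs the same two swish3 commands if the tool is installed. The contents of doc.xml (the doc, title, and foo elements and the value "bar") and the temporary directory are assumptions made for illustration, not taken from the original message.

```shell
# Hypothetical example files; element names and values are assumptions.
workdir=$(mktemp -d)

# A small XML document containing a <foo> element.
cat > "$workdir/doc.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <title>example</title>
  <foo>bar</foo>
</doc>
EOF

# The config file from the message: treat <foo> as a searchable field
# (MetaNames) and as a stored property (PropertyNames).
cat > "$workdir/configfile" <<'EOF'
MetaNames foo
PropertyNames foo
EOF

# Index and search only if swish3 is available on this machine.
if command -v swish3 >/dev/null 2>&1; then
  (cd "$workdir" && swish3 -F lucy -c configfile -i doc.xml && swish3 -q foo:bar)
else
  echo "swish3 not found; wrote $workdir/doc.xml and $workdir/configfile"
fi
```

With a document like this, the query foo:bar is limited to text inside the foo element rather than the whole file.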

The configuration docs are at:

You might also want to look at Dezi, which does the same thing with a
server/client setup.

> Also, is there any general Perl utility to merge multiple XML files or convert these
> into tabular format?

CPAN has many XML handling tools. I'm sure there's something there that will do
most or all of what you want.

Peter Karman
