lucene-java-user mailing list archives

From Andi Vajda <>
Subject Re: PHP-Lucene Integration
Date Tue, 05 Apr 2005 17:58:46 GMT

As an alternative, you could follow the approach used for PyLucene:
compile the Java code with GCJ and generate bindings for Python with SWIG.
SWIG supports a number of languages in addition to Python, such as Ruby, PHP,
Perl, and many more.

For more information, see:

As a matter of fact, a team of people is working on such a construction for 
Ruby at the moment.


On Tue, 5 Apr 2005, Giovanni Novelli wrote:

> Since Lucene's native language is Java, it would be more natural to access
> its functionality through JSP. Still, the idea of accessing Lucene from PHP
> seems interesting, as PHP is perhaps the most widely deployed server-side
> scripting language.
> I think access to the Lucene API from PHP should be as general and clean as
> possible, so in my opinion the natural approach is a single layer that
> interoperates with the Lucene API: an Apache module. A PHP API would then be
> needed to call that module from PHP.
> Making an Apache module for Lucene a component of the Lucene project should
> help Lucene spread beyond the PHP development arena as well.
> On Mar 27, 2005 5:49 AM, Owen Densmore <> wrote:
>> Thanks all for the interesting responses. Sorry for being a bit late
>> in responding!
>> -- Owen
>> Owen Densmore - - -
>> Begin forwarded message:
>>> From: "Philippe Ombredanne" <>
>>> Subject: RE: PHP-Lucene Integration
>>> Owen,
>>> very interesting!
>>> Anything (code) you can share?
>> Hi Philippe. We will definitely make our code available. I suspect,
>> however, it is not terribly interesting. But if simply useful as a
>> "case study" that would still be good.
>>> From: Dawid Weiss <>
>>> Subject: Re: PHP-Lucene Integration
>>> Your implementation and ideas sound very interesting, Owen. Can we see
>>> the system anywhere in public (and play with it?)
>> We'll send a link to the site fairly soon. We're having our final
>> review tomorrow, and should have a good idea when we can let folks look
>> at it.
>>>> We are hoping the institute can afford to have us work on true
>>>> clustering techniques such as Carrot2 uses. (Thanks to Dawid and all
>>>> the Poznan University folks whose papers were so stimulating!)
>>> You are very welcome. We are also academic, so in the feeling of
>>> brotherhood we might help you set up a demo on-line clustering server
>>> free of charge. There really is no better clustering technique than
>>> the one devised for a particular problem and it seems like you found
>>> that niche. Although it's always worth experimenting with other stuff
>>> just for the sake of comparison. Just let me know if you're interested
>>> (if we can access the 'feed' of those plain search results I can set
>>> up the clustering demo in a few minutes, really).
>> This would be really great! Indeed, we'd like to help SFI to be a bit
>> more involved with exploring their collection with innovative, research
>> oriented methods.
>> Some of the staff at SFI are excited by DSpace, for example, and we'd
>> be interested in helping them explore its use in the lucene/clustering
>> context. That, and their use of Dublin Core for cataloging their
>> future work might be of general interest here in the mail list too.
>>>> We did do a
>>>> quick LSA SVD on a random set of the papers to see what the
>>>> performance (both CPU and good clustering) would be like. Our
>>>> results are encouraging, and I think the frequent phrases approach
>>>> would be best for this collection.
>>> It is always going to be challenging if you attempt to cluster the
>>> entire collection, you know. I'm (or rather: I will be) working on
>>> extensions to the algorithm to deal with full text documents.
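As a rough illustration of the "frequent phrases" idea mentioned above, the sketch below counts word n-grams that occur in at least a minimum number of documents. It is a simplified stand-in, not the algorithm Carrot2 uses; the function name `frequent_phrases`, its parameters, and the sample snippets are all assumptions made for this example.

```python
from collections import defaultdict


def frequent_phrases(docs, min_len=2, max_len=3, min_docs=2):
    """Return {phrase: number of documents containing it}.

    Hypothetical helper: a phrase is any run of min_len..max_len
    consecutive words, and it is kept only if it appears in at
    least min_docs different documents.
    """
    doc_count = defaultdict(int)
    for text in docs:
        words = text.lower().split()
        seen = set()  # count each phrase once per document
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        for phrase in seen:
            doc_count[phrase] += 1
    return {p: c for p, c in doc_count.items() if c >= min_docs}


if __name__ == "__main__":
    # Made-up snippets standing in for titles or key phrases.
    sample = [
        "complex adaptive systems in economics",
        "modeling complex adaptive systems",
        "agent based modeling of markets",
    ]
    for phrase, count in sorted(frequent_phrases(sample).items()):
        print(count, phrase)
```

Phrases shared by several snippets could then serve as candidate cluster labels; a real system would also need stop-word handling and punctuation-aware tokenization.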
>> We're mainly using Abstracts and other metadata (Title, Authors, Key
>> phrases, Abstracts, Dates, and so on). These are reasonably small:
>> Abstracts average 150 words over the current 1122-document
>> collection. If we include the title and key phrases, we get 172
>> words/doc.
>> I suspect we could safely limit the abstracts to the first few
>> sentences too, getting us to a much smaller number. Indeed, if we
>> tossed the abstracts altogether, and used just titles and key phrases,
>> we're down to less than 20 words/doc! I bet simply using reasonable
>> preprocessing we could get small enough "snippets" as to be workable.
>>> Dawid
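To put the collection sizes quoted above in perspective, here is a small back-of-the-envelope sketch. The document count and words-per-document figures come from the message; the 20 words/doc value is only the stated upper bound ("less than 20"), and the names `NUM_DOCS`, `WORDS_PER_DOC`, and `total_words` are made up for this example.

```python
# Figures quoted in the thread.
NUM_DOCS = 1122

WORDS_PER_DOC = {
    "abstracts only": 150,
    "abstracts + titles + key phrases": 172,
    # Upper bound: the message says "less than 20 words/doc".
    "titles + key phrases only": 20,
}


def total_words(num_docs, words_per_doc):
    """Approximate total words to process for the whole collection."""
    return num_docs * words_per_doc


if __name__ == "__main__":
    for label, wpd in WORDS_PER_DOC.items():
        print(f"{label}: about {total_words(NUM_DOCS, wpd):,} words")
```

Even with full abstracts the input stays under 200,000 words, and dropping the abstracts cuts it to roughly a tenth of that.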

To unsubscribe, e-mail:
For additional commands, e-mail:
