nutch-dev mailing list archives

From Rida Benjelloun <rida.benjell...@doculibre.com>
Subject Re: The Constellio team is proud to release its version 1.1
Date Thu, 06 Jan 2011 17:13:48 GMT
Hi,

We developed our own crawler.

It's a lightweight crawler, conforming to the Google Connector Manager
architecture.

Some neat features of the crawler:
- Near real-time indexing. New pages are indexed seconds after they are
crawled.
- On-demand pages. These pages are crawled with higher priority.
- Depth control between recrawls (prevents loops).
- Based on HtmlUnit, which supports JavaScript.
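
To illustrate the priority and depth-control ideas in the list above, here is a
minimal Python sketch of a crawl frontier. It is not Constellio's code: the
class name CrawlFrontier, the priority constants, and the max_depth parameter
are all hypothetical, chosen only to show how on-demand pages can jump the
queue and how a depth limit stops a crawl from looping.

```python
import heapq
import itertools

# Hypothetical priority levels (assumption): a lower value is dequeued first.
ON_DEMAND = 0
REGULAR = 1


class CrawlFrontier:
    """Illustrative URL queue with on-demand priority and a depth limit."""

    def __init__(self, max_depth):
        self.max_depth = max_depth
        self._heap = []
        self._seen = set()
        # Tie-breaker that keeps first-in, first-out order within a priority.
        self._counter = itertools.count()

    def add(self, url, depth=0, on_demand=False):
        """Queue a URL; return False if it is too deep or already queued."""
        if depth > self.max_depth or url in self._seen:
            return False
        self._seen.add(url)
        priority = ON_DEMAND if on_demand else REGULAR
        heapq.heappush(self._heap, (priority, next(self._counter), url, depth))
        return True

    def pop(self):
        """Return (url, depth) for the next page to crawl, or None if empty."""
        if not self._heap:
            return None
        _, _, url, depth = heapq.heappop(self._heap)
        return url, depth


frontier = CrawlFrontier(max_depth=1)
frontier.add("http://example.com/")
frontier.add("http://example.com/a", depth=1)
frontier.add("http://example.com/a/b", depth=2)      # rejected: beyond max_depth
frontier.add("http://example.com/urgent", on_demand=True)
```

In this sketch the on-demand URL is popped first even though it was added
last, and the depth-2 URL never enters the queue.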

Regards.

On Thu, Jan 6, 2011 at 3:19 AM, Otis Gospodnetic <ogjunk-nutch@yahoo.com> wrote:

> I think this is a good question and I'd be curious what the answer is, too.
> Rida, could you please shed some light on this crawler side of Constellio?
>
> This is also interesting because LWE chose Aperture's crawler instead of
> Nutch, even though Andrzej works for Lucid.  How come?  Is Nutch simply too
> big and complex, while Aperture's stuff is more suitable for typical
> non-Web-scale crawling needs of a typical enterprise/LWE customer?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
> *From:* Davide Cavalaglio <davide.cavalaglio@desktopsrl.com>
> *To:* dev@nutch.apache.org
> *Sent:* Tue, December 28, 2010 7:08:27 AM
> *Subject:* Re: The Constellio team is proud to release its version 1.1
>
> Hi,
> But is the crawler used by Constellio Nutch?
>
> 2010/12/20 Rida Benjelloun <rida.benjelloun@doculibre.com>
>
>> The Constellio team is proud to release its version 1.1
>>
>> Constellio Open Source Enterprise Search is based on Apache Solr and uses
>> the Google Search Appliance connector architecture. It lets you find, with
>> a single click, all relevant content in your organization (Web, email, ECM,
>> CRM, etc.).
>>
>> Please be advised that the Constellio licence has been changed from
>> GPL v.3.0 to LGPL v.3.0.
>>
>> The new LGPL v.3.0 licence gives more flexibility to developers interested
>> in plug-in/module development or in integrating Constellio with other
>> solutions. The SVN repository (svn.constellio.com) and the issue tracker
>> (issues.constellio.com) are now also open.
>>
>> Many important changes have been made in this new version.
>>
>> Here are some of the new features developed in version 1.1:
>>
>>    - Constellio multi-platform installer
>>    - Federated search
>>    - Document security
>>    - Autocomplete for simple search based on the most popular queries
>>    - Configurable advanced search interface and autocomplete based on
>> field content
>>    - Solr connector (upload your schema.xml and content - xml and binary -
>> files)
>>    - Activation of Solr HTTP Web services, with the Constellio spell
>> checker made available through these services
>>    - Implementation of multiselect faceting
>>    - Configuration of display fields
>>    - Document consultation (views) used in the relevance calculation of
>> search results
>>    - Add field boost, document boost, and Solr dismax (relevance)
>>    - Add Carrot2 for faceting
>>    - Web crawler improvements
>>    - Add new theme
>>    - and more ...
>>  Your comments/suggestions are also welcome!
>>
>>
>>
>> --
>> ---------------------------------------------------------
>> Rida Benjelloun
>> Constellio -  Doculibre
>> ridabenjelloun@apache.org
>> rida.benjelloun@doculibre.com
>> ---------------------------------------------------------
>>
>
>


-- 
---------------------------------------------------------
Rida Benjelloun
Constellio -  Doculibre
ridabenjelloun@apache.org
rida.benjelloun@doculibre.com
---------------------------------------------------------
