lucene-solr-user mailing list archives

From Jorge Luis Betancourt Gonzalez <jlbetanco...@uci.cu>
Subject Re: Best way to index wordpress blogs in solr
Date Tue, 07 Oct 2014 21:45:45 GMT
If you’re talking about a generic web crawl, you could use something like Nutch [1]. Keep in
mind that this is a full web crawler, and it does a pretty good job. I’ve been using it for
more than 2 years now and I’m very happy with it, although I don’t crawl just a couple of sites
but a much wider spectrum (think the web of an entire country). With Nutch you just have to configure
a couple of options in an XML file and it will crawl the web and index the content into Solr.
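
If a full crawler is more than a couple of WordPress blogs need, the "write a few more scripts" route mentioned further down the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the field names (id, title, content, source), the feed URL, and the Solr update URL are all hypothetical and would have to match the real schema and installation. It reads the blog's RSS 2.0 feed and sends the posts to Solr's JSON update handler.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def rss_to_solr_docs(rss_xml, source="wordpress"):
    """Turn an RSS 2.0 feed into a list of Solr documents.

    The field names used here are assumptions; rename them to match
    whatever the existing flat schema defines.
    """
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # without a link there is no stable unique id
        docs.append({
            "id": link,
            "title": (item.findtext("title") or "").strip(),
            "content": (item.findtext("description") or "").strip(),
            "source": source,
        })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as JSON to a Solr update handler and commit."""
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Example usage (hypothetical URLs, not run here):
#   with urllib.request.urlopen("http://blog.example.com/feed/") as feed:
#       docs = rss_to_solr_docs(feed.read())
#   post_to_solr(docs, "http://localhost:8983/solr/collection1/update")
```

Using the post's permalink as the id means re-running the script overwrites existing documents instead of duplicating them. An RSS feed normally exposes only the most recent posts, so a first full import would still need the database or a crawler.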

Regards,

[1] http://nutch.apache.org 

On Oct 7, 2014, at 4:53 PM, Vishal Sharma <vishals@grazitti.com> wrote:

> Makes sense.
> 
> I'll just dive in now. Thanks so much.
> 
> Vishal Sharma
> TL, Grazitti Interactive
> 
> On Tue, Oct 7, 2014 at 1:44 PM, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
> 
>> I am pretty sure Swift is not Solr. That's why I was asking whether
>> you were starting from scratch.
>> 
>> As to the other items, please re-read my original response. Solr has
>> an example reading in RSS feeds, you could probably use that. Or a
>> generic XML using DataImportHandler's mapping. Or directly from
>> database, again with DIH.
>> 
>> Basically, it sounds totally doable. So, it's hard to advise anything
>> specific beyond "go, do it" and wait for you to come back with a much
>> more specific issue once you get going. Most of the issues will be
>> related to your schema and your WordPress configuration, so no
>> abstract advice is available.
>> 
>> Regards,
>>    Alex.
>> 
>> On 7 October 2014 16:36, Vishal Sharma <vishals@grazitti.com> wrote:
>>> Hey Alex,
>>> 
>>> Thanks for the prompt response.
>>> 
>>> Here is what I am trying to solve: I am showing search results from
>>> content coming from 3 different places on a single site. I have done
>>> that by pumping all this content into a Solr server running on a single
>>> flat schema, using the different APIs of these platforms. Now I need to
>>> index blog posts written in WordPress as well. I was wondering if there
>>> is any solution already available which can help me crawl and pump these
>>> posts into my running Solr instance. Otherwise I might have to write a
>>> few more scripts to do that.
>>> 
>>> BTW, is Swift using Solr on the backend? Because I thought it's a paid
>>> enterprise solution.
>>> 
>> 
