lucene-solr-user mailing list archives

From Jorge Luis Betancourt Gonzalez <>
Subject Re: Best way to index wordpress blogs in solr
Date Tue, 07 Oct 2014 21:45:45 GMT
If you’re talking about a generic web crawl, you could use something like Nutch [1]. Keep in
mind that it is a full web crawler, and it does a pretty good job. I’ve been using it for
more than 2 years now and I’m very happy with it, although I don’t crawl just a couple of sites
but a much wider spectrum (think country-scale web). With Nutch you just have to configure
a couple of options in an XML file, and it will crawl the web and index the content into Solr.
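
If a full crawl is more than you need for a single blog, the script route discussed in the
quoted thread (read the blog's RSS feed, push the posts to Solr) can be sketched roughly as
follows. This is only a minimal sketch under assumptions: the field names (id, title, content,
published, source), the feed URL and the Solr update URL are all hypothetical and must be
adapted to your own flat schema and WordPress setup. Note that `published` is passed through
as the raw RSS date string, so a Solr date field would need it converted first.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Namespace WordPress feeds use for the full post body (<content:encoded>).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def parse_feed(xml_text):
    """Turn an RSS 2.0 feed into a list of flat dicts, one per post."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        # Prefer the full body; fall back to the summary if it is missing.
        body = item.findtext(CONTENT_NS + "encoded")
        if body is None:
            body = item.findtext("description", default="")
        docs.append({
            # Assumed field names; map them to your own schema.
            "id": item.findtext("link", default="").strip(),
            "title": item.findtext("title", default="").strip(),
            "content": body.strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "source": "wordpress",
        })
    return docs


def build_update_request(docs, update_url):
    """Build a POST request carrying the documents as a JSON array."""
    return urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_feed(feed_url, update_url):
    """Fetch the feed, convert it, and send the documents to Solr."""
    with urllib.request.urlopen(feed_url) as resp:
        docs = parse_feed(resp.read())
    with urllib.request.urlopen(build_update_request(docs, update_url)) as resp:
        return resp.status
```

A call might look like
index_feed("http://blog.example.com/feed/", "http://localhost:8983/solr/collection1/update?commit=true"),
where both URLs are placeholders. A feed only exposes the most recent posts, so older content
would still need a crawl or a database import.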



On Oct 7, 2014, at 4:53 PM, Vishal Sharma <> wrote:

> Makes sense.
> I'll just dive in now. Thanks so much.
> Vishal Sharma, TL, Grazitti Interactive, T: +1 650 641 1754
> On Tue, Oct 7, 2014 at 1:44 PM, Alexandre Rafalovitch <>
> wrote:
>> I am pretty sure Swift is not Solr. That's why I was asking whether
>> you were starting from scratch.
>> As to the other items, please re-read my original response. Solr has
>> an example reading in RSS feeds, you could probably use that. Or a
>> generic XML using DataImportHandler's mapping. Or directly from
>> database, again with DIH.
>> Basically, it sounds totally doable. So, it's hard to advise anything
>> specific beyond "go, do it" and wait for you to come back with a much
>> more specific issue once you get going. Most of the issues will be
>> related to your schema and your WordPress configuration, so no
>> abstract advice is available.
>> Regards,
>>    Alex.
>> On 7 October 2014 16:36, Vishal Sharma <> wrote:
>>> Hey Alex,
>>> Thanks for the prompt response.
>>> Here is what I am trying to solve: I am showing search results from content
>>> coming from 3 different places on a single site. I have done that by
>>> pumping all this content into a Solr server running on a single flat schema,
>>> using the different APIs of these platforms. Now I also need to index blog
>>> posts written in WordPress. I was wondering if there is any solution
>>> already available which can help me crawl and pump these posts into my running
>>> Solr instance. Otherwise I might have to write a few more scripts to do that.
>>> BTW, is Swift using Solr on the backend? Because I thought it's a paid
>>> enterprise solution.
