lucene-solr-user mailing list archives

From Vishal Sharma <>
Subject Re: Best way to index wordpress blogs in solr
Date Tue, 07 Oct 2014 20:36:07 GMT
Hey Alex,

Thanks for the prompt response.

Here is what I am trying to solve: I am showing search results for content
coming from 3 different places on a single site. I have done that by
pumping all of this content into a Solr server with a single flat schema,
using the different APIs of these platforms. Now I also need to index blog
posts written in WordPress. I was wondering if there is any solution
already available that can help me crawl these posts and pump them into my
running Solr instance. Otherwise I might have to write a few more scripts
to do that.

BTW, is Swift using Solr on the backend? I thought it's a paid
enterprise solution.

Vishal Sharma
TL, Grazitti Interactive

On Tue, Oct 7, 2014 at 11:21 AM, Alexandre Rafalovitch wrote:

> On 7 October 2014 14:08, Vishal Sharma <> wrote:
> > Hi,
> >
> > I am trying to find out if there is any best practice for indexing
> > WordPress blogs in a Solr index. Can someone help with the architecture
> > I should be setting up?
> >
> > Do I need to write separate scripts to crawl WordPress and then pump the
> > posts back to Solr using its API?
>
> Is your goal WordPress indexing, or specifically indexing into Solr?
> Because there are services such as:
>
> Otherwise, the question is the level of access you have to WordPress.
> You could index the feeds WordPress produces (there is an example in the
> distribution for RSS parsing). Or you could pull the content directly
> from the database. Or, if real-time is not important, you could
> periodically do a WordPress export (to XML) and parse that.
>
> I would NOT parse the HTML and try to recreate that.
>
> As to the rest of the architecture, you need to know whether you are
> just indexing generic WordPress or also extensions such as custom
> taxonomies, custom values, etc.
>
> These are all important questions because they will drive the Solr
> architecture more than the original question you seem to be asking.
>
> Regards,
>    Alex.
> Personal: @arafalov
> Solr resources and newsletter: @solrstart
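
The feed-based option mentioned in the quoted reply could be sketched roughly as follows. This is only an illustration, not something posted in the thread: it assumes the blog exposes a standard WordPress RSS 2.0 feed, that Solr accepts JSON documents on its update handler, and that the flat schema has fields named id, title, url, source, content, category and published. Those field names, the function names and both URLs are hypothetical placeholders to adapt to the real setup.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime

# Namespace WordPress uses for the full post body (<content:encoded>).
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def rss_to_solr_docs(rss_xml, source="wordpress"):
    """Convert an RSS 2.0 feed (str or bytes) into a list of flat dicts.

    The keys are hypothetical field names; they must match the Solr schema.
    """
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        # Prefer the full body; fall back to the excerpt in <description>.
        body = item.findtext(CONTENT_NS + "encoded") or item.findtext("description") or ""
        doc = {
            "id": (item.findtext("guid") or link).strip(),
            "title": (item.findtext("title") or "").strip(),
            "url": link,
            "source": source,  # lets one schema hold several platforms
            "content": body.strip(),  # still contains markup from the feed
            "category": [c.text for c in item.findall("category") if c.text],
        }
        pub = item.findtext("pubDate")
        if pub:
            # RSS dates are RFC 822; Solr date fields expect ISO 8601 in UTC.
            dt = parsedate_to_datetime(pub.strip()).astimezone(timezone.utc)
            doc["published"] = dt.strftime("%Y-%m-%dT%H:%M:%SZ")
        docs.append(doc)
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to a Solr update URL."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder URLs: replace with the real blog feed and Solr core.
    feed_url = "http://blog.example.com/feed/"
    solr_url = "http://localhost:8983/solr/collection1/update?commit=true"
    with urllib.request.urlopen(feed_url) as resp:
        docs = rss_to_solr_docs(resp.read())
    print(post_to_solr(docs, solr_url))
```

Run periodically (for example from cron), this re-sends the most recent posts; because the id is taken from the feed's guid, Solr overwrites existing documents rather than duplicating them, provided id is the schema's unique key. A feed only exposes the latest posts, so a one-off backfill from the XML export or the database would still be needed for older content.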
