lucene-solr-user mailing list archives

From Vishal Sharma <vish...@grazitti.com>
Subject Re: Best way to index wordpress blogs in solr
Date Tue, 07 Oct 2014 20:36:07 GMT
Hey Alex,

Thanks for the prompt response.

Here is what I am trying to solve: I am showing search results for content
coming from 3 different places on a single site. I have done that by
pumping all this content into a Solr server running a single flat schema,
using the different APIs of these platforms. Now I also need to index blog
posts written in WordPress. I was wondering if there is any solution
already available that can help me crawl these posts and pump them into my
running Solr instance. Otherwise I might have to write a few more scripts to
do that.
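
For reference, a minimal sketch of what such a script could look like if it
reads the RSS feed that WordPress publishes, as suggested in the reply quoted
below. The field names (id, title, url, content, source) and the update URL
are assumptions for illustration, not the actual schema or host; they would
need to match the existing flat schema.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Namespace WordPress feeds use for the full post body (<content:encoded>).
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}


def feed_to_solr_docs(rss_xml, source="wordpress"):
    """Convert an RSS 2.0 feed (as a string) into flat Solr documents.

    The field names are hypothetical; rename them to match your schema.
    """
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        # Prefer the full body; fall back to the summary if it is absent.
        body = item.findtext("content:encoded", namespaces=NS)
        if not body:
            body = item.findtext("description") or ""
        docs.append({
            # Use the guid as the unique key, or the link if there is none.
            "id": (item.findtext("guid") or link).strip(),
            "title": (item.findtext("title") or "").strip(),
            "url": link,
            "content": body.strip(),
            # Lets a single flat schema tell the content sources apart.
            "source": source,
        })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents to a Solr JSON update endpoint.

    update_url is an assumption, for example
    http://localhost:8983/solr/<core>/update?commit=true
    """
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

The post body in the feed is still HTML, so the content field would probably
want an analyzer that strips markup (Solr ships HTMLStripCharFilterFactory for
this) rather than HTML parsing in the script itself.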

BTW, is Swiftype using Solr on the backend? I thought it was a paid
enterprise solution.






Vishal Sharma
TL, Grazitti Interactive
T: +1 650 641 1754
E: vishals@grazitti.com
www.grazitti.com






On Tue, Oct 7, 2014 at 11:21 AM, Alexandre Rafalovitch <arafalov@gmail.com>
wrote:

> On 7 October 2014 14:08, Vishal Sharma <vishals@grazitti.com> wrote:
> > Hi,
> >
> > I am trying to find out whether there is a best practice for indexing
> > WordPress blogs in Solr. Can someone help with the architecture I
> > should be setting up?
> >
> > Do I need to write separate scripts to crawl WordPress and then pump
> > posts back to Solr using its API?
>
>
> Is your goal WordPress indexing in general, or specifically indexing
> into Solr? Because there are services such as:
> https://wordpress.org/plugins/swiftype-search/
>
> Otherwise, the question is the level of access you have to the
> WordPress installation. You could index the feeds WordPress produces
> (there is an example in the distribution for RSS parsing). Or you could
> pull the content directly from the database. Or, if real-time indexing
> is not important, you could periodically run a WordPress export (to XML)
> and parse that.
>
> I would NOT parse the HTML and try to recreate that.
>
> As to the rest of the architecture, you need to know whether you are
> just indexing generic WordPress or also extensions such as custom
> taxonomies, custom values, etc.
>
> These are all important questions because they will drive the Solr
> architecture more than the original question you seem to be asking.
>
> Regards,
>    Alex.
>
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
