lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: Best way to index wordpress blogs in solr
Date Tue, 07 Oct 2014 18:21:52 GMT
On 7 October 2014 14:08, Vishal Sharma <vishals@grazitti.com> wrote:
> Hi,
>
> I am trying to find out whether there is a best practice for indexing
> WordPress blogs in a Solr index. Can someone help with the architecture
> I should be setting up?
>
> Do I need to write separate scripts to crawl WordPress and then pump posts
> back to Solr using its API?


Is your goal WordPress indexing in general, or specifically indexing
into Solr? I ask because there are services such as:
https://wordpress.org/plugins/swiftype-search/

Otherwise, the question is what level of access you have to the
WordPress installation. You could index the feeds WordPress produces
(there is an example of RSS parsing in the distribution). Or you could
pull the content directly from the database. Or, if real-time updates
are not important, you could periodically run a WordPress export (to
XML) and parse that.
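
As a rough illustration of the feed/export route, here is a minimal
Python sketch that turns the <item> elements of a WordPress RSS feed or
XML export into Solr-style documents. The Solr field names (id, title,
url, author, published, categories, content) and the update URL are
assumptions for illustration only and would have to match your own
schema and core. The post body still contains the post's own markup, so
you would probably want to strip it, for example with an HTML-stripping
char filter in the Solr analysis chain.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces commonly declared in WordPress feeds and exports.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def items_to_solr_docs(xml_text):
    """Convert every <item> in the XML into a dict of Solr fields.

    The field names are hypothetical; adjust them to your schema.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        docs.append({
            # Fall back to the link if the item has no <guid>.
            "id": item.findtext("guid", default=link),
            "title": item.findtext("title", default=""),
            "url": link,
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "published": item.findtext("pubDate", default=""),
            "categories": [c.text for c in item.findall("category") if c.text],
            "content": item.findtext("content:encoded", default="", namespaces=NS),
        })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to a Solr update handler.

    update_url is an assumption, e.g.
    http://localhost:8983/solr/<your core>/update?commit=true
    """
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

You would read the feed or export file into a string, pass it to
items_to_solr_docs(), and hand the result to post_to_solr(). The
pubDate value is passed through unchanged, so it would need converting
before it can go into a Solr date field.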

I would NOT parse the rendered HTML and try to recreate the posts from that.

As to the rest of the architecture, you need to know whether you are
just indexing generic WordPress or also extensions such as custom
taxonomies, custom values, etc.

These are all important questions because they will drive the Solr
architecture more than the original question you seem to be asking.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
