manifoldcf-user mailing list archives

From Kayak28 <>
Subject [No Subject]
Date Thu, 21 Feb 2019 02:17:00 GMT
Hello, folks:

I have a question about crawling and scraping with ManifoldCF (MCF).
I want to perform the following sequence of tasks using MCF.

1. Crawl data from a RESTful API
2. Scrape the data
3. Insert the data into Apache Solr
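
To make the intended flow concrete, the three steps could be sketched roughly as below. This is only an illustration in plain Python, not MCF code. The item shape (`id`, `html`), the `content_txt` field name, and the helper names are all assumptions made up for the sketch, and the REST call and the POST to Solr are left out so that it runs without a network.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def scrape_text(html):
    """Step 2: reduce an HTML string to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def to_solr_docs(api_items):
    """Map API items (hypothetical shape) to documents for Solr."""
    return [
        {"id": item["id"], "content_txt": scrape_text(item["html"])}
        for item in api_items
    ]


def build_update_body(docs):
    """Step 3: serialize the documents as a JSON array for a Solr update request."""
    return json.dumps(docs)


# Step 1 stand-in: pretend this list came back from the RESTful API.
sample_items = [
    {
        "id": "doc-1",
        "html": "<html><body><h1>Title</h1>"
                "<script>var x=1;</script><p>Hello world</p></body></html>",
    }
]

docs = to_solr_docs(sample_items)
body = build_update_body(docs)
print(body)
```

In a real setup the list in step 1 would come from the API, and `body` would be sent to the Solr collection's update handler with a JSON content type.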

In this case, I think I need to set up ManifoldCF as follows:
1. Define a repository connector to access the RESTful API (using the Web
crawler connector or the Generic connector?)

2. Define a transformation connector to scrape the HTML (using the
html-extractor transformer connector...?)
3. Define an output connector for Solr

Or do I have to use other software, such as Apache NiFi, to control the
sequence of these tasks?

I would appreciate any comments and replies.

