I have a question about crawling and scraping in Manifold CF.
I want to perform the following sequence of tasks using MCF:
1. Crawl data from a RESTful API
2. Scrape the data
3. Insert the data into Apache Solr
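For reference, this is roughly the data flow I want, written as a minimal plain-Python sketch that runs outside Manifold CF. It assumes the REST API returns a JSON array of objects with hypothetical `id` and `body` (HTML) fields, and that Solr has a core at a hypothetical URL such as `http://localhost:8983/solr/mycore`. The helper names are illustrative only and are not part of Manifold CF.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def fetch_json(url):
    """Task 1: GET a JSON document from a REST endpoint (hypothetical URL)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def scrape_text(html):
    """Task 2: reduce an HTML fragment to plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def solr_update_request(solr_core_url, docs):
    """Task 3: build (but do not send) a JSON update request for a Solr core."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_core_url.rstrip("/") + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Usage, assuming the hypothetical endpoint and field names above: `items = fetch_json("http://example.com/api/items")`, then `docs = [{"id": it["id"], "content": scrape_text(it["body"])} for it in items]`, then `urllib.request.urlopen(solr_update_request("http://localhost:8983/solr/mycore", docs))`. My question is how to get Manifold CF to do the same thing.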
In this case, is this how I need to set up Manifold CF?
1. Define a repository connector to access the RESTful API (using the Web crawler connector or the Generic connector?)
2. Define a transformation connector to scrape the HTML (using the html-extractor transformation connector...?)
3. Define an output connector for Solr
Or do I have to use other software, such as Apache NiFi, to control the sequence of these tasks?
I would appreciate any comments and replies.