lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-10017) Add the crawl Streaming Expression
Date Sat, 21 Jan 2017 20:11:26 GMT
Joel Bernstein created SOLR-10017:
-------------------------------------

             Summary: Add the crawl Streaming Expression
                 Key: SOLR-10017
                 URL: https://issues.apache.org/jira/browse/SOLR-10017
             Project: Solr
          Issue Type: New Feature
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Joel Bernstein


The crawl Streaming Expression will wrap a stream that emits root URLs to crawl. It will
then crawl those URLs using a library such as Crawl4j and emit tuples that can be indexed
into a SolrCloud collection using the update function. Solr's classifier can be used to curate
content as it is being crawled, or to classify sites based on the content they contain. The
links between pages and sites can be indexed as graphs and then explored and visualized with
graph expressions.
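
To make the idea of "emitting tuples" concrete, here is a minimal sketch of what a single crawled page might look like once it is turned into an indexable tuple plus a set of link edges. It is not part of the proposal and does not use any Solr or crawler API. The function names (page_to_tuple, tuple_to_edges) and the field names (id, site_s, body_t, links_ss, from_s, to_s) are assumptions chosen for illustration, and the HTML is passed in as a string rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags and the text content of a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        # Keeps every non-blank text node (this naive version would also
        # keep the contents of <script> and <style> elements).
        if data.strip():
            self.text_parts.append(data.strip())


def page_to_tuple(url, html):
    """Build one tuple (a dict) for a crawled page. Field names are hypothetical."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    links = []
    for href in parser.hrefs:
        absolute = urljoin(url, href)  # resolve relative links against the page URL
        if urlparse(absolute).scheme in ("http", "https") and absolute not in links:
            links.append(absolute)

    return {
        "id": url,
        "site_s": urlparse(url).netloc,
        "body_t": " ".join(parser.text_parts),
        "links_ss": links,
    }


def tuple_to_edges(doc):
    """Expand a page tuple into one edge tuple per outgoing link."""
    return [
        {"id": doc["id"] + "->" + target, "from_s": doc["id"], "to_s": target}
        for target in doc["links_ss"]
    ]
```

Under these assumptions, the page tuples would be the documents a classifier could score by their body_t text, and the edge tuples (from_s, to_s) are the kind of records a graph traversal could walk.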



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)