lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-10017) Add the crawl Streaming Expression
Date Sat, 21 Jan 2017 20:13:26 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-10017:
----------------------------------
    Description: The crawl Streaming Expression will wrap a stream that emits root URLs to
crawl. It will then crawl the URLs using a library such as Crawler4j. It will emit tuples
that can be indexed into a Solr Cloud collection using the update function. Solr's classifier
can be used to curate content as it's being crawled or classify sites based on the content
they contain. The links between pages and sites can be indexed as graphs and then explored
and visualized with graph expressions.  (was: The crawl Streaming Expression will wrap a stream
that emits root URLs to crawl. It will then crawl the URLs using a library such as Crawl4j.
It will emit tuples that can be indexed into a Solr Cloud collection using the update function.
Solr's classifier can be used to curate content as it's being crawled or classify sites based
on the content they contain. The links between pages and sites can be indexed as graphs
and then explored and visualized with graph expressions.)

> Add the crawl Streaming Expression
> ----------------------------------
>
>                 Key: SOLR-10017
>                 URL: https://issues.apache.org/jira/browse/SOLR-10017
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>
> The crawl Streaming Expression will wrap a stream that emits root URLs to crawl. It
> will then crawl the URLs using a library such as Crawler4j. It will emit tuples that can
> be indexed into a Solr Cloud collection using the update function. Solr's classifier can be
> used to curate content as it's being crawled or classify sites based on the content
> they contain. The links between pages and sites can be indexed as graphs and then explored
> and visualized with graph expressions.
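
The data flow described above (a wrapped stream of root URLs going in, one
tuple per crawled page coming out, with links kept so they can later be
treated as graph edges) can be sketched roughly as follows. This is only an
illustration in plain Python, not the proposed Solr implementation: the
crawl() function, the fetch_links callback, the max_depth parameter, the
tuple field names, and the in-memory FAKE_WEB are all hypothetical, and
fetch_links merely stands in for a real crawler library such as Crawler4j.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for a real crawler library such as Crawler4j:
# maps a URL to the list of links found on that page.
FetchLinks = Callable[[str], List[str]]


def crawl(root_urls: Iterable[str], fetch_links: FetchLinks,
          max_depth: int = 1) -> Iterator[Dict[str, object]]:
    """Wrap a stream of root URLs and emit one tuple (dict) per crawled page."""
    seen = set()
    for root in root_urls:
        queue = deque([(root, 0)])
        while queue:
            url, depth = queue.popleft()
            if url in seen:
                continue  # each page is emitted at most once
            seen.add(url)
            links = fetch_links(url)
            # Field names are illustrative only, not a Solr schema.
            yield {"id": url, "root": root, "depth": depth, "links": links}
            if depth < max_depth:
                queue.extend((link, depth + 1) for link in links)


# A tiny in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/"],
    "http://b.example/": [],
}

# The inner stream of root URLs is just a list here.
tuples = list(crawl(["http://a.example/"], lambda u: FAKE_WEB.get(u, [])))

# Page-to-page links, in a form that could be indexed as graph edges.
edges = [(t["id"], link) for t in tuples for link in t["links"]]
```

In this sketch each emitted dict plays the role of a tuple that a downstream
step (the update function, in the description above) would index, and the
edges list shows how the link data could feed graph queries.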



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

