lucene-java-user mailing list archives

From Tom Hirschfeld <tomhirschf...@gmail.com>
Subject Lucene, Spark, HDFS question
Date Tue, 13 Mar 2018 23:31:30 GMT
Hello!


*Background*: My team runs a machine learning pipeline. One stage of the
pipeline scrapes a web-based Lucene application over HTTP. The scrape
outputs a CSV file, which we upload to HDFS and use as input to a Spark ML
job.

*Question*: Is there a way for our Spark application to read from a Lucene
index stored in HDFS? Specifically, I see here
<http://lucene.apache.org/solr/6_5_0/solr-core/org/apache/solr/store/hdfs/HdfsDirectory.html>
that solr-core has an HdfsDirectory type that seems to be compatible with
our Lucene IndexReader. Are the two actually compatible? Can we store our
index in HDFS and read it from a Spark job?


Best,
Tom Hirschfeld
