spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: How to create distributed matrixes from hive tables.
Date Tue, 20 Jan 2015 20:45:03 GMT
You can get a SchemaRDD from the Hive table, map it into an RDD of
Vectors, and then construct a RowMatrix. The transformations are lazy,
so intermediate data does not need to be written to external storage.
-Xiangrui

On Sun, Jan 18, 2015 at 4:07 AM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:
> Hi,
>
> We have large datasets in a format suitable for Spark MLlib matrices, but
> they are pre-computed by Hive and stored in Hive. My question is: can we
> create a distributed matrix such as IndexedRowMatrix directly from Hive
> tables, instead of reading the data out of the Hive tables and feeding it
> into an empty matrix?
>
> Regards
>
>


