Here is the HDFS storage plugin definition and the query I am using. The same query runs
fine off the local filesystem with the dfs storage prefix. All I am doing is swapping dfs for hdfs.
{
  "type": "file",
  "connection": "hdfs://host18-namenode:8020/",
  "config": null,
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": null,
  "enabled": true
}
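As a sanity check (assuming the plugin is saved under the name hdfs, as in the query below), listing files through each workspace confirms whether Drill can reach the namenode at all, independently of the query:

SHOW FILES IN hdfs.root;
SHOW FILES IN hdfs.tmp;

If these fail with the same error, the problem is in the plugin registration or connection rather than in the query itself.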
select s.application_id,
       get_spark_attrs(s.spark_event, 'spark.executor.memory') as spark_attributes
from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json` s
where (REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '') = 'SparkListenerEnvironmentUpdate'
    or REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '') = 'SparkListenerApplicationStart'
    or REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '') = 'SparkListenerApplicationEnd')
group by application_id, spark_attributes
order by application_id;
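Incidentally, the three OR branches repeat the same expression; an equivalent, shorter form using an IN predicate would be (a sketch with the same semantics, assuming the nested REGEXP_REPLACE calls extract the event type as intended, and keeping the original get_spark_attrs UDF):

select s.application_id,
       get_spark_attrs(s.spark_event, 'spark.executor.memory') as spark_attributes
from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json` s
where REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '')
      in ('SparkListenerEnvironmentUpdate', 'SparkListenerApplicationStart', 'SparkListenerApplicationEnd')
group by application_id, spark_attributes
order by application_id;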
On Tuesday, February 12, 2019, 3:04:40 PM PST, Abhishek Girish <agirish@apache.org>
wrote:
Hey Krishnanand,
As mentioned by other folks in earlier threads, can you make sure to
include ALL RELEVANT details in your emails? That includes the query, the
storage plugin configuration, the data format, sample data or a description of
the data, and the full log for the query failure. These details are necessary
for anyone to understand the issue or offer help.
Regards,
Abhishek
On Tue, Feb 12, 2019 at 2:37 PM Krishnanand Khambadkone
<kkhambadkone@yahoo.com.invalid> wrote:
> I have defined an hdfs storage plugin with all the required properties.
> However, when I try to use it in a query, it returns
> Error: VALIDATION ERROR: null
>