A better explanation than my last: I am looking to use CTAS queries for HDFS ingest.
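
For context, the kind of ingest I have in mind looks roughly like the sketch below. This is only an illustration: the `hdfs` plugin name, its writable `tmp` workspace, and the output column names are assumptions, not something I have configured yet.

~~~
-- Hypothetical: assumes a storage plugin named "hdfs" with a
-- writable workspace "tmp" pointing into HDFS.
CREATE TABLE hdfs.tmp.`customer_reviews` AS
SELECT COLUMNS[0] AS review_id,
       COLUMNS[1] AS review_text
FROM dfs.root.`customer_reviews_1998.csv`;
~~~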
> On May 26, 2015, at 8:04 PM, Andries Engelbrecht <aengelbrecht@maprtech.com> wrote:
>
> Perhaps I’m missing something here.
>
> Why not create a DFS plug in for HDFS and put the file in HDFS?
>
>
>
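If I follow, a plugin along those lines would look roughly like the sketch below. A minimal sketch only: the namenode host and port (`namenode:8020`, the Hadoop 2.x default) and the `/drill/stage` path are placeholders, and the workspace is marked writable so a CTAS could write into it. JSON allows no comments, so the placeholders are called out here rather than inline.

~~~
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode:8020",
  "workspaces": {
    "tmp": {
      "location": "/drill/stage",
      "writable": true,
      "defaultInputFormat": null
    }
  }
}
~~~

With "writable": true on the workspace, a CTAS like the one sketched above could then land its output files in HDFS.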
>> On May 26, 2015, at 4:54 PM, Matt <bsg075@gmail.com> wrote:
>>
>> New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes. It appears text
>> files need to be on all nodes in the cluster?
>>
>> Using the dfs config below, I am only able to query if a csv file is on all 4
>> nodes. If the file is only on the local node and not the others, I get errors
>> in the form of:
>>
>> ~~~
>> 0: jdbc:drill:zk=es05:2181> select * from root.`customer_reviews_1998.csv`;
>> Error: PARSE ERROR: From line 1, column 15 to line 1, column 18: Table
>> 'root.customer_reviews_1998.csv' not found
>> ~~~
>>
>> ~~~
>> {
>>   "type": "file",
>>   "enabled": true,
>>   "connection": "file:///",
>>   "workspaces": {
>>     "root": {
>>       "location": "/localdata/hadoop/stage",
>>       "writable": false,
>>       "defaultInputFormat": null
>>     },
>> ~~~
>>
>>> On 25 May 2015, at 20:39, Kristine Hahn wrote:
>>>
>>> The storage plugin "location" needs to be the full path to the localdata
>>> directory. This partial storage plugin definition works for the user named
>>> mapr:
>>>
>>> {
>>>   "type": "file",
>>>   "enabled": true,
>>>   "connection": "file:///",
>>>   "workspaces": {
>>>     "root": {
>>>       "location": "/home/mapr/localdata",
>>>       "writable": false,
>>>       "defaultInputFormat": null
>>>     },
>>> . . .
>>>
>>> Here's a working query for the data in localdata:
>>>
>>> 0: jdbc:drill:> SELECT COLUMNS[0] AS Ngram,
>>> . . . . . . . > COLUMNS[1] AS Publication_Date,
>>> . . . . . . . > COLUMNS[2] AS Frequency
>>> . . . . . . . > FROM dfs.root.`mydata.csv`
>>> . . . . . . . > WHERE ((COLUMNS[0] = 'Zoological Journal of the Linnean')
>>> . . . . . . . > AND (COLUMNS[2] > 250)) LIMIT 10;
>>>
>>> A complete example, not yet published on the Drill site, shows the steps
>>> involved in detail:
>>> http://tshiran.github.io/drill/docs/querying-plain-text-files/#example-of-querying-a-tsv-file
>>>
>>>
>>> Kristine Hahn
>>> Sr. Technical Writer
>>> 415-497-8107 @krishahn
>>>
>>>
>>>> On Sun, May 24, 2015 at 1:56 PM, Matt <bsg075@gmail.com> wrote:
>>>>
>>>> I have used a single-node install (unzip and run) to query local text/CSV
>>>> files, but on a 3-node cluster (installed via MapR CE), a query with local
>>>> files results in:
>>>>
>>>> ~~~
>>>> sqlline version 1.1.6
>>>> 0: jdbc:drill:> select * from dfs.`testdata.csv`;
>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>>> Table 'dfs./localdata/testdata.csv' not found
>>>>
>>>> 0: jdbc:drill:> select * from dfs.`/localdata/testdata.csv`;
>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>>> Table 'dfs./localdata/testdata.csv' not found
>>>> ~~~
>>>>
>>>> Is there a special config for local file querying? An initial doc search
>>>> did not point me to a solution, but I may simply not have found the
>>>> relevant sections.
>>>>
>>>> I have tried modifying the default dfs config to no avail:
>>>>
>>>> ~~~
>>>> "type": "file",
>>>> "enabled": true,
>>>> "connection": "file:///",
>>>> "workspaces": {
>>>> "root": {
>>>> "location": "/localdata",
>>>> "writable": false,
>>>> "defaultInputFormat": null
>>>> }
>>>> ~~~
>