Hello,
I have a fairly well-organised repository of files from a neuroscience
lab. It contains a mix of medical images and CSV files produced from
those images.
The CSV files contain data that I would like to aggregate with Drill,
but the naming convention is unusual: each file name contains an ID plus
a prefix or suffix that identifies the category of the file, and
everything is nested in a folder structure organised by subject, for
example ID1/processing1/ID1-mx.csv.
How can I use Drill to filter out the files that I do not need and keep
only the files containing my data?
For example, I would like to write something like:

    SELECT * FROM dfs.data.`/`
    WHERE dir1 = 'processing1' AND file LIKE '%-mx.csv';
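
For reference, here is a sketch of how such a filter might look in valid Drill SQL. It rests on a few assumptions: that `dfs.data` is a workspace rooted directly above the subject folders (so `dir0` is the subject ID and `dir1` is the processing folder), and that the Drill release in use exposes the implicit `filename` column (available since Drill 1.8, as far as I know). String literals take single quotes.

```sql
-- Assumes the layout <workspace root>/ID1/processing1/ID1-mx.csv,
-- so dir0 = 'ID1' and dir1 = 'processing1'.
-- `filename` is Drill's implicit column holding the file's base name.
SELECT *
FROM dfs.data.`/`
WHERE dir1 = 'processing1'
  AND filename LIKE '%-mx.csv';
```

Whether Drill skips the image files entirely or still tries to open them before applying the filter is something I have not verified.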
Thanks