hawq-dev mailing list archives

From Kyle Dunn <kd...@pivotal.io>
Subject Questions about filesystem / filespace / tablespace
Date Mon, 13 Mar 2017 23:57:23 GMT
Hello devs -

I'm doing some reading about HAWQ tablespaces here:

I want to understand the flow of things; please correct me on the following:

1) Create a filesystem (not *really* supported after HAWQ init) - the
default is obviously [lib]HDFS[3]:
      SELECT * FROM pg_filesystem;

2) Create a filespace, referencing the above file system:
      CREATE FILESPACE testfs ON hdfs
      ('localhost:8020/fs/testfs') WITH (NUMREPLICA = 1);

3) Create a tablespace, reference the above filespace:
      CREATE TABLESPACE fastspace FILESPACE testfs;

4) Create objects referencing the above table space, or set it as the
database's default:
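
A minimal sketch of what step 4 might look like, assuming the standard
PostgreSQL-style TABLESPACE clause that HAWQ inherits. The table and database
names are hypothetical; only the tablespace name fastspace comes from step 3.

```sql
-- Hypothetical object names; only "fastspace" is taken from step 3 above.

-- Place a single table in the tablespace.
CREATE TABLE sales_test (id int, amount numeric) TABLESPACE fastspace;

-- Or make the tablespace the default for objects in a new database.
CREATE DATABASE testdb TABLESPACE fastspace;
```

Either form should leave the object's data under the filespace location given
in step 2, though I have not verified this against a running cluster.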

Given this set of steps, is it true (*in theory*) that an arbitrary filesystem
(i.e. storage backend) could be added to HAWQ using *existing* APIs?

I realize the nuances of this are significant, but conceptually I'd like to
gather some details, mainly in support of this
<https://issues.apache.org/jira/browse/HAWQ-1270> ongoing JIRA discussion.
I'm daydreaming about whether this neat tool:
https://github.com/s3fs-fuse/s3fs-fuse could be useful for an S3 spike
(which also seems to kind of work on Google Cloud, when interoperability
mode is enabled). By its Linux FUSE nature, it implements the lion's share
of required pg_filesystem functions; in fact, maybe we could actually use
system calls from glibc (somewhat <http://www.linux-mag.com/id/7814/>)
directly in this situation.

Curious to get some feedback.

*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 <3039053171> | Email: kdunn@pivotal.io
