I’ve written an additional parquet-tools helper script for accessing Parquet files in the Hadoop environment.
It follows the same pattern as the existing parquet-schema (etc.) scripts provided with the
parquet-tools distribution, and is designed to live in the same directory as those scripts.
Hopefully others will also find it useful.
Cheers — Chris
—— cut ——
#!/usr/bin/env bash
#
# Author:
# Chris Mathews
# CTO and Co-Founder, SysMech
# www.sysmech.co.uk
#
# Determine the physical (symlink-free) path to this script's directory
APPPATH=$( cd "$(dirname "$0")" && pwd -P )
# NOTE: prerequisites
# 1. HADOOP_HOME must be defined
# 2. create a link to the current parquet-tools-<version>.jar library
#    e.g.: ln -s parquet-tools-1.8.1.jar parquet-tools.jar
#
PARQUET_TOOLS="${APPPATH}/lib/parquet-tools.jar"
if [ -z "${HADOOP_HOME}" ]
then
  echo "" >&2
  echo "error: HADOOP_HOME not defined!" >&2
  echo "" >&2
  exit 1
elif [ ! -f "${PARQUET_TOOLS}" ]
then
  echo "" >&2
  echo "error: file ${PARQUET_TOOLS} not found!" >&2
  echo "" >&2
  echo "info: create a link to the current parquet-tools library." >&2
  echo "info: e.g.: ln -s ${APPPATH}/lib/parquet-tools-1.8.1.jar ${APPPATH}/lib/parquet-tools.jar" >&2
  echo "" >&2
  exit 1
fi
# Run the application, passing all arguments (sub-command, options, file paths) through unchanged
exec "${HADOOP_HOME}/bin/hadoop" jar "${PARQUET_TOOLS}" "$@"
—— cut ——
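Prerequisite 2 in the script asks for a parquet-tools.jar link pointing at the current versioned jar. The following is a minimal sketch of automating that step; it is not part of the parquet-tools distribution. The function name link_parquet_tools is hypothetical, and the sketch assumes the jars sit in one lib directory named parquet-tools-<version>.jar and that the available sort supports -V (version ordering), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the parquet-tools distribution):
# point <libdir>/parquet-tools.jar at the highest-versioned parquet-tools-<version>.jar.
# Assumes a `sort` that supports -V (e.g. GNU coreutils) for version-aware ordering.
set -euo pipefail

link_parquet_tools() {
  local libdir="$1"
  local jar latest
  local jars=()
  # Collect versioned jar names; nullglob makes an unmatched glob expand to nothing.
  # The pattern requires "-" after "tools", so parquet-tools.jar itself is not matched.
  shopt -s nullglob
  for jar in "${libdir}"/parquet-tools-*.jar; do
    jars+=("$(basename "${jar}")")
  done
  shopt -u nullglob
  if [ "${#jars[@]}" -eq 0 ]; then
    echo "error: no parquet-tools-<version>.jar found in ${libdir}" >&2
    return 1
  fi
  # Version sort so that e.g. 1.10.0 is chosen over 1.8.1 (plain sort would not)
  latest=$(printf '%s\n' "${jars[@]}" | sort -V | tail -n 1)
  # Relative link, as in the script's own example; -f replaces an existing link
  ln -sf "${latest}" "${libdir}/parquet-tools.jar"
  echo "linked ${libdir}/parquet-tools.jar -> ${latest}"
}

# When executed directly, use the first argument as the lib directory
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  link_parquet_tools "$1"
fi
```

For example, running it with the wrapper's lib directory as the argument (the directory the script expects under ${APPPATH}/lib) refreshes the link after a newer parquet-tools jar is dropped in, without editing the wrapper itself.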