Actually, what we needed in order to use our timestamp columns inside the Hive date functions was this: you can use --map-column-hive to manually specify which columns should be mapped to timestamp instead of string. See the sqoop user guide for more info.

A couple of caveats:
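For reference, here is roughly what such an import looks like. This is a sketch — the connection string, table, and column names are hypothetical; the --map-column-hive flag itself is real Sqoop syntax:

```shell
# Sketch of a hive-import that forces two columns to Hive's timestamp type.
# "orders", "created_at", and "updated_at" are made-up names for illustration.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table orders \
  --hive-import \
  --map-column-hive created_at=timestamp,updated_at=timestamp
```

Note that the column list has to be spelled out per table, which is exactly the pain point in the first caveat below.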
- I don't know of any way to change sqoop's default behavior, so you have to manually name all the relevant columns in every table you're sqooping.
- MySQL's date columns break when converted with this mapping (datetime and timestamp work, but date does not). To import a MySQL date column, I had to use a custom query that converts the column to a datetime (with 00:00:00 time values) on the MySQL side, so that sqoop doesn't choke on it. Obviously this is even more annoying than having to set --map-column-hive, since it makes each sqoop command even more ad hoc.
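The date-to-datetime workaround above can be done with Sqoop's free-form query import. Again a sketch with hypothetical table/column names — the CAST happens inside MySQL, so Hive only ever sees a datetime:

```shell
# Sketch: cast a MySQL DATE column to DATETIME inside the query so the
# timestamp mapping works. "orders" / "order_date" are made-up names.
# $CONDITIONS is required by Sqoop for split handling in --query imports.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --query 'SELECT id, CAST(order_date AS DATETIME) AS order_date FROM orders WHERE $CONDITIONS' \
  --target-dir /user/hive/staging/orders \
  --split-by id \
  --hive-import \
  --hive-table orders \
  --map-column-hive order_date=timestamp
```

The cast produces 00:00:00 time values, so no information is gained — it only exists to keep sqoop from choking on the bare date type.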
- In the end, although all of that worked (in a very ad-hoc manner), we decided to drop it all and just go with string columns in Hive. All (or most) of the Hive date-related functions take only strings as input parameters, not timestamps, so having timestamp columns meant we had to use to_date(string timestamp) inside each Hive date function. That made all of our queries a bit unwieldy, and also (I assume) precluded us from gaining any speed advantage from having timestamps instead of strings.

The last point makes me think Hive timestamps are not really ready for prime time yet... They seem to be tacked on top of the system in a way that's not fully integrated.

Please let me know if you think I'm drawing the wrong conclusions from this :) ...!
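To make the "unwieldy" point concrete, this is the kind of wrapping I mean (hypothetical table and column names):

```sql
-- With string columns ('yyyy-MM-dd' formatted), date functions apply directly:
SELECT datediff(ship_date, order_date) FROM orders;

-- With timestamp columns, every date function call needed a to_date() wrapper:
SELECT datediff(to_date(ship_ts), to_date(order_ts)) FROM orders;
```

Multiply that wrapper across every date expression in every query and the string-column approach starts to look much simpler.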
Felix

On Sun, Apr 7, 2013 at 6:23 AM, Santosh Achhra <email@example.com> wrote:

Hello,

I am using hive version 0.9, which supports timestamp fields. When I import data using sqoop with the hive-import option, sqoop converts timestamp fields to string.

sqoop version is Sqoop 1.4.1-cdh4.1.0

Good wishes, always!
Santosh