hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-25297) Refactor GenericUDFDateDiff
Date Thu, 01 Jul 2021 08:42:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-25297?focusedWorklogId=617473&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-617473 ]

ASF GitHub Bot logged work on HIVE-25297:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jul/21 08:41
            Start Date: 01/Jul/21 08:41
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on a change in pull request #2437:
URL: https://github.com/apache/hive/pull/2437#discussion_r662091884



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java
##########
@@ -64,121 +66,46 @@
         + "  1")
 @VectorizedExpressions({VectorUDFDateDiffColScalar.class, VectorUDFDateDiffColCol.class, VectorUDFDateDiffScalarCol.class})
 public class GenericUDFDateDiff extends GenericUDF {
-  private transient Converter inputConverter1;
-  private transient Converter inputConverter2;
+  private final transient Converter[] tsConverters = new Converter[2];
   private IntWritable output = new IntWritable();
-  private transient PrimitiveCategory inputType1;
-  private transient PrimitiveCategory inputType2;
-  private IntWritable result = new IntWritable();
+  private final transient PrimitiveCategory[] tsInputTypes = new PrimitiveCategory[2];
+
 
   public GenericUDFDateDiff() {
   }
 
   @Override
   public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
-    if (arguments.length != 2) {
-      throw new UDFArgumentLengthException(
-        "datediff() requires 2 argument, got " + arguments.length);
-    }
-    inputConverter1 = checkArguments(arguments, 0);
-    inputConverter2 = checkArguments(arguments, 1);
-    inputType1 = ((PrimitiveObjectInspector) arguments[0]).getPrimitiveCategory();
-    inputType2 = ((PrimitiveObjectInspector) arguments[1]).getPrimitiveCategory();
-    ObjectInspector outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
-    return outputOI;
+    checkArgsSize(arguments,2,2);
+    checkArgPrimitive(arguments, 0);
+    checkArgPrimitive(arguments, 1);
+    checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    checkArgGroups(arguments, 1, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    obtainTimestampConverter(arguments,0, tsInputTypes, tsConverters);
+    obtainTimestampConverter(arguments,1, tsInputTypes, tsConverters);

Review comment:
       Does it make sense to refactor as below?
   ```
   for (int i = 0; i < arguments.length; i++) {
     checkArgPrimitive(...)
     checkArgGroups(...)
     obtainTimestampConverter(...)
   }
   ```
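
   A minimal, self-contained sketch of what the suggested loop could look like. The helper names and argument order are copied from the diff above, but `ArgType`, `ACCEPTED`, and the stub bodies are hypothetical stand-ins written only so the example compiles on its own; they are not Hive's `GenericUDF` implementation, and the group constants from the diff are folded into the stub for brevity.

   ```java
   import java.util.EnumSet;
   import java.util.Set;

   class InitLoopSketch {
     // Hypothetical stand-in for an argument's type; not a Hive class.
     enum ArgType { STRING, DATE, TIMESTAMP, INT, LIST }

     // Assumption: types the stubbed group check accepts.
     static final Set<ArgType> ACCEPTED =
         EnumSet.of(ArgType.STRING, ArgType.DATE, ArgType.TIMESTAMP);

     final ArgType[] tsInputTypes = new ArgType[2];
     final String[] tsConverters = new String[2];

     // Stub: rejects a wrong number of arguments.
     static void checkArgsSize(ArgType[] arguments, int min, int max) {
       if (arguments.length < min || arguments.length > max) {
         throw new IllegalArgumentException(
             "expected " + min + " to " + max + " arguments, got " + arguments.length);
       }
     }

     // Stub: rejects non-primitive arguments.
     static void checkArgPrimitive(ArgType[] arguments, int i) {
       if (arguments[i] == ArgType.LIST) {
         throw new IllegalArgumentException("argument " + (i + 1) + " is not primitive");
       }
     }

     // Stub: records the type if it is accepted, otherwise fails.
     static void checkArgGroups(ArgType[] arguments, int i, ArgType[] inputTypes) {
       if (!ACCEPTED.contains(arguments[i])) {
         throw new IllegalArgumentException(
             "argument " + (i + 1) + " has unsupported type " + arguments[i]);
       }
       inputTypes[i] = arguments[i];
     }

     // Stub: records which conversion would be used for argument i.
     static void obtainTimestampConverter(
         ArgType[] arguments, int i, ArgType[] inputTypes, String[] converters) {
       converters[i] = inputTypes[i] + "->TIMESTAMP";
     }

     void initialize(ArgType[] arguments) {
       checkArgsSize(arguments, 2, 2);
       // One pass per argument replaces the duplicated index-0 / index-1 calls.
       for (int i = 0; i < arguments.length; i++) {
         checkArgPrimitive(arguments, i);
         checkArgGroups(arguments, i, tsInputTypes);
         obtainTimestampConverter(arguments, i, tsInputTypes, tsConverters);
       }
     }
   }
   ```

   One thing to keep in mind: the loop fully validates argument 0 before it looks at argument 1, whereas the unrolled version runs both primitive checks first. If both arguments are invalid in different ways, the first error reported may therefore differ.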

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java
##########
@@ -64,121 +66,46 @@
         + "  1")
 @VectorizedExpressions({VectorUDFDateDiffColScalar.class, VectorUDFDateDiffColCol.class, VectorUDFDateDiffScalarCol.class})
 public class GenericUDFDateDiff extends GenericUDF {
-  private transient Converter inputConverter1;
-  private transient Converter inputConverter2;
+  private final transient Converter[] tsConverters = new Converter[2];
   private IntWritable output = new IntWritable();
-  private transient PrimitiveCategory inputType1;
-  private transient PrimitiveCategory inputType2;
-  private IntWritable result = new IntWritable();
+  private final transient PrimitiveCategory[] tsInputTypes = new PrimitiveCategory[2];
+
 
   public GenericUDFDateDiff() {
   }
 
   @Override
   public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
-    if (arguments.length != 2) {
-      throw new UDFArgumentLengthException(
-        "datediff() requires 2 argument, got " + arguments.length);
-    }
-    inputConverter1 = checkArguments(arguments, 0);
-    inputConverter2 = checkArguments(arguments, 1);
-    inputType1 = ((PrimitiveObjectInspector) arguments[0]).getPrimitiveCategory();
-    inputType2 = ((PrimitiveObjectInspector) arguments[1]).getPrimitiveCategory();
-    ObjectInspector outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
-    return outputOI;
+    checkArgsSize(arguments,2,2);
+    checkArgPrimitive(arguments, 0);
+    checkArgPrimitive(arguments, 1);
+    checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    checkArgGroups(arguments, 1, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    obtainTimestampConverter(arguments,0, tsInputTypes, tsConverters);
+    obtainTimestampConverter(arguments,1, tsInputTypes, tsConverters);
+    return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
   }
 
   @Override
   public IntWritable evaluate(DeferredObject[] arguments) throws HiveException {
-    output = evaluate(convertToDate(inputType1, inputConverter1, arguments[0]),
-      convertToDate(inputType2, inputConverter2, arguments[1]));
-    return output;
-  }
 
-  @Override
-  public String getDisplayString(String[] children) {
-    return getStandardDisplayString("datediff", children);
-  }
+    Timestamp ts1 = getTimestampValue(arguments, 0, tsConverters);
+    Timestamp ts2 = getTimestampValue(arguments, 1, tsConverters);
 
-  @Nullable
-  private Date convertToDate(PrimitiveCategory inputType, Converter converter, DeferredObject argument)
-    throws HiveException {
-    assert(converter != null);
-    assert(argument != null);
-    if (argument.get() == null) {
-      return null;
-    }
-    switch (inputType) {
-    case STRING:
-    case VARCHAR:
-    case CHAR: {
-      String dateString = converter.convert(argument.get()).toString();
-      Date date = DateParser.parseDate(dateString);
-      if (date != null) {
-        return date;
-      }
-      Timestamp ts = PrimitiveObjectInspectorUtils.getTimestampFromString(dateString);
-      if (ts != null) {
-        return Date.ofEpochMilli(ts.toEpochMilli());
-      }
+    if (ts1 == null || ts2 == null) {
       return null;
     }
-    case TIMESTAMP:
-      Timestamp ts = ((TimestampWritableV2) converter.convert(argument.get()))
-        .getTimestamp();
-      return Date.ofEpochMilli(ts.toEpochMilli());
-    case DATE:
-      DateWritableV2 dw = (DateWritableV2) converter.convert(argument.get());
-      return dw.get();
-    case TIMESTAMPLOCALTZ:
-      TimestampTZ tsz = ((TimestampLocalTZWritable) converter.convert(argument.get()))
-          .getTimestampTZ();
-      return Date.ofEpochMilli(tsz.getEpochSecond() * 1000l);
-    default:
-      throw new UDFArgumentException(
-        "TO_DATE() only takes STRING/TIMESTAMP/TIMESTAMPLOCALTZ types, got " + inputType);
-    }
-  }
 
-  private Converter checkArguments(ObjectInspector[] arguments, int i) throws UDFArgumentException {
-    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
-      throw new UDFArgumentTypeException(0,
-        "Only primitive type arguments are accepted but "
-        + arguments[i].getTypeName() + " is passed. as first arguments");
-    }
-    final PrimitiveCategory inputType =
-        ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
-    switch (inputType) {
-    case STRING:
-    case VARCHAR:
-    case CHAR:
-      return ObjectInspectorConverters.getConverter(arguments[i],
-        PrimitiveObjectInspectorFactory.writableStringObjectInspector);
-    case TIMESTAMP:
-      return new TimestampConverter((PrimitiveObjectInspector) arguments[i],
-        PrimitiveObjectInspectorFactory.writableTimestampObjectInspector);
-    case TIMESTAMPLOCALTZ:
-      return new PrimitiveObjectInspectorConverter.TimestampLocalTZConverter(
-          (PrimitiveObjectInspector) arguments[i],
-          PrimitiveObjectInspectorFactory.writableTimestampTZObjectInspector
-      );
-    case DATE:
-      return ObjectInspectorConverters.getConverter(arguments[i],
-        PrimitiveObjectInspectorFactory.writableDateObjectInspector);
-    default:
-      throw new UDFArgumentException(
-          " DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE/TIMESTAMPLOCALTZ types as " + (i + 1)
-              + "-th argument, got " + inputType);

Review comment:
       Same comment here: do we have test coverage for this?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java
##########
@@ -64,121 +66,46 @@
         + "  1")
 @VectorizedExpressions({VectorUDFDateDiffColScalar.class, VectorUDFDateDiffColCol.class, VectorUDFDateDiffScalarCol.class})
 public class GenericUDFDateDiff extends GenericUDF {
-  private transient Converter inputConverter1;
-  private transient Converter inputConverter2;
+  private final transient Converter[] tsConverters = new Converter[2];
   private IntWritable output = new IntWritable();
-  private transient PrimitiveCategory inputType1;
-  private transient PrimitiveCategory inputType2;
-  private IntWritable result = new IntWritable();
+  private final transient PrimitiveCategory[] tsInputTypes = new PrimitiveCategory[2];
+
 
   public GenericUDFDateDiff() {
   }
 
   @Override
   public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
-    if (arguments.length != 2) {
-      throw new UDFArgumentLengthException(
-        "datediff() requires 2 argument, got " + arguments.length);
-    }
-    inputConverter1 = checkArguments(arguments, 0);
-    inputConverter2 = checkArguments(arguments, 1);
-    inputType1 = ((PrimitiveObjectInspector) arguments[0]).getPrimitiveCategory();
-    inputType2 = ((PrimitiveObjectInspector) arguments[1]).getPrimitiveCategory();
-    ObjectInspector outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
-    return outputOI;
+    checkArgsSize(arguments,2,2);
+    checkArgPrimitive(arguments, 0);
+    checkArgPrimitive(arguments, 1);
+    checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    checkArgGroups(arguments, 1, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    obtainTimestampConverter(arguments,0, tsInputTypes, tsConverters);
+    obtainTimestampConverter(arguments,1, tsInputTypes, tsConverters);
+    return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
   }
 
   @Override
   public IntWritable evaluate(DeferredObject[] arguments) throws HiveException {
-    output = evaluate(convertToDate(inputType1, inputConverter1, arguments[0]),
-      convertToDate(inputType2, inputConverter2, arguments[1]));
-    return output;
-  }
 
-  @Override
-  public String getDisplayString(String[] children) {
-    return getStandardDisplayString("datediff", children);
-  }
+    Timestamp ts1 = getTimestampValue(arguments, 0, tsConverters);
+    Timestamp ts2 = getTimestampValue(arguments, 1, tsConverters);
 
-  @Nullable
-  private Date convertToDate(PrimitiveCategory inputType, Converter converter, DeferredObject argument)
-    throws HiveException {
-    assert(converter != null);
-    assert(argument != null);
-    if (argument.get() == null) {
-      return null;
-    }
-    switch (inputType) {
-    case STRING:
-    case VARCHAR:
-    case CHAR: {
-      String dateString = converter.convert(argument.get()).toString();
-      Date date = DateParser.parseDate(dateString);
-      if (date != null) {
-        return date;
-      }
-      Timestamp ts = PrimitiveObjectInspectorUtils.getTimestampFromString(dateString);
-      if (ts != null) {
-        return Date.ofEpochMilli(ts.toEpochMilli());
-      }
+    if (ts1 == null || ts2 == null) {
       return null;
     }
-    case TIMESTAMP:
-      Timestamp ts = ((TimestampWritableV2) converter.convert(argument.get()))
-        .getTimestamp();
-      return Date.ofEpochMilli(ts.toEpochMilli());
-    case DATE:
-      DateWritableV2 dw = (DateWritableV2) converter.convert(argument.get());
-      return dw.get();
-    case TIMESTAMPLOCALTZ:
-      TimestampTZ tsz = ((TimestampLocalTZWritable) converter.convert(argument.get()))
-          .getTimestampTZ();
-      return Date.ofEpochMilli(tsz.getEpochSecond() * 1000l);
-    default:
-      throw new UDFArgumentException(
-        "TO_DATE() only takes STRING/TIMESTAMP/TIMESTAMPLOCALTZ types, got " + inputType);
-    }
-  }
 
-  private Converter checkArguments(ObjectInspector[] arguments, int i) throws UDFArgumentException {
-    if (arguments[i].getCategory() != ObjectInspector.Category.PRIMITIVE) {
-      throw new UDFArgumentTypeException(0,
-        "Only primitive type arguments are accepted but "
-        + arguments[i].getTypeName() + " is passed. as first arguments");
-    }
-    final PrimitiveCategory inputType =
-        ((PrimitiveObjectInspector) arguments[i]).getPrimitiveCategory();
-    switch (inputType) {
-    case STRING:
-    case VARCHAR:
-    case CHAR:
-      return ObjectInspectorConverters.getConverter(arguments[i],
-        PrimitiveObjectInspectorFactory.writableStringObjectInspector);
-    case TIMESTAMP:
-      return new TimestampConverter((PrimitiveObjectInspector) arguments[i],
-        PrimitiveObjectInspectorFactory.writableTimestampObjectInspector);
-    case TIMESTAMPLOCALTZ:
-      return new PrimitiveObjectInspectorConverter.TimestampLocalTZConverter(
-          (PrimitiveObjectInspector) arguments[i],
-          PrimitiveObjectInspectorFactory.writableTimestampTZObjectInspector
-      );
-    case DATE:
-      return ObjectInspectorConverters.getConverter(arguments[i],
-        PrimitiveObjectInspectorFactory.writableDateObjectInspector);
-    default:
-      throw new UDFArgumentException(
-          " DATEDIFF() only takes STRING/TIMESTAMP/DATEWRITABLE/TIMESTAMPLOCALTZ types as " + (i + 1)
-              + "-th argument, got " + inputType);
-    }
-  }
+    Date date1 = Date.ofEpochMilli(ts1.toEpochMilli());
+    Date date2 = Date.ofEpochMilli(ts2.toEpochMilli());
 
-  private IntWritable evaluate(Date date, Date date2) {
-
-    if (date == null || date2 == null) {
-      return null;
-    }
+    output.set(DateWritableV2.dateToDays(date1) - DateWritableV2.dateToDays(date2));
+    return output;

Review comment:
       Do we need to transform `Timestamp` to `Date` in order to get the difference in days?
Inside `Timestamp` there is a `LocalDateTime`, so can't we use the newer Java APIs (such as `Duration`
or `ChronoUnit`) to get the difference?
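
   To illustrate the idea, here is a small sketch using only `java.time`. It takes `LocalDateTime` values directly and makes no assumption about how Hive's `Timestamp` exposes its internal value; `DateDiffSketch` and `daysBetween` are hypothetical names. The point it shows is that the time of day has to be dropped first, because `ChronoUnit.DAYS.between` on two `LocalDateTime` values counts complete 24-hour periods rather than calendar days.

   ```java
   import java.time.LocalDateTime;
   import java.time.temporal.ChronoUnit;

   class DateDiffSketch {
     // Calendar-day difference (first minus second), ignoring time of day,
     // mirroring the dateToDays(date1) - dateToDays(date2) order in the diff.
     static int daysBetween(LocalDateTime first, LocalDateTime second) {
       return Math.toIntExact(
           ChronoUnit.DAYS.between(second.toLocalDate(), first.toLocalDate()));
     }

     public static void main(String[] args) {
       LocalDateTime first = LocalDateTime.of(2021, 7, 1, 0, 30);
       LocalDateTime second = LocalDateTime.of(2021, 6, 30, 23, 30);

       // Calendar days apart: the two values fall on consecutive dates.
       System.out.println(daysBetween(first, second)); // prints 1

       // Whole 24-hour periods apart: only one hour has elapsed.
       System.out.println(ChronoUnit.DAYS.between(second, first)); // prints 0
     }
   }
   ```

   Whether this matches the existing `datediff` semantics for every input type (time zones in particular) would still need to be confirmed by tests.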

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java
##########
@@ -64,121 +66,46 @@
         + "  1")
 @VectorizedExpressions({VectorUDFDateDiffColScalar.class, VectorUDFDateDiffColCol.class, VectorUDFDateDiffScalarCol.class})
 public class GenericUDFDateDiff extends GenericUDF {
-  private transient Converter inputConverter1;
-  private transient Converter inputConverter2;
+  private final transient Converter[] tsConverters = new Converter[2];
   private IntWritable output = new IntWritable();
-  private transient PrimitiveCategory inputType1;
-  private transient PrimitiveCategory inputType2;
-  private IntWritable result = new IntWritable();
+  private final transient PrimitiveCategory[] tsInputTypes = new PrimitiveCategory[2];
+
 
   public GenericUDFDateDiff() {
   }
 
   @Override
   public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
-    if (arguments.length != 2) {
-      throw new UDFArgumentLengthException(
-        "datediff() requires 2 argument, got " + arguments.length);
-    }
-    inputConverter1 = checkArguments(arguments, 0);
-    inputConverter2 = checkArguments(arguments, 1);
-    inputType1 = ((PrimitiveObjectInspector) arguments[0]).getPrimitiveCategory();
-    inputType2 = ((PrimitiveObjectInspector) arguments[1]).getPrimitiveCategory();
-    ObjectInspector outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
-    return outputOI;
+    checkArgsSize(arguments,2,2);
+    checkArgPrimitive(arguments, 0);
+    checkArgPrimitive(arguments, 1);
+    checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    checkArgGroups(arguments, 1, tsInputTypes, STRING_GROUP, DATE_GROUP);
+    obtainTimestampConverter(arguments,0, tsInputTypes, tsConverters);
+    obtainTimestampConverter(arguments,1, tsInputTypes, tsConverters);
+    return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
   }
 
   @Override
   public IntWritable evaluate(DeferredObject[] arguments) throws HiveException {
-    output = evaluate(convertToDate(inputType1, inputConverter1, arguments[0]),
-      convertToDate(inputType2, inputConverter2, arguments[1]));
-    return output;
-  }
 
-  @Override
-  public String getDisplayString(String[] children) {
-    return getStandardDisplayString("datediff", children);
-  }
+    Timestamp ts1 = getTimestampValue(arguments, 0, tsConverters);
+    Timestamp ts2 = getTimestampValue(arguments, 1, tsConverters);
 
-  @Nullable
-  private Date convertToDate(PrimitiveCategory inputType, Converter converter, DeferredObject argument)
-    throws HiveException {
-    assert(converter != null);
-    assert(argument != null);
-    if (argument.get() == null) {
-      return null;
-    }
-    switch (inputType) {
-    case STRING:
-    case VARCHAR:
-    case CHAR: {
-      String dateString = converter.convert(argument.get()).toString();
-      Date date = DateParser.parseDate(dateString);
-      if (date != null) {
-        return date;
-      }
-      Timestamp ts = PrimitiveObjectInspectorUtils.getTimestampFromString(dateString);
-      if (ts != null) {
-        return Date.ofEpochMilli(ts.toEpochMilli());
-      }
+    if (ts1 == null || ts2 == null) {
       return null;
     }
-    case TIMESTAMP:
-      Timestamp ts = ((TimestampWritableV2) converter.convert(argument.get()))
-        .getTimestamp();
-      return Date.ofEpochMilli(ts.toEpochMilli());
-    case DATE:
-      DateWritableV2 dw = (DateWritableV2) converter.convert(argument.get());
-      return dw.get();
-    case TIMESTAMPLOCALTZ:
-      TimestampTZ tsz = ((TimestampLocalTZWritable) converter.convert(argument.get()))
-          .getTimestampTZ();
-      return Date.ofEpochMilli(tsz.getEpochSecond() * 1000l);
-    default:
-      throw new UDFArgumentException(
-        "TO_DATE() only takes STRING/TIMESTAMP/TIMESTAMPLOCALTZ types, got " + inputType);

Review comment:
       I didn't see any negative tests verifying that an error is raised when the argument types are illegal.
Would it be possible to add some tests to make sure that the refactoring doesn't break anything?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 617473)
    Time Spent: 50m  (was: 40m)

> Refactor GenericUDFDateDiff
> ---------------------------
>
>                 Key: HIVE-25297
>                 URL: https://issues.apache.org/jira/browse/HIVE-25297
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: All Versions
>            Reporter: Ashish Sharma
>            Assignee: Ashish Sharma
>            Priority: Trivial
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Description
> Remove redundant code and refactor entire GenericUDFDateDiff.class code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
