From commits-return-47-apmail-datafu-commits-archive=datafu.apache.org@datafu.incubator.apache.org Mon Jan 27 23:54:38 2014 Return-Path: X-Original-To: apmail-datafu-commits-archive@minotaur.apache.org Delivered-To: apmail-datafu-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5F45C10C48 for ; Mon, 27 Jan 2014 23:54:38 +0000 (UTC) Received: (qmail 58951 invoked by uid 500); 27 Jan 2014 23:53:49 -0000 Delivered-To: apmail-datafu-commits-archive@datafu.apache.org Received: (qmail 58837 invoked by uid 500); 27 Jan 2014 23:53:46 -0000 Mailing-List: contact commits-help@datafu.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@datafu.incubator.apache.org Delivered-To: mailing list commits@datafu.incubator.apache.org Received: (qmail 58808 invoked by uid 99); 27 Jan 2014 23:53:45 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Jan 2014 23:53:45 +0000 X-ASF-Spam-Status: No, hits=-2000.5 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.3] (HELO mail.apache.org) (140.211.11.3) by apache.org (qpsmtpd/0.29) with SMTP; Mon, 27 Jan 2014 23:50:59 +0000 Received: (qmail 54988 invoked by uid 99); 27 Jan 2014 23:50:29 -0000 Received: from tyr.zones.apache.org (HELO tyr.zones.apache.org) (140.211.11.114) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Jan 2014 23:50:28 +0000 Received: by tyr.zones.apache.org (Postfix, from userid 65534) id BA55A907A52; Mon, 27 Jan 2014 23:50:27 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: wvaughan@apache.org To: commits@datafu.incubator.apache.org Date: Mon, 27 Jan 2014 23:50:52 -0000 Message-Id: <9d4ca77e68a04cc7869df0655fc80453@git.apache.org> In-Reply-To: References: X-Mailer: ASF-Git 
Admin Mailer Subject: [27/51] [partial] DATAFU-20 Initial commit of website content X-Virus-Checked: Checked by ClamAV on apache.org http://git-wip-us.apache.org/repos/asf/incubator-datafu/blob/424e3b48/site/source/docs/datafu/1.1.0/datafu/pig/bags/DistinctBy.html ---------------------------------------------------------------------- diff --git a/site/source/docs/datafu/1.1.0/datafu/pig/bags/DistinctBy.html b/site/source/docs/datafu/1.1.0/datafu/pig/bags/DistinctBy.html new file mode 100644 index 0000000..afb0c50 --- /dev/null +++ b/site/source/docs/datafu/1.1.0/datafu/pig/bags/DistinctBy.html @@ -0,0 +1,390 @@ + + + + + + +DistinctBy (DataFu 1.1.0) + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+ +

+ +datafu.pig.bags +
+Class DistinctBy

+
+java.lang.Object
+  extended by org.apache.pig.EvalFunc<T>
+      extended by org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+          extended by datafu.pig.bags.DistinctBy
+
+
+
All Implemented Interfaces:
org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
+
+
+
+
public class DistinctBy
extends org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+ + +

+Get distinct elements in a bag by a given set of field positions. + The input and output schemas will be identical. + + The first tuple containing each distinct combination of these fields will be taken. + + This operation is order preserving. If both A and B appear in the output, + and A appears before B in the input, then A will appear before B in the output. + + Example: +

+ define DistinctBy datafu.pig.bags.DistinctBy('0');
+ 
+ -- input:
+ -- ({(a, 1),(a,1),(b, 2),(b,22),(c, 3),(d, 4)})
+ input = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY, numeric:INT)});
+ 
+ output = FOREACH input GENERATE DistinctBy(B);
+ 
+ -- output:
+ -- ({(a,1),(b,2),(c,3),(d,4)})
+  
+ 
+
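The first-occurrence, order-preserving behaviour described above can be sketched in plain Java. This is only an illustration, not the DataFu implementation: the class name DistinctBySketch and the modelling of tuples as java.util.List values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative stand-in for the documented behaviour; not the DataFu source.
// Tuples are modelled as java.util.List values rather than Pig Tuple objects.
final class DistinctBySketch {

    // Returns the first tuple seen for each distinct combination of the
    // values at the given field positions, keeping input order.
    static <T extends List<?>> List<T> distinctBy(List<T> bag, int... fields) {
        Set<List<Object>> seenKeys = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T tuple : bag) {
            List<Object> key = new ArrayList<>(fields.length);
            for (int position : fields) {
                key.add(tuple.get(position));
            }
            // Set.add returns false when the key was already present.
            if (seenKeys.add(key)) {
                result.add(tuple);
            }
        }
        return result;
    }
}
```

Called with field position 0 on the input bag from the example, the sketch keeps (a,1), (b,2), (c,3) and (d,4) and drops (b,22), because a tuple with b in field 0 has already been seen.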

+ +

+


+ +

+ + + + + + + +
+Field Summary
+ + + + + + + +
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
+  + + + + + + + + + + +
+Constructor Summary
DistinctBy(java.lang.String... fields) + +
+           
+  + + + + + + + + + + + + + + + + + + + + + + + +
+Method Summary
+ void accumulate(org.apache.pig.data.Tuple input) + +
+           
+ void cleanup() + +
+           
+ org.apache.pig.data.DataBag getValue() + +
+           
+ org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) + +
+           
+ + + + + + + +
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
+ + + + + + + +
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
+ + + + + + + +
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
+  +

+ + + + + + + + +
+Constructor Detail
+ +

+DistinctBy

+
+public DistinctBy(java.lang.String... fields)
+
+
+ + + + + + + + +
+Method Detail
+ +

+accumulate

+
+public void accumulate(org.apache.pig.data.Tuple input)
+                throws java.io.IOException
+
+
+
Specified by:
accumulate in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
Specified by:
accumulate in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+
+
+ +
Throws: +
java.io.IOException
+
+
+
+ +

+cleanup

+
+public void cleanup()
+
+
+
Specified by:
cleanup in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
Specified by:
cleanup in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+
+ +

+getValue

+
+public org.apache.pig.data.DataBag getValue()
+
+
+
Specified by:
getValue in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
Specified by:
getValue in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+
+ +

+outputSchema

+
+public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
+
+
+
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+Matthew Hayes, Sam Shah + + http://git-wip-us.apache.org/repos/asf/incubator-datafu/blob/424e3b48/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNull.html ---------------------------------------------------------------------- diff --git a/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNull.html b/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNull.html new file mode 100644 index 0000000..4be0fea --- /dev/null +++ b/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNull.html @@ -0,0 +1,313 @@ + + + + + + +EmptyBagToNull (DataFu 1.1.0) + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+ +

+ +datafu.pig.bags +
+Class EmptyBagToNull

+
+java.lang.Object
+  extended by org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+      extended by datafu.pig.bags.EmptyBagToNull
+
+
+
+
public class EmptyBagToNull
extends org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+ + +

+Returns null if the input is an empty bag; otherwise, + returns the input bag unchanged. +

+ +

+


+ +

+ + + + + + + +
+Field Summary
+ + + + + + + +
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
+  + + + + + + + + + + +
+Constructor Summary
EmptyBagToNull() + +
+           
+  + + + + + + + + + + + + + + + +
+Method Summary
+ org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple tuple) + +
+           
+ org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) + +
+           
+ + + + + + + +
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
+ + + + + + + +
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
+  +

+ + + + + + + + +
+Constructor Detail
+ +

+EmptyBagToNull

+
+public EmptyBagToNull()
+
+
+ + + + + + + + +
+Method Detail
+ +

+exec

+
+public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple tuple)
+                                 throws java.io.IOException
+
+
+
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+
+
+ +
Throws: +
java.io.IOException
+
+
+
+ +

+outputSchema

+
+public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
+
+
+
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+Matthew Hayes, Sam Shah + + http://git-wip-us.apache.org/repos/asf/incubator-datafu/blob/424e3b48/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNullFields.html ---------------------------------------------------------------------- diff --git a/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNullFields.html b/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNullFields.html new file mode 100644 index 0000000..cd0f6e7 --- /dev/null +++ b/site/source/docs/datafu/1.1.0/datafu/pig/bags/EmptyBagToNullFields.html @@ -0,0 +1,328 @@ + + + + + + +EmptyBagToNullFields (DataFu 1.1.0) + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+ +

+ +datafu.pig.bags +
+Class EmptyBagToNullFields

+
+java.lang.Object
+  extended by org.apache.pig.EvalFunc<T>
+      extended by datafu.pig.util.ContextualEvalFunc<org.apache.pig.data.DataBag>
+          extended by datafu.pig.bags.EmptyBagToNullFields
+
+
+
+
public class EmptyBagToNullFields
extends ContextualEvalFunc<org.apache.pig.data.DataBag>
+ + +

+For an empty bag, inserts a tuple having null values for all fields; + otherwise, the input bag is returned unchanged. + +

+ This can be useful when performing FLATTEN on a bag from a COGROUP, + as FLATTEN on an empty bag produces no data. +

+
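The substitution described above can be sketched in plain Java as follows. This is a hedged illustration rather than the DataFu code: the class name, the list-based tuple model, and the explicit fieldCount parameter are all assumptions made for the example (the documentation does not say here how the UDF determines the number of fields).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in for the documented behaviour; not the DataFu source.
final class EmptyBagToNullFieldsSketch {

    // fieldCount is a hypothetical parameter giving the number of fields
    // that the placeholder tuple should have.
    static List<List<Object>> emptyBagToNullFields(List<List<Object>> bag, int fieldCount) {
        if (!bag.isEmpty()) {
            return bag; // non-empty bags are returned unchanged
        }
        // Build a single tuple whose fields are all null.
        List<Object> nullTuple = new ArrayList<>(Collections.<Object>nCopies(fieldCount, null));
        List<List<Object>> result = new ArrayList<>();
        result.add(nullTuple);
        return result;
    }
}
```

Because the returned bag always holds at least one tuple, a later flattening step emits at least one row per group instead of dropping groups whose bag was empty.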

+ +

+


+ +

+ + + + + + + +
+Field Summary
+ + + + + + + +
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
+  + + + + + + + + + + +
+Constructor Summary
EmptyBagToNullFields() + +
+           
+  + + + + + + + + + + + + + + + +
+Method Summary
+ org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple tuple) + +
+           
+ org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) + +
+           
+ + + + + + + +
Methods inherited from class datafu.pig.util.ContextualEvalFunc
getContextProperties, getInstanceName, getInstanceProperties, setUDFContextSignature
+ + + + + + + +
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, warn
+ + + + + + + +
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
+  +

+ + + + + + + + +
+Constructor Detail
+ +

+EmptyBagToNullFields

+
+public EmptyBagToNullFields()
+
+
+ + + + + + + + +
+Method Detail
+ +

+exec

+
+public org.apache.pig.data.DataBag exec(org.apache.pig.data.Tuple tuple)
+                                 throws java.io.IOException
+
+
+
Specified by:
exec in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+
+
+ +
Throws: +
java.io.IOException
+
+
+
+ +

+outputSchema

+
+public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
+
+
+
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+Matthew Hayes, Sam Shah + + http://git-wip-us.apache.org/repos/asf/incubator-datafu/blob/424e3b48/site/source/docs/datafu/1.1.0/datafu/pig/bags/Enumerate.html ---------------------------------------------------------------------- diff --git a/site/source/docs/datafu/1.1.0/datafu/pig/bags/Enumerate.html b/site/source/docs/datafu/1.1.0/datafu/pig/bags/Enumerate.html new file mode 100644 index 0000000..4233d21 --- /dev/null +++ b/site/source/docs/datafu/1.1.0/datafu/pig/bags/Enumerate.html @@ -0,0 +1,407 @@ + + + + + + +Enumerate (DataFu 1.1.0) + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+ +

+ +datafu.pig.bags +
+Class Enumerate

+
+java.lang.Object
+  extended by org.apache.pig.EvalFunc<T>
+      extended by org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+          extended by datafu.pig.bags.Enumerate
+
+
+
All Implemented Interfaces:
org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
+
+
+
+
public class Enumerate
extends org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+ + +

+Enumerate a bag, appending to each tuple its index within the bag. + +

+ For example: +

+   {(A),(B),(C),(D)} => {(A,0),(B,1),(C,2),(D,3)}
+ 
+ The optional constructor parameter sets the starting index of the count. + This UDF implements the Accumulator interface, which reduces DataBag materialization costs. +

+ +

+ Example: +

+ define Enumerate datafu.pig.bags.Enumerate('1');
+
+ -- input:
+ -- ({(100),(200),(300),(400)})
+ input = LOAD 'input' as (B: bag{T: tuple(v2:INT)});
+
+ -- output:
+ -- ({(100,1),(200,2),(300,3),(400,4)})
+ output = FOREACH input GENERATE Enumerate(B);
+ 
+ 
+
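The index-appending behaviour described above can be sketched in plain Java. This is an illustration only, not the DataFu implementation: the class name, the list-based tuple model, and the use of a long for the index are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the documented behaviour; not the DataFu source.
final class EnumerateSketch {

    // Copies each tuple and appends its running index, counting from start.
    static List<List<Object>> enumerate(List<? extends List<?>> bag, long start) {
        List<List<Object>> result = new ArrayList<>(bag.size());
        long index = start;
        for (List<?> tuple : bag) {
            List<Object> withIndex = new ArrayList<>(tuple);
            withIndex.add(index++);
            result.add(withIndex);
        }
        return result;
    }
}
```

With a start value of 1 and the input bag from the example, the sketch yields (100,1), (200,2), (300,3) and (400,4).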

+ +

+


+ +

+ + + + + + + +
+Field Summary
+ + + + + + + +
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
+  + + + + + + + + + + + + + +
+Constructor Summary
Enumerate() + +
+           
Enumerate(java.lang.String start) + +
+           
+  + + + + + + + + + + + + + + + + + + + + + + + +
+Method Summary
+ void accumulate(org.apache.pig.data.Tuple arg0) + +
+           
+ void cleanup() + +
+           
+ org.apache.pig.data.DataBag getValue() + +
+           
+ org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) + +
+           
+ + + + + + + +
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
+ + + + + + + +
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
+ + + + + + + +
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
+  +

+ + + + + + + + +
+Constructor Detail
+ +

+Enumerate

+
+public Enumerate()
+
+
+
+ +

+Enumerate

+
+public Enumerate(java.lang.String start)
+
+
+ + + + + + + + +
+Method Detail
+ +

+accumulate

+
+public void accumulate(org.apache.pig.data.Tuple arg0)
+                throws java.io.IOException
+
+
+
Specified by:
accumulate in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
Specified by:
accumulate in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+
+
+ +
Throws: +
java.io.IOException
+
+
+
+ +

+cleanup

+
+public void cleanup()
+
+
+
Specified by:
cleanup in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
Specified by:
cleanup in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+
+ +

+getValue

+
+public org.apache.pig.data.DataBag getValue()
+
+
+
Specified by:
getValue in interface org.apache.pig.Accumulator<org.apache.pig.data.DataBag>
Specified by:
getValue in class org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+
+ +

+outputSchema

+
+public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
+
+
+
Overrides:
outputSchema in class org.apache.pig.EvalFunc<org.apache.pig.data.DataBag>
+
+
+
+
+
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+Matthew Hayes, Sam Shah + + http://git-wip-us.apache.org/repos/asf/incubator-datafu/blob/424e3b48/site/source/docs/datafu/1.1.0/datafu/pig/bags/FirstTupleFromBag.html ---------------------------------------------------------------------- diff --git a/site/source/docs/datafu/1.1.0/datafu/pig/bags/FirstTupleFromBag.html b/site/source/docs/datafu/1.1.0/datafu/pig/bags/FirstTupleFromBag.html new file mode 100644 index 0000000..6165d26 --- /dev/null +++ b/site/source/docs/datafu/1.1.0/datafu/pig/bags/FirstTupleFromBag.html @@ -0,0 +1,340 @@ + + + + + + +FirstTupleFromBag (DataFu 1.1.0) + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+ +

+ +datafu.pig.bags +
+Class FirstTupleFromBag

+
+java.lang.Object
+  extended by org.apache.pig.EvalFunc<T>
+      extended by datafu.pig.util.SimpleEvalFunc<org.apache.pig.data.Tuple>
+          extended by datafu.pig.bags.FirstTupleFromBag
+
+
+
+
public class FirstTupleFromBag
extends SimpleEvalFunc<org.apache.pig.data.Tuple>
+ + +

+Returns the first tuple from a bag. The second parameter is required; it is returned when the bag is empty. + + Example: +

+ define FirstTupleFromBag datafu.pig.bags.FirstTupleFromBag();
+
+ -- input:
+ -- ({(a,1)})
+ input = LOAD 'input' AS (B: bag {T: tuple(alpha:CHARARRAY, numeric:INT)});
+
+ output = FOREACH input GENERATE FirstTupleFromBag(B, null);
+
+ -- output:
+ -- (a,1)
+ 
+ 
+
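The first-or-default behaviour described above can be sketched in plain Java. This is an illustration only, not the DataFu implementation: the class and method names are hypothetical, and the bag is modelled as any Iterable.

```java
import java.util.Iterator;

// Illustrative stand-in for the documented behaviour; not the DataFu source.
final class FirstTupleFromBagSketch {

    // Returns the first element of the bag, or defaultValue when the bag is empty.
    static <T> T firstOrDefault(Iterable<T> bag, T defaultValue) {
        Iterator<T> iterator = bag.iterator();
        return iterator.hasNext() ? iterator.next() : defaultValue;
    }
}
```

Passing null as the default, as the Pig example does, means an empty bag yields null rather than an error.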

+ +

+


+ +

+ + + + + + + +
+Field Summary
+ + + + + + + +
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
+  + + + + + + + + + + +
+Constructor Summary
FirstTupleFromBag() + +
+           
+  + + + + + + + + + + + + + + + +
+Method Summary
+ org.apache.pig.data.Tuple call(org.apache.pig.data.DataBag bag, + org.apache.pig.data.Tuple defaultValue) + +
+           
+ org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input) + +
+          Override outputSchema so we can verify the input schema at pig compile time, instead of runtime
+ + + + + + + +
Methods inherited from class datafu.pig.util.SimpleEvalFunc
exec, getReturnType
+ + + + + + + +
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getSchemaName, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
+ + + + + + + +
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
+  +

+ + + + + + + + +
+Constructor Detail
+ +

+FirstTupleFromBag

+
+public FirstTupleFromBag()
+
+
+ + + + + + + + +
+Method Detail
+ +

+call

+
+public org.apache.pig.data.Tuple call(org.apache.pig.data.DataBag bag,
+                                      org.apache.pig.data.Tuple defaultValue)
+                               throws java.io.IOException
+
+
+ +
Throws: +
java.io.IOException
+
+
+
+ +

+outputSchema

+
+public org.apache.pig.impl.logicalLayer.schema.Schema outputSchema(org.apache.pig.impl.logicalLayer.schema.Schema input)
+
+
Description copied from class: SimpleEvalFunc
+
Override outputSchema so we can verify the input schema at pig compile time, instead of runtime +

+

+
Overrides:
outputSchema in class SimpleEvalFunc<org.apache.pig.data.Tuple>
+
+
+
Parameters:
input - input schema +
Returns:
call to super.outputSchema in case schema was defined elsewhere
+
+
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ + + +
+Matthew Hayes, Sam Shah + +