Thank you, Matt, for this response.
Yeah, it worked 😊
On 20-Jun-2017 7:55 AM, "Matt Burgess" <mattyb149@apache.org> wrote:
Prabhu,
I'm no Python/Jython master by any means, so I'm sure there's a better
way to do this than what I came up with. Along the way I noticed some
things about the input data and Jython vs Python:
1) Your "for line in text[1:]:" skips the first line; I assume the
"real" data has a header?
2) The second row of data refers to a leap day (Feb 29), which did not
exist in 2015, so it throws an exception. I changed all the months to
03 and kept going.
3) Your third row doesn't have any fractional seconds; is this on
purpose? I assumed so and tried to provide for that.
4) Jython (and Python 2) don't support the %z directive in datetime
formats, and %Z refers to a timezone name (such as "EST"), not the
+-HHMM value. Also, your data includes only the hour offset, not the
minutes.
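Points 2 to 4 can be checked in plain Python (CPython or Jython) outside NiFi. The sketch below is illustrative only: split_hour_offset is a hypothetical helper, not part of NiFi or the datetime module. It assumes the last three characters of the timestamp are a sign plus a two-digit hour offset, as in the sample rows, and parses that offset by hand because %z is unavailable. It only reads the offset; it does not shift the time to UTC.

```python
from __future__ import print_function
import datetime


def split_hour_offset(value):
    """Split 'dd-mm-yyyy hh:mm:ss[.f]-HH' into (text, offset timedelta).

    Hypothetical helper: assumes the last three characters are a sign
    followed by a two-digit hour offset, as in the sample rows.
    """
    sign_char = value[-3]
    if sign_char not in "+-":
        raise ValueError("no +HH/-HH offset in %r" % value)
    hours = int(value[-2:])
    sign = -1 if sign_char == "-" else 1
    return value[:-3], datetime.timedelta(hours=sign * hours)


text, offset = split_hour_offset("28-02-2015 12:22:33.2212-02")
# %f accepts 1 to 6 digits, so ".2212" parses as 221200 microseconds.
parsed = datetime.datetime.strptime(text, "%d-%m-%Y %H:%M:%S.%f")
print(parsed.microsecond, offset)

# 2015 is not a leap year, so strptime rejects Feb 29 with ValueError.
leap_day_rejected = False
try:
    datetime.datetime.strptime("29-02-2015 12:22:34", "%d-%m-%Y %H:%M:%S")
except ValueError:
    leap_day_rejected = True
print("Feb 29 2015 rejected:", leap_day_rejected)
```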
I came up with a fairly fragile script that seems to work given your input:
import datetime
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self, log):
        # Keep the processor's logger on the instance (a bare "logger = log"
        # would only create a local variable)
        self.logger = log

    def process(self, inputStream, outputStream):
        text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        # Skip the first line (assumed to be a header)
        for line in text[1:]:
            cols = line.split(",")
            # cols[3][:-3] strips the trailing "-HH" offset; first try
            # parsing with fractional seconds
            df = "%d-%m-%Y %H:%M:%S.%f"
            trunc_3 = True
            try:
                d2 = datetime.datetime.strptime(cols[3][:-3], df)
            except ValueError:
                # No fractional seconds on this row
                df = "%d-%m-%Y %H:%M:%S"
                trunc_3 = False
                d2 = datetime.datetime.strptime(cols[3][:-3], df)
            if trunc_3:
                # %f always prints 6 digits; drop the last 3 to keep milliseconds
                cols[3] = d2.strftime(df)[:-3]
            else:
                cols[3] = d2.strftime(df)
            outputStream.write(bytearray((",".join(cols) + "\n").encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback(log))
    flowFile = session.putAttribute(flowFile, "filename",
                                    flowFile.getAttribute('filename'))
    session.transfer(flowFile, REL_SUCCESS)
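Because the script relies on cols[3][:-3] plus a try/except to detect the fractional part, one possibly less fragile variant is to match the whole timestamp with a regular expression. The sketch below is plain Python (CPython or Jython, no NiFi needed), so the per-line logic can be tested outside ExecuteScript. TS_PATTERN, fix_timestamp and fix_line are hypothetical names, and the pattern assumes exactly the layout of the sample rows (dd-mm-yyyy hh:mm:ss, optional fraction, then a signed two-digit hour offset).

```python
from __future__ import print_function
import re

# Hypothetical pattern for the sample layout: date/time, optional
# fraction, then a signed two-digit hour offset such as "-02".
TS_PATTERN = re.compile(
    r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})(?:\.(\d+))?[+-]\d{2}$")


def fix_timestamp(value):
    """Drop the hour offset and truncate the fraction to 3 digits."""
    match = TS_PATTERN.match(value)
    if match is None:
        raise ValueError("unexpected timestamp: %r" % value)
    base, fraction = match.group(1), match.group(2)
    if fraction:
        # Truncate (not round) to milliseconds; pad short fractions like ".5".
        return "%s.%s" % (base, fraction[:3].ljust(3, "0"))
    return base


def fix_line(line):
    """Rewrite the fourth column of one CSV line."""
    cols = line.split(",")
    cols[3] = fix_timestamp(cols[3])
    return ",".join(cols)


sample = [
    "1,Ni,23,28-02-2015 12:22:33.2212-02",
    "2,Fi,21,29-02-2015 12:22:34.3212-02",
    "3,Us,33,30-03-2015 12:23:35-01",
    "4,Uk,34,31-03-2015 12:24:36.332211-02",
]
for row in sample:
    print(fix_line(row))
```

Unlike strptime, this does not validate the calendar date, so 29-02-2015 passes through unchanged (matching the requested output) instead of raising. Inside process(), fix_line(line) could take the place of the strptime block, with the result written out as before.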
Please let me know if I've misunderstood anything, and I will try to
fix/improve the script.
Regards,
Matt
On Mon, Jun 19, 2017 at 8:31 AM, prabhu Mahendran
<prabhuu161994@gmail.com> wrote:
> I have a CSV file that contains lakhs of rows; below are some sample
> lines:
>
> 1,Ni,23,28-02-2015 12:22:33.2212-02
> 2,Fi,21,29-02-2015 12:22:34.3212-02
> 3,Us,33,30-03-2015 12:23:35-01
> 4,Uk,34,31-03-2015 12:24:36.332211-02
> The last column of the CSV data is in the wrong datetime format,
> so I need to convert it to the default datetime format
> ("YYYY-MM-DD hh:mm:ss[.nnn]").
>
> I tried the following script to read the lines and write them into
> the flow file.
>
> import json
> import java.io
> from org.apache.commons.io import IOUtils
> from java.nio.charset import StandardCharsets
> from org.apache.nifi.processor.io import StreamCallback
>
> class PyStreamCallback(StreamCallback):
>     def __init__(self):
>         pass
>     def process(self, inputStream, outputStream):
>         text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
>         for line in text[1:]:
>             outputStream.write(line + "\n")
>
> flowFile = session.get()
> if (flowFile != None):
>     flowFile = session.write(flowFile,PyStreamCallback())
>     flowFile = session.putAttribute(flowFile, "filename",
>                                     flowFile.getAttribute('filename'))
>     session.transfer(flowFile, REL_SUCCESS)
> But I am not able to find a way to convert it into the output below:
>
> 1,Ni,23,28-02-2015 12:22:33.221
> 2,Fi,21,29-02-2015 12:22:34.321
> 3,Us,33,30-03-2015 12:23:35
> 4,Uk,34,31-03-2015 12:24:36.332
> I have searched for this with my friend (Google) and still haven't been
> able to find a solution.
>
> Can anyone guide me in converting this input data into the required output?
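The question names the target as "YYYY-MM-DD hh:mm:ss[.nnn]", while the sample output keeps the dd-mm-yyyy order. If the year-first layout is what is actually wanted, a minimal sketch follows, assuming the same input layout as the sample rows. to_iso_layout is a hypothetical helper; it drops the trailing hour offset and keeps the wall-clock time rather than converting to UTC.

```python
from __future__ import print_function
import datetime


def to_iso_layout(value):
    """Convert 'dd-mm-yyyy hh:mm:ss[.f]-HH' to 'YYYY-MM-DD hh:mm:ss[.nnn]'.

    Hypothetical helper: the trailing signed hour offset is dropped and the
    wall-clock time is kept as-is (it is NOT shifted to UTC).
    """
    if value[-3] not in "+-":
        raise ValueError("no +HH/-HH offset in %r" % value)
    text = value[:-3]
    has_fraction = "." in text
    fmt = "%d-%m-%Y %H:%M:%S.%f" if has_fraction else "%d-%m-%Y %H:%M:%S"
    # strptime validates the calendar date, so 29-02-2015 raises ValueError.
    parsed = datetime.datetime.strptime(text, fmt)
    out = parsed.strftime("%Y-%m-%d %H:%M:%S")
    if has_fraction:
        # Truncate microseconds to milliseconds, zero-padded to 3 digits.
        out += ".%03d" % (parsed.microsecond // 1000)
    return out


for row in ["28-02-2015 12:22:33.2212-02",
            "30-03-2015 12:23:35-01",
            "31-03-2015 12:24:36.332211-02"]:
    print(to_iso_layout(row))
```

Parsing with strptime here (rather than string slicing) trades tolerance for validation: an impossible date such as 29-02-2015 is reported instead of being silently reordered.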