nifi-users mailing list archives

From Arpad Boda <ab...@hortonworks.com>
Subject Re: Modify Flowfile attributes
Date Wed, 30 Jan 2019 10:25:20 GMT
Tom,

Could you use the LogAttribute processor and log the value of your “date_final” variable?

I tested your code with Jython; with the input string “181231” it works as expected (the result
is 1st week of 2019).
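
For reference, the same check can be reproduced outside NiFi. This is a minimal sketch in plain CPython (not Jython inside ExecuteScript), so it only confirms what the standard library does with the string "181231"; the variable names mirror Tom's script.

```python
from datetime import datetime

date_final = "181231"  # sample input taken from the thread
date_obj = datetime.strptime(date_final, "%y%m%d")

# isocalendar() returns (ISO year, ISO week, ISO weekday)
iso_year, iso_week, iso_weekday = date_obj.isocalendar()

print(iso_year, iso_week, iso_weekday)  # 2019 1 1
```

If the value logged for date_final differs from "181231", the parsing step is the first place to look.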

From: Tomislav Novosel <to.novosel@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, 30 January 2019 at 11:10
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Modify Flowfile attributes

Yes, the values are correct. The attribute has the expected value.
E.g. for date 181231 in the filename I get the value 18231 for the attribute week_extracted, which is extracted
from the filename with the split method.

Tom.

On Wed, 30 Jan 2019 at 10:59, Arpad Boda <aboda@hortonworks.com> wrote:
Hi Tom,

“that is exactly what I tried and date_final or date_file are applied to the attribute of
outgoing flowfile, it works.”

It works as they are strings, so not working would be a surprise. The question is: what are
their values? 😊

Regards,
Arpad

From: Tomislav Novosel <to.novosel@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Wednesday, 30 January 2019 at 10:53
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Modify Flowfile attributes

Hi Arpad,

that is exactly what I tried and date_final or date_file are applied to the attribute of outgoing
flowfile, it works.
But if I put week_att into the attribute, there is an error: week_att cannot be coerced as String,
and if I put str_week, it gives me week number 44.

Tom

On Wed, 30 Jan 2019 at 08:40, Arpad Boda <aboda@hortonworks.com> wrote:
Tom,

The Python code to get the week number for a datetime string seems to be correct.

To help debugging could you stamp your “date_final” or “date_file” variable to an
attribute, so we could see what’s the input?
My gut feeling says there is some parsing magic going wrong here.

Regards,
Arpad

From: Tomislav Novosel <to.novosel@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Tuesday, 29 January 2019 at 20:13
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Modify Flowfile attributes

With the following script I get week number 44 and year 118, which is a strange result.
The week should be 1 and the year 2019 for the date 2018-12-31.
What is wrong here?

Tom

from datetime import datetime, timedelta, date

flowFile = session.get()
if (flowFile != None):
    file_name = flowFile.getAttribute('filename')

    date_file = file_name.split("_")[6]
    date_final = date_file.split(".")[0]
    date_obj = datetime.strptime(date_final,'%y%m%d')
    date_year = date_obj.year
    date_day = date_obj.day
    date_month = date_obj.month

    week_att = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
    year_att = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
    str_week = str(week_att)
    str_year = str(year_att)

    flowFile = session.putAttribute(flowFile, "year_extracted", str_year)
    flowFile = session.putAttribute(flowFile, "week_extracted", str_week)
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

On Tue, 29 Jan 2019 at 16:59, Tomislav Novosel <to.novosel@gmail.com> wrote:
Thank you all for the answers. The reason I want to do this with a Python script is a wrong calculation
of the week number from the date. NiFi has that function in the Expression Language (extracted_date:format("w",
<<time_zone>>)). My time zone is GMT+2.
If I set a date, for example 20180819, and the time zone GMT in the function, I get week number 34, which
is wrong. If I omit the time zone, I get week number 33, which is right. I'm not sure if that's
a bug. You can test it for yourself, and if you do, please share your findings here; maybe I'm
doing something wrong.
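
One thing that can move a week number by one is the time zone used when formatting: a date parsed as local midnight falls on the previous calendar day when it is rendered in a zone further west. The sketch below only illustrates that date shift in plain Python with a fixed GMT+2 offset. Whether this is what happens in the NiFi expression above is an assumption that has not been verified here, and week-numbering conventions (for example, which day starts the week) also differ between libraries.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset standing in for "GMT+2" (assumption: no DST handling)
gmt_plus_2 = timezone(timedelta(hours=2))

# Midnight on 2018-08-19, interpreted in the GMT+2 zone
local_midnight = datetime(2018, 8, 19, 0, 0, tzinfo=gmt_plus_2)

# The same instant expressed in GMT
as_gmt = local_midnight.astimezone(timezone.utc)

print(local_midnight.date())  # 2018-08-19 (a Sunday)
print(as_gmt.date())          # 2018-08-18 (a Saturday)
```

Because 2018-08-19 is a Sunday, a one-day shift like this crosses a week boundary under any convention that starts the week on Sunday.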

On the other hand, if I use Python, I'm more sure that I will get the correct week number, even
for dates which overlap with a week number in the next year (e.g. 20181231).

Since this calculation will be in production, I need a resilient workflow without errors in the future.

Regarding the script I sent above, I'm getting the error: "week cannot be coerced as string". I checked
right at the beginning whether the session is null or not.

On Tue, 29 Jan 2019, 16:26 Jerry Vinokurov <grapesmoker@gmail.com> wrote:
I wanted to add, since I've done this specific operation many times, that you can really just
do this via the NiFi expression language, which I think is more "idiomatic" than having ExecuteScript
processors all over the place. Basically, you would have an UpdateAttribute that set something
called, say, date_extracted with an expression that looks something like ${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')}
(this is an approximation based on the above, modify as necessary for your purpose). Then
you could use a second UpdateAttribute to extract various information from this date with
the format command, e.g. ${date_extracted:format('<your format expression here>')}.
I don't think there's one for "week" but in general this is the approach I take when I need
to do date munging.

On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel <to.novosel@gmail.com> wrote:
Hi Matt, thanks for the suggestions. But performance is not crucial here.
This is the code I tried, but I get the error: "AttributeError: 'NoneType' object has no attribute
'getAttribute' at line number 4"
If I remove the code from line 6 to line 14, it works with some default attribute values for year_extracted
and week_extracted; otherwise I get the error from above.

Tom

from datetime import datetime, timedelta, date

flowFile = session.get()
file_name = flowFile.getAttribute('filename')

date_file = file_name.split("_")[6]
date_final = date_file.split(".")[0]
date_obj = datetime.strptime(date_final,'%y%m%d')
date_year = date_obj.year
date_day = date_obj.day
date_month = date_obj.month

week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]

if (flowFile != None):
    flowFile = session.putAttribute(flowFile, "year_extracted", year)
    flowFile = session.putAttribute(flowFile, "week_extracted", week)
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()

On Tue, 29 Jan 2019 at 15:53, Matt Burgess <mattyb149@apache.org> wrote:
Tom,

Keep in mind that you are using Jython not Python, which I mention
only to point out that it is *much* slower than the native Java
processors such as UpdateAttribute, and slower than other scripting
engines such as Groovy or Javascript/Nashorn.

If performance/throughput is not a concern and you're more comfortable
with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
attributeName, attributeValue) should do the trick. Note that if you
are adding more than a couple attributes, it's probably better to
create a dictionary (eventually/actually, a Java Map<String,String>)
of attribute name/value pairs, and use putAllAttributes(flowFile,
attributes) instead, as it is more performant.

Regards,
Matt
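
A minimal sketch of the dictionary approach described above, assuming the date string has already been extracted from the filename. The helper name week_attributes is hypothetical, and the NiFi calls appear only as comments because session, flowFile and REL_SUCCESS exist only inside ExecuteScript. The values are converted with str() since attribute values are strings, which is what the "cannot be coerced" error elsewhere in the thread points at.

```python
from datetime import datetime


def week_attributes(date_final):
    """Return attribute name/value pairs (all strings) for a yymmdd string.

    Hypothetical helper for illustration; not part of NiFi.
    """
    iso_year, iso_week, _ = datetime.strptime(date_final, "%y%m%d").isocalendar()
    # isocalendar() yields integers, so convert them explicitly
    return {
        "year_extracted": str(iso_year),
        "week_extracted": str(iso_week),
    }


attributes = week_attributes("181231")
print(attributes)

# Inside ExecuteScript (assumed usage, not run here):
#   flowFile = session.putAllAttributes(flowFile, attributes)
#   session.transfer(flowFile, REL_SUCCESS)
```

The printed values come from CPython; the thread reports different numbers under Jython, so the result should still be checked inside NiFi.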

On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel <to.novosel@gmail.com> wrote:
>
> Thanks for the answer.
>
> Yes, I know I can handle that with the Expression Language and the UpdateAttribute processor,
> but this is a specific case at my work and I think Python
> is the better and simpler solution. I need to calculate that with a Python script.
>
> Tom
>
> On Tue, 29 Jan 2019 at 15:18, John McGinn <amruginn-nifi@yahoo.com> wrote:
>>
>> Since your script shows that "filename" is an attribute of your flowfile, you could
>> use the UpdateAttribute processor.
>>
>> If you right click on UpdateAttribute and choose ShowUsage, then choose Expression
>> Language Guide, it shows you the things you can handle.
>>
>> Something along the lines of ${filename:getDelimitedField(6,'_')}, if I understand
>> the script correctly. I did a GenerateFlowFile to an UpdateAttribute processor, setting
>> filename to "1_2_3_4_5_6.2_abc", then sent that to another UpdateAttribute with the getDelimitedField()
>> I listed, and I received 6.2. Then another UpdateAttribute could parse the 6.2 for the second
>> substring, or you might be able to chain them in the existing UpdateAttribute processor.
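
One detail to watch when moving between the two approaches: getDelimitedField counts fields from 1, which matches the 6.2 result reported above, while Python list indexes start at 0. A small sketch in plain Python, using the same sample filename:

```python
filename = "1_2_3_4_5_6.2_abc"  # sample value from the message above

fields = filename.split("_")

# Python lists are 0-based, so the 6th field is fields[5]
sixth_field = fields[5]
# fields[6], the index used in the script, is the 7th field
seventh_field = fields[6]

print(sixth_field)    # 6.2
print(seventh_field)  # abc

# Dropping everything after the first "." as the script does
print(sixth_field.split(".")[0])  # 6
```

So split("_")[6] in the script and getDelimitedField(6,'_') do not select the same field; which one is right depends on the real filename layout, which is not shown in the thread.
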
>>
>>
>> --------------------------------------------
>> On Tue, 1/29/19, Tomislav Novosel <to.novosel@gmail.com> wrote:
>>
>>  Subject: Modify Flowfile attributes
>>  To: users@nifi.apache.org
>>  Date: Tuesday, January 29, 2019, 9:04 AM
>>
>>  Hi all,
>>  I'm trying to calculate the week number and date
>>  from the filename using the ExecuteScript processor and Jython. Here
>>  is the Python script. How can I add the calculated
>>  attributes week and year to the flowfile?
>>  Please help, thank you.
>>  Tom
>>  P.S. Maybe I completely missed with this script.
>>  Feel free to correct me.
>>
>>  import json
>>  import java.io
>>  from org.apache.commons.io import IOUtils
>>  from java.nio.charset import StandardCharsets
>>  from org.apache.nifi.processor.io import StreamCallback
>>  from datetime import datetime, timedelta, date
>>
>>  class PyStreamCallback(StreamCallback):
>>      def __init__(self, flowfile):
>>          self.ff = flowfile
>>          pass
>>
>>      def process(self, inputStream, outputStream):
>>          file_name = self.ff.getAttribute("filename")
>>          date_file = file_name.split("_")[6]
>>          date_final = date_file.split(".")[0]
>>          date_obj = datetime.strptime(date_final,'%y%m%d')
>>          date_year = date_obj.year
>>          date_day = date_obj.day
>>          date_month = date_obj.month
>>          week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>>          year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>
>>  flowFile = session.get()
>>  if (flowFile != None):
>>      session.transfer(flowFile, REL_SUCCESS)
>>      session.commit()


--
http://www.google.com/profiles/grapesmoker