nifi-users mailing list archives

From Tomislav Novosel <to.novo...@gmail.com>
Subject Re: Modify Flowfile attributes
Date Tue, 29 Jan 2019 15:59:02 GMT
Thank you all for the answers. The reason I want to do this with a Python
script is that the week number calculated from the date is wrong. NiFi has that
function in the Expression Language (extracted_date:format("w", <<time_zone>>)).
My time zone is GMT+2.
If I set the date to, for example, 20180819 and pass GMT as the time zone, I get
week number 34, which is wrong. If I omit the time zone, I get week number 33,
which is right. I'm not sure whether that's a bug. You can test it yourself,
and if you do, please share your findings here; maybe I'm doing something
wrong.

On the other hand, if I use Python, I'm more sure that I will get the correct
week number, even for dates whose week belongs to the next year
(e.g. 20181231).
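
For reference, a minimal sketch of what Python's standard library returns for the two dates mentioned above, assuming ISO 8601 week numbering (weeks start on Monday, and week 1 is the week containing the first Thursday of the year) is what is wanted. The helper name iso_year_week is hypothetical and only used for illustration.

```python
from datetime import datetime


def iso_year_week(yyyymmdd):
    """Return (ISO year, ISO week) for a date string such as '20180819'."""
    parsed = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    iso = parsed.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return iso[0], iso[1]


print(iso_year_week("20180819"))  # (2018, 33): a Sunday, the last day of week 33
print(iso_year_week("20181231"))  # (2019, 1): a Monday that already belongs to 2019
```

Note that the ISO year can differ from the calendar year, so the year attribute should come from isocalendar() as well, not from the date itself.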

Since this calculation will run in production, I need a resilient workflow
that keeps working without errors.

Regarding the script I sent earlier, I'm getting the error: "week cannot be
coerced as string". I checked right at the beginning whether the session is
null or not.
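
A sketch of how the script might be restructured, assuming the two errors reported in this thread come from (a) calling getAttribute before checking that session.get() returned a flow file, and (b) passing integers to session.putAttribute, which expects string values. The function names extract_year_week and process are hypothetical; the filename layout (date in the seventh underscore-separated field) and the '%y%m%d' format are kept from the script quoted below, so an eight-digit date such as 20180819 would need '%Y%m%d' instead.

```python
from datetime import datetime


def extract_year_week(file_name):
    """Return (ISO year, ISO week) as strings, e.g. from 'a_b_c_d_e_f_180819.csv'."""
    date_part = file_name.split("_")[6].split(".")[0]
    iso = datetime.strptime(date_part, "%y%m%d").date().isocalendar()
    # Attribute values must be strings, so convert the integers here.
    return str(iso[0]), str(iso[1])


def process(session, rel_success):
    """Add year_extracted and week_extracted to the next flow file, if any."""
    flow_file = session.get()
    if flow_file is None:
        # Nothing queued; touching flow_file here would raise the
        # "'NoneType' object has no attribute 'getAttribute'" error.
        return
    year, week = extract_year_week(flow_file.getAttribute("filename"))
    flow_file = session.putAttribute(flow_file, "year_extracted", year)
    flow_file = session.putAttribute(flow_file, "week_extracted", week)
    session.transfer(flow_file, rel_success)


# Inside ExecuteScript, where `session` and `REL_SUCCESS` are provided
# by the processor, the call would be:
# process(session, REL_SUCCESS)
```

Keeping the date parsing in its own function means it can be checked outside NiFi with plain Python before the script is deployed.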

On Tue, 29 Jan 2019, 16:26 Jerry Vinokurov <grapesmoker@gmail.com> wrote:

> I wanted to add, since I've done this specific operation many times, that
> you can really just do this via the NiFi expression language, which I think
> is more "idiomatic" than having ExecuteScript processors all over the
> place. Basically, you would have an UpdateAttribute that set something
> called, say, date_extracted with an expression that looks something like
> ${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')} (this is an
> approximation based on the above, modify as necessary for your purpose).
> Then you could use a second UpdateAttribute to extract various information
> from this date with the format command, e.g. ${date_extracted:format('<your
> format expression here>')}. I don't think there's one for "week" but in
> general this is the approach I take when I need to do date munging.
>
> On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel <to.novosel@gmail.com>
> wrote:
>
>> Hi Matt, thanks for the suggestions. But performance is not crucial here.
>> This is the code I tried, but I get the error: "AttributeError: 'NoneType'
>> object has no attribute 'getAttribute' at line number 4"
>> If I remove the code from line 6 to line 14, it works with some default
>> attribute values for year_extracted and week_extracted; otherwise I get
>> the error from above.
>>
>> Tom
>>
>> from datetime import datetime, timedelta, date
>>
>> flowFile = session.get()
>> file_name = flowFile.getAttribute('filename')
>>
>> date_file = file_name.split("_")[6]
>> date_final = date_file.split(".")[0]
>> date_obj = datetime.strptime(date_final,'%y%m%d')
>> date_year = date_obj.year
>> date_day = date_obj.day
>> date_month = date_obj.month
>>
>> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>
>> if (flowFile != None):
>>     flowFile = session.putAttribute(flowFile, "year_extracted", year)
>>     flowFile = session.putAttribute(flowFile, "week_extracted", week)
>>     session.transfer(flowFile, REL_SUCCESS)
>>     session.commit()
>>
>> On Tue, 29 Jan 2019 at 15:53, Matt Burgess <mattyb149@apache.org> wrote:
>>
>>> Tom,
>>>
>>> Keep in mind that you are using Jython not Python, which I mention
>>> only to point out that it is *much* slower than the native Java
>>> processors such as UpdateAttribute, and slower than other scripting
>>> engines such as Groovy or Javascript/Nashorn.
>>>
>>> If performance/throughput is not a concern and you're more comfortable
>>> with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
>>> attributeName, attributeValue) should do the trick. Note that if you
>>> are adding more than a couple attributes, it's probably better to
>>> create a dictionary (eventually/actually, a Java Map<String,String>)
>>> of attribute name/value pairs, and use putAllAttributes(flowFile,
>>> attributes) instead, as it is more performant.
>>>
>>> Regards,
>>> Matt
>>>
>>> On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel <to.novosel@gmail.com>
>>> wrote:
>>> >
>>> > Thanks for the answer.
>>> >
>>> > Yes, I know I can handle that with the Expression Language and the
>>> > UpdateAttribute processor, but this is a specific case at my work and I
>>> > think Python is a better and simpler solution. I need to calculate that
>>> > with a Python script.
>>> >
>>> > Tom
>>> >
>>> > On Tue, 29 Jan 2019 at 15:18, John McGinn <amruginn-nifi@yahoo.com>
>>> wrote:
>>> >>
>>> >> Since your script shows that "filename" is an attribute of your
>>> >> flowfile, you could use the UpdateAttribute processor.
>>> >>
>>> >> If you right click on UpdateAttribute and choose ShowUsage, then
>>> >> choose Expression Language Guide, it shows you the things you can handle.
>>> >>
>>> >> Something along the lines of ${filename:getDelimitedField(6,'_')}, if
>>> >> I understand the Groovy code correctly. I did a GenerateFlowFile to an
>>> >> UpdateAttribute processor setting filename to "1_2_3_4_5_6.2_abc", then
>>> >> sent that to another UpdateAttribute with the getDelimitedField() I listed
>>> >> and I received 6.2. Then another UpdateAttribute could parse the 6.2 for
>>> >> the second substring, or you might be able to chain them in the existing
>>> >> UpdateProcessor.
>>> >>
>>> >>
>>> >> --------------------------------------------
>>> >> On Tue, 1/29/19, Tomislav Novosel <to.novosel@gmail.com> wrote:
>>> >>
>>> >>  Subject: Modify Flowfile attributes
>>> >>  To: users@nifi.apache.org
>>> >>  Date: Tuesday, January 29, 2019, 9:04 AM
>>> >>
>>> >>  Hi all,
>>> >>  I'm trying to calculate the week number and year
>>> >>  from the filename using the ExecuteScript processor and Jython.
>>> >>  Here is the Python script. How can I add the calculated
>>> >>  attributes week and year to the flowfile?
>>> >>  Please help, thank you.
>>> >>  Tom
>>> >>  P.S. Maybe I completely missed with this script.
>>> >>  Feel free to correct me.
>>> >>
>>> >>  import json
>>> >>  import java.io
>>> >>  from org.apache.commons.io import IOUtils
>>> >>  from java.nio.charset import StandardCharsets
>>> >>  from org.apache.nifi.processor.io import StreamCallback
>>> >>  from datetime import datetime, timedelta, date
>>> >>
>>> >>  class PyStreamCallback(StreamCallback):
>>> >>      def __init__(self, flowfile):
>>> >>          self.ff = flowfile
>>> >>          pass
>>> >>      def process(self, inputStream, outputStream):
>>> >>          file_name = self.ff.getAttribute("filename")
>>> >>          date_file = file_name.split("_")[6]
>>> >>          date_final = date_file.split(".")[0]
>>> >>          date_obj = datetime.strptime(date_final,'%y%m%d')
>>> >>          date_year = date_obj.year
>>> >>          date_day = date_obj.day
>>> >>          date_month = date_obj.month
>>> >>          week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>>> >>          year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>> >>
>>> >>  flowFile = session.get()
>>> >>  if (flowFile != None):
>>> >>      session.transfer(flowFile, REL_SUCCESS)
>>> >>      session.commit()
>>>
>>
>
> --
> http://www.google.com/profiles/grapesmoker
>
