nifi-users mailing list archives

From Tomislav Novosel <to.novo...@gmail.com>
Subject Re: Modify Flowfile attributes
Date Wed, 30 Jan 2019 11:20:29 GMT
Arpad,

I tried to pass variables date_year, date_day and date_month to outgoing
flowfile and I get unexpected values.
For day I get 1, for year 118 and for month 11.
And that gives week number 44 and year 118 according to my code.

It is strange that my code works as expected on your machine. I use NiFi
1.7.1.
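
For comparison, here is a minimal sketch in plain CPython. It is a cross-check only, since it does not run under Jython inside ExecuteScript. It shows what the script is expected to return for 181231, and what date() produces when given the reported values (year 118, month 11, day 1). The helper name iso_year_week is made up for this example. Note that 118, 11 and 1 are also what java.util.Date's deprecated getYear(), getMonth() and getDay() return for Monday 31 December 2018; whether that is what actually happens inside Jython here is only an assumption.

```python
from datetime import datetime, date


def iso_year_week(yymmdd):
    """Return (ISO year, ISO week) for a 'yymmdd' string (hypothetical helper)."""
    parsed = datetime.strptime(yymmdd, '%y%m%d')
    iso = date(parsed.year, parsed.month, parsed.day).isocalendar()
    return iso[0], iso[1]


# Expected path: '181231' parses to 2018-12-31, which belongs to ISO week 1 of 2019.
expected = iso_year_week('181231')

# Reported path: the observed values (year 118, month 11, day 1) passed to date()
# give ISO year 118 and week 44, matching the result seen in the flow.
reported = date(118, 11, 1).isocalendar()

print(expected)
print(reported[0], reported[1])
```

If the two paths differ like this on the NiFi side, the values of date_obj.year, date_obj.month and date_obj.day are the place to look, not the isocalendar() call.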

Regards,
Tom

On Wed, 30 Jan 2019 at 11:25, Arpad Boda <aboda@hortonworks.com> wrote:

> Tom,
>
>
>
> Could you use the LogAttribute processor and somehow log the value of your
> “date_final” variables?
>
>
>
> Tested your code with Jython, with input string “181231” it works as
> expected (the result is 1st week of 2019).
>
>
>
> *From: *Tomislav Novosel <to.novosel@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Wednesday, 30 January 2019 at 11:10
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Re: Modify Flowfile attributes
>
>
>
> Yes, the values are correct. The attribute has the expected value.
>
> i.e. for date 181231 in filename I get value 18231 for attribute
> week_extracted which is extracted from filename with split method.
>
>
>
> Tom.
>
>
>
> On Wed, 30 Jan 2019 at 10:59, Arpad Boda <aboda@hortonworks.com> wrote:
>
> Hi Tom,
>
>
>
> “that is exactly what I tried and date_final or date_file are applied to
> the attribute of outgoing flowfile, it works.”
>
>
>
> It works as they are strings, so not working would be a surprise. The
> question is: what are their values? 😊
>
>
>
> Regards,
>
> Arpad
>
>
>
> *From: *Tomislav Novosel <to.novosel@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Wednesday, 30 January 2019 at 10:53
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Re: Modify Flowfile attributes
>
>
>
> Hi Arpad,
>
>
>
> that is exactly what I tried and date_final or date_file are applied to
> the attribute of outgoing flowfile, it works.
>
> But if I put week_att into the attribute, I get an error: week_att cannot be
> coerced as String. If I put str_week, it gives me week number 44.
>
>
>
> Tom
>
>
>
> On Wed, 30 Jan 2019 at 08:40, Arpad Boda <aboda@hortonworks.com> wrote:
>
> Tom,
>
>
>
> The Python code to get the week number for a datetime string seems to be
> correct.
>
>
>
> To help debugging could you stamp your “date_final” or “date_file”
> variable to an attribute, so we could see what’s the input?
>
> My gut feeling says there is some parsing magic going wrong here.
>
>
>
> Regards,
>
> Arpad
>
>
>
> *From: *Tomislav Novosel <to.novosel@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Tuesday, 29 January 2019 at 20:13
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Re: Modify Flowfile attributes
>
>
>
> With the following script I get week number 44 and year 118, which is a
> strange result.
> Week should be 1 and year 2019 for date 2018-12-31.
>
> What is wrong here?
>
>
>
> Tom
>
>
>
> from datetime import datetime, timedelta, date
>
>
>
> flowFile = session.get()
>
> if (flowFile != None):
>
>     file_name = flowFile.getAttribute('filename')
>
>
>
>     date_file = file_name.split("_")[6]
>
>     date_final = date_file.split(".")[0]
>
>     date_obj = datetime.strptime(date_final,'%y%m%d')
>
>     date_year = date_obj.year
>
>     date_day = date_obj.day
>
>     date_month = date_obj.month
>
>
>
>     week_att = date(year=date_year, month=date_month,
> day=date_day).isocalendar()[1]
>
>     year_att = date(year=date_year, month=date_month,
> day=date_day).isocalendar()[0]
>
>     str_week = str(week_att)
>
>     str_year = str(year_att)
>
>
>
>     flowFile = session.putAttribute(flowFile, "year_extracted", str_year)
>
>     flowFile = session.putAttribute(flowFile, "week_extracted", str_week)
>
>     session.transfer(flowFile, REL_SUCCESS)
>
>     session.commit()
>
>
>
> On Tue, 29 Jan 2019 at 16:59, Tomislav Novosel <to.novosel@gmail.com>
> wrote:
>
> Thank you all for the answers. The reason I want to do this with a Python
> script is the wrong calculation of the week number from a date. NiFi has that
> function in Expression Language (extracted_date:format("w", <<time_zone>>)).
> My time zone is GMT+2.
>
> If I set a date, for example 20180819, and pass the time zone GMT to the
> function, I get week number 34, which is wrong. If I omit the time zone, I
> get week number 33, which is right. I'm not sure if that's a bug. You can
> test it yourself, and if you do, please share your findings here; maybe I'm
> doing something wrong.
>
>
>
> On the other hand, if I use Python, I'm more sure that I will get the correct
> week number, even for dates whose week belongs to the next
> year (e.g. 20181231).
>
>
>
> Since this calculation will run in production, I need a resilient workflow
> that works without errors in the future.
>
>
>
> Regarding the script I sent above, I'm getting the error: "week cannot be coerced
> as string". I checked right at the beginning whether the session is null or not.
>
>
>
> On Tue, 29 Jan 2019, 16:26 Jerry Vinokurov <grapesmoker@gmail.com> wrote:
>
> I wanted to add, since I've done this specific operation many times, that
> you can really just do this via the NiFi expression language, which I think
> is more "idiomatic" than having ExecuteScript processors all over the
> place. Basically, you would have an UpdateAttribute that set something
> called, say, date_extracted with an expression that looks something like
> ${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')} (this is an
> approximation based on the above, modify as necessary for your purpose).
> Then you could use a second UpdateAttribute to extract various information
> from this date with the format command, e.g. ${date_extracted:format('<your
> format expression here>')}. I don't think there's one for "week" but in
> general this is the approach I take when I need to do date munging.
>
>
>
> On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel <to.novosel@gmail.com>
> wrote:
>
> Hi Matt, thanks for the suggestions. But performance is not crucial here.
>
> This is the code I tried, but I get the error: "AttributeError: 'NoneType' object
> has no attribute 'getAttribute' at line number 4"
>
> If I remove the code from line 6 to line 14, it works with some default
> attribute values for year_extracted and week_extracted; otherwise I get the
> error from above.
>
>
>
> Tom
>
>
>
> from datetime import datetime, timedelta, date
>
>
>
> flowFile = session.get()
>
> file_name = flowFile.getAttribute('filename')
>
>
>
> date_file = file_name.split("_")[6]
>
> date_final = date_file.split(".")[0]
>
> date_obj = datetime.strptime(date_final,'%y%m%d')
>
> date_year = date_obj.year
>
> date_day = date_obj.day
>
> date_month = date_obj.month
>
>
>
> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>
> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>
>
>
> if (flowFile != None):
>
>     flowFile = session.putAttribute(flowFile, "year_extracted", year)
>
>     flowFile = session.putAttribute(flowFile, "week_extracted", week)
>
>     session.transfer(flowFile, REL_SUCCESS)
>
>     session.commit()
>
>
>
> On Tue, 29 Jan 2019 at 15:53, Matt Burgess <mattyb149@apache.org> wrote:
>
> Tom,
>
> Keep in mind that you are using Jython not Python, which I mention
> only to point out that it is *much* slower than the native Java
> processors such as UpdateAttribute, and slower than other scripting
> engines such as Groovy or Javascript/Nashorn.
>
> If performance/throughput is not a concern and you're more comfortable
> with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
> attributeName, attributeValue) should do the trick. Note that if you
> are adding more than a couple attributes, it's probably better to
> create a dictionary (eventually/actually, a Java Map<String,String>)
> of attribute name/value pairs, and use putAllAttributes(flowFile,
> attributes) instead, as it is more performant.
>
> Regards,
> Matt
>
> On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel <to.novosel@gmail.com>
> wrote:
> >
> > Thanks for the answer.
> >
> > Yes, I know I can handle that with Expression Language and the
> > UpdateAttribute processor, but this is a specific case at my work and I
> > think Python is the better and simpler solution. I need to calculate that
> > with a Python script.
> >
> > Tom
> >
> > On Tue, 29 Jan 2019 at 15:18, John McGinn <amruginn-nifi@yahoo.com>
> wrote:
> >>
> >> Since your script shows that "filename" is an attribute of your
> >> flowfile, you could use the UpdateAttribute processor.
> >>
> >> If you right click on UpdateAttribute and choose ShowUsage, then choose
> >> Expression Language Guide, it shows you the things you can handle.
> >>
> >> Something along the lines of ${filename:getDelimitedField(6,'_')}, if I
> >> understand the Groovy code correctly. I did a GenerateFlowFile to an
> >> UpdateAttribute processor setting filename to "1_2_3_4_5_6.2_abc", then
> >> sent that to another UpdateAttribute with the getDelimitedField() I listed
> >> and I received 6.2. Then another UpdateAttribute could parse the 6.2 for
> >> the second substring, or you might be able to chain them in the existing
> >> UpdateAttribute processor.
> >>
> >>
> >> --------------------------------------------
> >> On Tue, 1/29/19, Tomislav Novosel <to.novosel@gmail.com> wrote:
> >>
> >>  Subject: Modify Flowfile attributes
> >>  To: users@nifi.apache.org
> >>  Date: Tuesday, January 29, 2019, 9:04 AM
> >>
> >>  Hi all,
> >>  I'm trying to calculate the week number and date
> >>  from the filename using the ExecuteScript processor and Jython. Here
> >>  is the Python script. How can I add the calculated
> >>  attributes week and year to the flowfile?
> >>  Please help, thank you.
> >>  Tom
> >>  P.S. Maybe I completely missed with this script.
> >>  Feel free to correct me.
> >>
> >>  import json
> >>  import java.io
> >>  from org.apache.commons.io import IOUtils
> >>  from java.nio.charset import StandardCharsets
> >>  from org.apache.nifi.processor.io import StreamCallback
> >>  from datetime import datetime, timedelta, date
> >>
> >>  class PyStreamCallback(StreamCallback):
> >>      def __init__(self, flowfile):
> >>          self.ff = flowfile
> >>          pass
> >>      def process(self, inputStream, outputStream):
> >>          file_name = self.ff.getAttribute("filename")
> >>          date_file = file_name.split("_")[6]
> >>          date_final = date_file.split(".")[0]
> >>          date_obj = datetime.strptime(date_final,'%y%m%d')
> >>          date_year = date_obj.year
> >>          date_day = date_obj.day
> >>          date_month = date_obj.month
> >>          week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
> >>          year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
> >>
> >>  flowFile = session.get()
> >>  if (flowFile != None):
> >>      session.transfer(flowFile, REL_SUCCESS)
> >>      session.commit()
>
>
>
> --
>
> http://www.google.com/profiles/grapesmoker
>
>
