nifi-users mailing list archives

From Tomislav Novosel <to.novo...@gmail.com>
Subject Re: Modify Flowfile attributes
Date Wed, 30 Jan 2019 10:09:39 GMT
Yes, the values are correct. The attribute has the expected value, e.g. for
the date 181231 in the filename I get the value 18231 for the attribute
week_extracted, which is extracted from the filename with the split method.

Tom.

On Wed, 30 Jan 2019 at 10:59, Arpad Boda <aboda@hortonworks.com> wrote:

> Hi Tom,
>
>
>
> “that is exactly what I tried and date_final or date_file are applied to
> the attribute of outgoing flowfile, it works.”
>
>
>
> It works as they are strings, so not working would be a surprise. The
> question is: what are their values? 😊
>
>
>
> Regards,
>
> Arpad
>
>
>
> *From: *Tomislav Novosel <to.novosel@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Wednesday, 30 January 2019 at 10:53
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Re: Modify Flowfile attributes
>
>
>
> Hi Arpad,
>
>
>
> that is exactly what I tried and date_final or date_file are applied to
> the attribute of outgoing flowfile, it works.
>
> But if I put week_att into the attribute, I get the error "week_att cannot be
> coerced as String", and if I put str_week, it gives me week number 44.
>
>
>
> Tom
>
>
>
> On Wed, 30 Jan 2019 at 08:40, Arpad Boda <aboda@hortonworks.com> wrote:
>
> Tom,
>
>
>
> The Python code to get the week number for a datetime string seems to be
> correct.
>
>
>
> To help with debugging, could you stamp your “date_final” or “date_file”
> variable into an attribute so we can see what the input is?
>
> My gut feeling says there is some parsing magic going wrong here.
>
>
>
> Regards,
>
> Arpad
>
>
>
> *From: *Tomislav Novosel <to.novosel@gmail.com>
> *Reply-To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Date: *Tuesday, 29 January 2019 at 20:13
> *To: *"users@nifi.apache.org" <users@nifi.apache.org>
> *Subject: *Re: Modify Flowfile attributes
>
>
>
> With the following script I get week number 44 and year 118, which is a
> strange result.
> The week should be 1 and the year 2019 for the date 2018-12-31.
>
> What is wrong here?
>
>
>
> Tom
>
>
>
> from datetime import datetime, timedelta, date
>
>
>
> flowFile = session.get()
>
> if (flowFile != None):
>
>     file_name = flowFile.getAttribute('filename')
>
>
>
>     date_file = file_name.split("_")[6]
>
>     date_final = date_file.split(".")[0]
>
>     date_obj = datetime.strptime(date_final,'%y%m%d')
>
>     date_year = date_obj.year
>
>     date_day = date_obj.day
>
>     date_month = date_obj.month
>
>
>
>     week_att = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>
>     year_att = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>
>     str_week = str(week_att)
>
>     str_year = str(year_att)
>
>
>
>     flowFile = session.putAttribute(flowFile, "year_extracted", str_year)
>
>     flowFile = session.putAttribute(flowFile, "week_extracted", str_week)
>
>     session.transfer(flowFile, REL_SUCCESS)
>
>     session.commit()
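To separate the script's logic from the NiFi environment, the same parsing and ISO-calendar steps can be run in plain CPython. The filename below is hypothetical; it only follows the script's assumption that the date is the seventh "_"-separated field. Under CPython this gives ISO year 2019 and week 1 for 181231. If identical code yields 44 and 118 inside ExecuteScript, the difference would have to come from the input or the Jython runtime rather than from the week arithmetic, which fits Arpad's suspicion of "parsing magic".

```python
from datetime import datetime

def iso_year_week_from_filename(file_name):
    """Return (iso_year, iso_week) for a filename whose 7th '_'-separated
    field starts with a yymmdd date, e.g. 'a_b_c_d_e_f_181231.csv'."""
    date_file = file_name.split("_")[6]        # '181231.csv'
    date_final = date_file.split(".")[0]       # '181231'
    date_obj = datetime.strptime(date_final, "%y%m%d")
    iso_year, iso_week, _ = date_obj.isocalendar()
    return iso_year, iso_week

print(iso_year_week_from_filename("a_b_c_d_e_f_181231.csv"))  # (2019, 1)
```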
>
>
>
> On Tue, 29 Jan 2019 at 16:59, Tomislav Novosel <to.novosel@gmail.com>
> wrote:
>
> Thank you all for the answers. The reason I want to do this with a Python
> script is that the week number calculated from the date is wrong. NiFi has
> that function in the Expression Language (extracted_date:format("w", <<time_zone>>)).
> My time zone is GMT+2.
>
> If I set the date to, for example, 20180819 and the time zone in the function
> to GMT, I get week number 34, which is wrong. If I omit the time zone, I get
> week number 33, which is right. I'm not sure if that's a bug. You can test it
> yourself, and if you do, please share your findings here; maybe I'm doing
> something wrong.
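One plausible contributor to the 33 vs 34 discrepancy (an assumption, not confirmed in this thread) is that week numbers depend on convention: ISO 8601 weeks start on Monday, while a US-style locale starts them on Sunday. 2018-08-19 is a Sunday, so it sits exactly on a week boundary, and any time-zone shift of the formatted instant matters for the same reason. The sketch compares the two rules in plain Python; `sunday_start_week` is a hypothetical helper whose rule is an assumed US convention, not NiFi code.

```python
from datetime import date, timedelta

def iso_week(d):
    # ISO 8601: weeks start on Monday; week 1 contains the first Thursday.
    return d.isocalendar()[1]

def sunday_start_week(d):
    # Assumed US-style rule: weeks start on Sunday and week 1 is the week
    # containing 1 January. Simplified: it ignores the year-end rollover
    # where late-December days would count as week 1 of the next year.
    jan1 = date(d.year, 1, 1)
    # date.weekday(): Monday=0 ... Sunday=6, so this is days since Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    first_week_start = jan1 - timedelta(days=days_since_sunday)
    return (d - first_week_start).days // 7 + 1

d = date(2018, 8, 19)  # a Sunday
print(iso_week(d), sunday_start_week(d))  # 33 34
```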
>
>
>
> On the other hand, if I use Python, I'm more confident that I will get the
> correct week number, even for dates whose week belongs to the next year
> (e.g. 20181231).
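Python's `isocalendar()` does handle the year boundary mentioned here, provided the ISO year is taken from `isocalendar()[0]` rather than from `date.year`, because the two differ around New Year in both directions. A short sketch; `iso_year_week` is a hypothetical helper name.

```python
from datetime import datetime

def iso_year_week(yyyymmdd):
    """Return (iso_year, iso_week) for a 'YYYYMMDD' string."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d").date()
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year, iso_week

# The ISO year can differ from the calendar year in both directions.
for s in ("20181231", "20210101"):
    d = datetime.strptime(s, "%Y%m%d").date()
    print(s, d.year, iso_year_week(s))
# 20181231 2018 (2019, 1)
# 20210101 2021 (2020, 53)
```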
>
>
>
> Since this calculation will run in production, I need a resilient workflow
> that won't produce errors in the future.
>
>
>
> Regarding the script I sent above, I'm getting the error "week cannot be
> coerced as string". I checked right at the beginning whether the session is
> null or not.
>
>
>
> On Tue, 29 Jan 2019, 16:26 Jerry Vinokurov <grapesmoker@gmail.com> wrote:
>
> I wanted to add, since I've done this specific operation many times, that
> you can really just do this via the NiFi Expression Language, which I think
> is more "idiomatic" than having ExecuteScript processors all over the
> place. Basically, you would have an UpdateAttribute that sets an attribute
> called, say, date_extracted with an expression that looks something like
> ${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')} (this is an
> approximation based on the above; modify as necessary for your purpose).
> Then you could use a second UpdateAttribute to extract various information
> from this date with the format function, e.g. ${date_extracted:format('<your
> format expression here>')}. I don't think there's one for "week", but in
> general this is the approach I take when I need to do date munging.
>
>
>
> On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel <to.novosel@gmail.com>
> wrote:
>
> Hi Matt, thanks for the suggestions, but performance is not crucial here.
>
> This is the code I tried, but I get the error: "AttributeError: 'NoneType'
> object has no attribute 'getAttribute' at line number 4"
>
> If I remove the code from line 6 to line 14, it works with some default
> attribute values for year_extracted and week_extracted; otherwise I get the
> error from above.
>
>
>
> Tom
>
>
>
> from datetime import datetime, timedelta, date
>
>
>
> flowFile = session.get()
>
> file_name = flowFile.getAttribute('filename')
>
>
>
> date_file = file_name.split("_")[6]
>
> date_final = date_file.split(".")[0]
>
> date_obj = datetime.strptime(date_final,'%y%m%d')
>
> date_year = date_obj.year
>
> date_day = date_obj.day
>
> date_month = date_obj.month
>
>
>
> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>
> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>
>
>
> if (flowFile != None):
>
>     flowFile = session.putAttribute(flowFile, "year_extracted", year)
>
>     flowFile = session.putAttribute(flowFile, "week_extracted", week)
>
>     session.transfer(flowFile, REL_SUCCESS)
>
>     session.commit()
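Two problems are visible in this script. First, `flowFile.getAttribute()` is called before the `None` check, and `session.get()` returns `None` when no flowfile is queued, which produces the AttributeError. Second, `putAttribute()` receives the ints returned by `isocalendar()` instead of strings, which matches the "cannot be coerced" error reported later. A minimal corrected sketch follows. Wrapping the body in `process()` (a hypothetical name) only makes it testable outside NiFi with a stub session; inside ExecuteScript you would pass the bound `session` and `REL_SUCCESS`.

```python
from datetime import datetime

def process(session, rel_success):
    """Sketch of the ExecuteScript body; `session` and `rel_success` are the
    objects ExecuteScript binds as `session` and `REL_SUCCESS`."""
    flowFile = session.get()
    # Check for None *before* touching the flowfile.
    if flowFile is None:
        return
    file_name = flowFile.getAttribute('filename')
    date_final = file_name.split("_")[6].split(".")[0]
    date_obj = datetime.strptime(date_final, '%y%m%d')
    iso_year, iso_week, _ = date_obj.isocalendar()
    # Attribute values must be strings; isocalendar() returns ints.
    flowFile = session.putAttribute(flowFile, "year_extracted", str(iso_year))
    flowFile = session.putAttribute(flowFile, "week_extracted", str(iso_week))
    session.transfer(flowFile, rel_success)
    session.commit()

# In the ExecuteScript script body:
# process(session, REL_SUCCESS)
```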
>
>
>
> On Tue, 29 Jan 2019 at 15:53, Matt Burgess <mattyb149@apache.org> wrote:
>
> Tom,
>
> Keep in mind that you are using Jython not Python, which I mention
> only to point out that it is *much* slower than the native Java
> processors such as UpdateAttribute, and slower than other scripting
> engines such as Groovy or Javascript/Nashorn.
>
> If performance/throughput is not a concern and you're more comfortable
> with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
> attributeName, attributeValue) should do the trick. Note that if you
> are adding more than a couple attributes, it's probably better to
> create a dictionary (eventually/actually, a Java Map<String,String>)
> of attribute name/value pairs, and use putAllAttributes(flowFile,
> attributes) instead, as it is more performant.
>
> Regards,
> Matt
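Matt's `putAllAttributes` suggestion can be sketched as a small helper that converts every value to a string and writes them in one call. It relies on Jython passing a Python dict where `ProcessSession.putAllAttributes(FlowFile, Map<String,String>)` expects a `java.util.Map`, as Matt describes; `put_all_string_attributes` is a hypothetical helper name.

```python
def put_all_string_attributes(session, flowFile, values):
    """Write several attributes in one call.

    Every value is converted with str() because NiFi attribute values are
    strings; the dict is handed to session.putAllAttributes().
    """
    attributes = dict((name, str(value)) for name, value in values.items())
    return session.putAllAttributes(flowFile, attributes)

# Example inside ExecuteScript:
# flowFile = put_all_string_attributes(session, flowFile,
#                                      {"year_extracted": 2019, "week_extracted": 1})
```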
>
> On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel <to.novosel@gmail.com>
> wrote:
> >
> > Thanks for the answer.
> >
> > Yes, I know I can handle that with the Expression Language and the
> > UpdateAttribute processor, but this is a specific case at my work and I
> > think Python is a better and simpler solution. I need to calculate that
> > with a Python script.
> >
> > Tom
> >
> > On Tue, 29 Jan 2019 at 15:18, John McGinn <amruginn-nifi@yahoo.com>
> > wrote:
> >>
> >> Since your script shows that "filename" is an attribute of your
> >> flowfile, you could use the UpdateAttribute processor.
> >>
> >> If you right-click on UpdateAttribute and choose Show Usage, then choose
> >> Expression Language Guide, it shows you the things you can handle.
> >>
> >> Something along the lines of ${filename:getDelimitedField(6,'_')}, if I
> >> understand the Groovy code correctly. I connected a GenerateFlowFile to an
> >> UpdateAttribute processor setting filename to "1_2_3_4_5_6.2_abc", then
> >> sent that to another UpdateAttribute with the getDelimitedField() I listed,
> >> and I received 6.2. Then another UpdateAttribute could parse the 6.2 for
> >> the second substring, or you might be able to chain them in the existing
> >> UpdateAttribute processor.
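John's test implies that `getDelimitedField` counts fields from 1 (field 6 of "1_2_3_4_5_6.2_abc" is 6.2), whereas Python indexes `split()` results from 0. So the script's `split("_")[6]` is the seventh field, which would correspond to `getDelimitedField(7,'_')` rather than 6. The helper below is a hypothetical Python mimic of the simple unquoted case only, not NiFi's implementation.

```python
def get_delimited_field(value, index, delimiter=","):
    """Hypothetical mimic of EL getDelimitedField(index, delimiter) for a
    plain, unquoted string; fields are counted from 1."""
    return value.split(delimiter)[index - 1]

name = "1_2_3_4_5_6.2_abc"
print(get_delimited_field(name, 6, "_"))   # 6.2
# Python's split() is 0-based, so split("_")[6] is field 7:
print(name.split("_")[6], get_delimited_field(name, 7, "_"))  # abc abc
```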
> >>
> >>
> >> --------------------------------------------
> >> On Tue, 1/29/19, Tomislav Novosel <to.novosel@gmail.com> wrote:
> >>
> >>  Subject: Modify Flowfile attributes
> >>  To: users@nifi.apache.org
> >>  Date: Tuesday, January 29, 2019, 9:04 AM
> >>
> >>  Hi all,
> >>  I'm trying to calculate the week number and date from a filename using
> >>  the ExecuteScript processor and Jython. Here is the Python script. How
> >>  can I add the calculated attributes week and year to the flowfile?
> >>  Please help, thank you.
> >>  Tom
> >>  P.S. Maybe I'm completely off with this script.
> >>  Feel free to correct me.
> >>
> >>  import json
> >>  import java.io
> >>  from org.apache.commons.io import IOUtils
> >>  from java.nio.charset import StandardCharsets
> >>  from org.apache.nifi.processor.io import StreamCallback
> >>  from datetime import datetime, timedelta, date
> >>
> >>  class PyStreamCallback(StreamCallback):
> >>      def __init__(self, flowfile):
> >>          self.ff = flowfile
> >>          pass
> >>      def process(self, inputStream, outputStream):
> >>          file_name = self.ff.getAttribute("filename")
> >>          date_file = file_name.split("_")[6]
> >>          date_final = date_file.split(".")[0]
> >>          date_obj = datetime.strptime(date_final,'%y%m%d')
> >>          date_year = date_obj.year
> >>          date_day = date_obj.day
> >>          date_month = date_obj.month
> >>          week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
> >>          year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
> >>
> >>  flowFile = session.get()
> >>  if (flowFile != None):
> >>      session.transfer(flowFile, REL_SUCCESS)
> >>      session.commit()
>
>
>
> --
>
> http://www.google.com/profiles/grapesmoker
>
>
