nifi-users mailing list archives

From Tomislav Novosel <to.novo...@gmail.com>
Subject Re: Modify Flowfile attributes
Date Tue, 29 Jan 2019 19:05:27 GMT
With the following script I get week number 44 and year 118, which is a strange
result.
The week should be 1 and the year 2019 for the date 2018-12-31.
What is wrong here?

Tom

from datetime import datetime, timedelta, date

flowFile = session.get()
if (flowFile != None):
    file_name = flowFile.getAttribute('filename')

    date_file = file_name.split("_")[6]
    date_final = date_file.split(".")[0]
    date_obj = datetime.strptime(date_final,'%y%m%d')
    date_year = date_obj.year
    date_day = date_obj.day
    date_month = date_obj.month

    week_att = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
    year_att = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
    str_week = str(week_att)
    str_year = str(year_att)

    flowFile = session.putAttribute(flowFile, "year_extracted", str_year)
    flowFile = session.putAttribute(flowFile, "week_extracted", str_week)
    session.transfer(flowFile, REL_SUCCESS)
    session.commit()
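
A likely cause, assuming the filename carries a four-digit year as in
20181231: '%y' is the two-digit-year directive, so the string is not parsed
as intended, while '%Y' is the four-digit form. A minimal sketch in plain
Python (the helper name iso_year_and_week is made up for illustration, and
this has not been checked under Jython inside ExecuteScript):

```python
from datetime import datetime

def iso_year_and_week(date_text):
    """Return (ISO year, ISO week) as strings for a yyyyMMdd date string."""
    # %Y expects a four-digit year; %y would only consume two digits
    parsed = datetime.strptime(date_text, "%Y%m%d")
    iso = parsed.date().isocalendar()
    # Flowfile attribute values must be strings, hence str()
    return str(iso[0]), str(iso[1])

print(iso_year_and_week("20181231"))  # ISO year 2019, week 1
print(iso_year_and_week("20180819"))  # ISO year 2018, week 33
```

In the script above this would mean changing only the format string passed
to strptime; the isocalendar() calls can stay as they are.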

On Tue, 29 Jan 2019 at 16:59, Tomislav Novosel <to.novosel@gmail.com> wrote:

> Thank you all for the answers. The reason I want to do this with a Python
> script is the wrong week number that NiFi calculates from the date. NiFi has
> that function in Expression Language (extracted_date:format("w", <<time_zone>>)).
> My time zone is GMT+2.
> If I set the date to, for example, 20180819 and pass GMT as the time zone in
> the function, I get week number 34, which is wrong. If I omit the time zone,
> I get week number 33, which is right. I'm not sure if that's a bug. You can
> test it for yourself, and if you do, please share your findings here; maybe
> I'm doing something wrong.
>
> On the other hand, if I use Python, I'm more sure that I will get the correct
> week number, even for dates that fall into a week of the next year
> (e.g. 20181231).
>
> Since this calculation will be in production, I need a resilient workflow
> that runs without errors in the future.
>
> Regarding the script I sent above, I'm getting the error: "week cannot be coerced
> as string". I checked right at the beginning whether the session is null or not.
>
> On Tue, 29 Jan 2019, 16:26 Jerry Vinokurov <grapesmoker@gmail.com> wrote:
>
>> I wanted to add, since I've done this specific operation many times, that
>> you can really just do this via the NiFi expression language, which I think
>> is more "idiomatic" than having ExecuteScript processors all over the
>> place. Basically, you would have an UpdateAttribute that set something
>> called, say, date_extracted with an expression that looks something like
>> ${filename:substringAfterLast('_'):toDate('yyyy.MM.dd')} (this is an
>> approximation based on the above, modify as necessary for your purpose).
>> Then you could use a second UpdateAttribute to extract various information
>> from this date with the format command, e.g. ${date_extracted:format('<your
>> format expression here>')}. I don't think there's one for "week" but in
>> general this is the approach I take when I need to do date munging.
>>
>> On Tue, Jan 29, 2019 at 10:06 AM Tomislav Novosel <to.novosel@gmail.com>
>> wrote:
>>
>>> Hi Matt, thanks for the suggestions. But performance is not crucial here.
>>> This is the code I tried, but I get the error: "AttributeError: 'NoneType'
>>> object has no attribute 'getAttribute' at line number 4".
>>> If I remove the code from line 6 to line 14, it works with some default
>>> attribute values for year_extracted and week_extracted; otherwise I get
>>> the error from above.
>>>
>>> Tom
>>>
>>> from datetime import datetime, timedelta, date
>>>
>>> flowFile = session.get()
>>> file_name = flowFile.getAttribute('filename')
>>>
>>> date_file = file_name.split("_")[6]
>>> date_final = date_file.split(".")[0]
>>> date_obj = datetime.strptime(date_final,'%y%m%d')
>>> date_year = date_obj.year
>>> date_day = date_obj.day
>>> date_month = date_obj.month
>>>
>>> week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>>> year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>>
>>> if (flowFile != None):
>>>     flowFile = session.putAttribute(flowFile, "year_extracted", year)
>>>     flowFile = session.putAttribute(flowFile, "week_extracted", week)
>>>     session.transfer(flowFile, REL_SUCCESS)
>>>     session.commit()
>>>
>>> On Tue, 29 Jan 2019 at 15:53, Matt Burgess <mattyb149@apache.org> wrote:
>>>
>>>> Tom,
>>>>
>>>> Keep in mind that you are using Jython not Python, which I mention
>>>> only to point out that it is *much* slower than the native Java
>>>> processors such as UpdateAttribute, and slower than other scripting
>>>> engines such as Groovy or Javascript/Nashorn.
>>>>
>>>> If performance/throughput is not a concern and you're more comfortable
>>>> with Jython, then Jerry's suggestion of session.putAttribute(flowFile,
>>>> attributeName, attributeValue) should do the trick. Note that if you
>>>> are adding more than a couple attributes, it's probably better to
>>>> create a dictionary (eventually/actually, a Java Map<String,String>)
>>>> of attribute name/value pairs, and use putAllAttributes(flowFile,
>>>> attributes) instead, as it is more performant.
>>>>
>>>> Regards,
>>>> Matt
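
The dictionary approach described above could look roughly like this. It is
only a sketch: the helper name week_attributes and the sample filename are
made up for illustration, the date is assumed to be the seventh
underscore-separated field in yyyyMMdd form, and the session call is shown
as a comment because it only exists inside ExecuteScript.

```python
from datetime import datetime

def week_attributes(file_name):
    """Build a str-to-str dict of ISO year/week attributes from a file name."""
    # Assumption: date is the 7th '_'-separated field, before the extension
    date_part = file_name.split("_")[6].split(".")[0]
    iso = datetime.strptime(date_part, "%Y%m%d").date().isocalendar()
    # Keys and values are both strings, matching Map<String,String>
    return {"year_extracted": str(iso[0]), "week_extracted": str(iso[1])}

# Inside ExecuteScript (not runnable outside NiFi):
# flowFile = session.putAllAttributes(flowFile, week_attributes(file_name))

print(week_attributes("a_b_c_d_e_f_20181231.csv"))
```

Converting the values with str() up front also avoids passing Python
integers where attribute values are expected to be strings.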
>>>>
>>>> On Tue, Jan 29, 2019 at 9:25 AM Tomislav Novosel <to.novosel@gmail.com>
>>>> wrote:
>>>> >
>>>> > Thanks for the answer.
>>>> >
>>>> > Yes, I know I can handle that with Expression Language and the
>>>> > UpdateAttribute processor, but this is a specific case at my work and I
>>>> > think Python is a better and simpler solution. I need to calculate that
>>>> > with a Python script.
>>>> >
>>>> > Tom
>>>> >
>>>> > On Tue, 29 Jan 2019 at 15:18, John McGinn <amruginn-nifi@yahoo.com> wrote:
>>>> >>
>>>> >> Since your script shows that "filename" is an attribute of your
>>>> >> flowfile, you could use the UpdateAttribute processor.
>>>> >>
>>>> >> If you right-click on UpdateAttribute and choose ShowUsage, then
>>>> >> choose Expression Language Guide, it shows you the things you can handle.
>>>> >>
>>>> >> Something along the lines of ${filename:getDelimitedField(6,'_')},
>>>> >> if I understand the Groovy code correctly. I did a GenerateFlowFile to an
>>>> >> UpdateAttribute processor setting filename to "1_2_3_4_5_6.2_abc", then
>>>> >> sent that to another UpdateAttribute with the getDelimitedField() I listed
>>>> >> and I received 6.2. Then another UpdateAttribute could parse the 6.2 for
>>>> >> the second substring, or you might be able to chain them in the existing
>>>> >> UpdateAttribute processor.
>>>> >>
>>>> >>
>>>> >> --------------------------------------------
>>>> >> On Tue, 1/29/19, Tomislav Novosel <to.novosel@gmail.com> wrote:
>>>> >>
>>>> >>  Subject: Modify Flowfile attributes
>>>> >>  To: users@nifi.apache.org
>>>> >>  Date: Tuesday, January 29, 2019, 9:04 AM
>>>> >>
>>>> >>  Hi all,
>>>> >>  I'm trying to calculate the week number and date
>>>> >>  from the filename using the ExecuteScript processor and Jython. Here
>>>> >>  is the Python script. How can I add the calculated
>>>> >>  attributes week and year to the flowfile?
>>>> >>  Please help, thank you.
>>>> >>  Tom
>>>> >>  P.S. Maybe I completely missed with this script.
>>>> >>  Feel free to correct me.
>>>> >>
>>>> >>  import json
>>>> >>  import java.io
>>>> >>  from org.apache.commons.io import IOUtils
>>>> >>  from java.nio.charset import StandardCharsets
>>>> >>  from org.apache.nifi.processor.io import StreamCallback
>>>> >>  from datetime import datetime, timedelta, date
>>>> >>
>>>> >>  class PyStreamCallback(StreamCallback):
>>>> >>      def __init__(self, flowfile):
>>>> >>          self.ff = flowfile
>>>> >>          pass
>>>> >>      def process(self, inputStream, outputStream):
>>>> >>          file_name = self.ff.getAttribute("filename")
>>>> >>          date_file = file_name.split("_")[6]
>>>> >>          date_final = date_file.split(".")[0]
>>>> >>          date_obj = datetime.strptime(date_final,'%y%m%d')
>>>> >>          date_year = date_obj.year
>>>> >>          date_day = date_obj.day
>>>> >>          date_month = date_obj.month
>>>> >>          week = date(year=date_year, month=date_month, day=date_day).isocalendar()[1]
>>>> >>          year = date(year=date_year, month=date_month, day=date_day).isocalendar()[0]
>>>> >>
>>>> >>  flowFile = session.get()
>>>> >>  if (flowFile != None):
>>>> >>      session.transfer(flowFile, REL_SUCCESS)
>>>> >>      session.commit()
>>>>
>>>
>>
>> --
>> http://www.google.com/profiles/grapesmoker
>>
>
