spark-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Using Lambda function to generate random data in PySpark throws not defined error
Date Fri, 11 Dec 2020 16:46:51 GMT
Looks like a simple Python error, though you haven't shown the code that
produces it. I suspect you'll find that there really is no symbol named
numRows in scope where the lambda runs.
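
For illustration, here is a minimal sketch in plain Python (no Spark needed) of one way this error can arise. The names build_mapper and build_mapper_fixed are hypothetical and do not come from the original post. The sketch assumes numRows was never assigned in any scope the lambda can see, which may or may not be what happened in the poster's main.py.

```python
def build_mapper():
    # Assumption for illustration: numRows is not assigned here
    # or at module level, so the lambda has nothing to resolve it to.
    return lambda x: (x, x / numRows)

mapper = build_mapper()  # creating the lambda raises nothing

error_message = None
try:
    mapper(1)  # the name lookup happens only when the lambda is called
except NameError as exc:
    error_message = str(exc)

def build_mapper_fixed(num_rows):
    # Passing the value in makes it a closure variable of the lambda.
    return lambda x: (x, x / num_rows)

result = build_mapper_fixed(100000)(1)
```

Python resolves a free variable such as numRows when the lambda is called, not when it is defined. That is why the traceback points at the lambda line rather than at the place where the assignment was expected.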

On Fri, Dec 11, 2020 at 9:09 AM Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
> This used to work but no longer does.
>
> I have a UsedFunctions.py file that contains these functions:
>
> import random
> import string
> import math
>
> def randomString(length):
>     letters = string.ascii_letters
>     result_str = ''.join(random.choice(letters) for i in range(length))
>     return result_str
>
> def clustered(x,numRows):
>     return math.floor(x -1)/numRows
>
> def scattered(x,numRows):
>     return abs((x -1 % numRows))* 1.0
>
> def randomised(seed,numRows):
>     random.seed(seed)
>     return abs(random.randint(0, numRows) % numRows) * 1.0
>
> def padString(x,chars,length):
>     n = int(math.log10(x) + 1)
>     result_str = ''.join(random.choice(chars) for i in range(length-n)) + str(x)
>     return result_str
>
> def padSingleChar(chars,length):
>     result_str = ''.join(chars for i in range(length))
>     return result_str
>
> def println(lst):
>     for ll in lst:
>         print(ll[0])
>
> Now in the main.py module I import this file as follows:
>
> import UsedFunctions as uf
>
> Then I try the following:
>
>  numRows = 100000   ## do in increment of 100K rows
>  rdd = sc.parallelize(Range). \
>            map(lambda x: (x, uf.clustered(x, numRows), \
>                              uf.scattered(x,10000), \
>                              uf.randomised(x,10000), \
>                              uf.randomString(50), \
>                              uf.padString(x," ",50), \
>                              uf.padSingleChar("x",4000)))
>
> The problem is that it now throws an error for numRows, as shown below:
>
>
>   File "C:/Users/admin/PycharmProjects/pythonProject2/pilot/src/main.py",
> line 101, in <lambda>
>     map(lambda x: (x, uf.clustered(x, numRows), \
> NameError: name 'numRows' is not defined
>
> I don't know why this error occurs.
>
> I would appreciate any ideas.
>
> Thanks,
>
> Mich
>
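
A side note on the quoted helpers, separate from the NameError: operator precedence in clustered and scattered may not do what was intended. As written, math.floor is applied before the division, and 1 % numRows is evaluated before the subtraction. The sketch below compares the quoted expressions with a grouped reading. The "_as_written" and "_grouped" names are hypothetical, and the grouped versions are only a guess at the intent, not a confirmed fix.

```python
import math

def clustered_as_written(x, numRows):
    # floor() is applied to (x - 1) first, then the result is divided
    return math.floor(x - 1) / numRows

def clustered_grouped(x, numRows):
    # assumed intent: floor of the whole quotient
    return math.floor((x - 1) / numRows)

def scattered_as_written(x, numRows):
    # % binds tighter than -, so this is x - (1 % numRows)
    return abs((x - 1 % numRows)) * 1.0

def scattered_grouped(x, numRows):
    # assumed intent: (x - 1) modulo numRows
    return abs((x - 1) % numRows) * 1.0

as_written = (clustered_as_written(6, 4), scattered_as_written(6, 4))
grouped = (clustered_grouped(6, 4), scattered_grouped(6, 4))
```

With x = 6 and numRows = 4, the two readings give different values for both helpers, so it is worth checking which one was meant.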
