asterixdb-dev mailing list archives

From Wail Alkowaileet <wael....@gmail.com>
Subject Does the limit clause skew the results to a single NC?
Date Mon, 30 Nov 2015 13:55:37 GMT
Hi Team,

I noticed some odd behavior when executing an AQL query with a limit clause
(LIMIT 100000): one NC throws java.lang.OutOfMemoryError
while the others seem to operate normally.

My -Xmx settings are the defaults:
nc.java.opts                             :-Xmx1536m
cc.java.opts                             :-Xmx1024m

Here is the story:

I have a dataset of publications. The data contains huge, nested, and
heterogeneous records, so the declared type specifies only a unique ID.

create type wosType as open
{
UID:string
}

After loading the data, I want to extract all the authors' names (first and
last). However, the author details for each publication are heterogeneous:
if there is only one author (i.e. no co-authors), the field "name" is a JSON
object; otherwise it is an ordered list.
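
To make the two shapes concrete outside of AQL, here is a small Python sketch.
The sample records and the helper name author_names are made up for
illustration; only the field names (static_data, summary, names, count, name,
first_name, last_name) come from the queries in this email.

```python
# Hypothetical sample records; only the field names come from the AQL in this email.
single = {"UID": "A", "static_data": {"summary": {"names": {
    "count": "1",
    "name": {"first_name": "Ada", "last_name": "Lovelace"}}}}}
multi = {"UID": "B", "static_data": {"summary": {"names": {
    "count": "2",
    "name": [{"first_name": "Alan", "last_name": "Turing"},
             {"first_name": "Grace", "last_name": "Hopper"}]}}}}


def author_names(record):
    """Return a list of firstName/lastName dicts for either shape of "name"."""
    name = record["static_data"]["summary"]["names"]["name"]
    # A single author is stored as one object; co-authors as an ordered list.
    entries = [name] if isinstance(name, dict) else name
    return [{"firstName": e["first_name"], "lastName": e["last_name"]}
            for e in entries]


all_names = author_names(single) + author_names(multi)
```

This is only meant to show the shape of the data, not how AsterixDB evaluates
the query.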

So I did the following (excuse the ugliness of my AQL):

-----------------------------
use dataverse wosDataverse

// Get name details for single-author publications
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)

// Generate a list of names for all co-authors
let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)

// Flatten the co-authors' name list
let $coAuth := (for $x in $coAuthList
for $y in $x
return {"firstName":$y.first_name,"lastName":$y.last_name})

// Print all authors
let $res := (for $t in  [$coAuth,$noCoAuth]
limit 100
return $t)

return $res
-----------------------------


This query could not be executed because of the frame size limit:

Unable to allocate frame larger than:255 bytes [HyracksDataException]

So I limited the number of results by adding limit 100000 to two of the
subqueries:

-----------------------------
use dataverse wosDataverse
let $noCoAuth := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count = "1"
limit 100000
return {
"firstName":$names.name.first_name,
"lastName":$names.name.last_name
}
)

let $coAuthList := (for $x in dataset wos
let $summary := $x.static_data.summary
let $names := $summary.names
where $names.count != "1"
return $names.name
)

let $coAuth := (for $x in $coAuthList
for $y in $x
limit 100000
return {"firstName":$y.first_name,"lastName":$y.last_name})


let $res := (for $t in [$coAuth, $noCoAuth]
limit 100
return $t)

return $res
-----------------------------

When I execute this AQL, one node (a different one in each run)
reaches 400% CPU load (4 cores) and consumes all the memory available
to it.


With a smaller limit (e.g. limit 10000), it works fine.


Thanks and sorry for the long email.
-- 

*Regards,*
Wail Alkowaileet
