gora-dev mailing list archives

From "Alexis (JIRA)" <j...@apache.org>
Subject [jira] Updated: (GORA-23) Limit result set in store reads
Date Wed, 26 Jan 2011 17:08:44 GMT

     [ https://issues.apache.org/jira/browse/GORA-23?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Alexis updated GORA-23:

    Attachment: mapred-site.xml

Thanks for your 2 valuable comments.

1. I agree. We could instead set the minimum to BUFFER_LIMIT_VALUE, which is the default value
used when the "gora.buffer.limit" property is not found. The current minimum of 2 is way too low.

2. I agree too.
- A separate class, GoraRecordCounter, has been created to count the records.
- The gora.buffer.limit parameter is split into two, so that the user can set the read and
write limits independently (gora.buffer.read.limit and gora.buffer.write.limit).

See the second patch.
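
To make the two points above concrete, here is a rough sketch of how the split limits and a record counter could fit together. It is not the code from the attached patch: the class names BufferLimits and RecordCounter, the method names, and the value 10000 for BUFFER_LIMIT_VALUE are all assumptions made for illustration. Only the two property names come from this comment.

```java
import java.util.Properties;

// Hypothetical sketch, not the attached patch.
final class BufferLimits {
    // Assumed value; the real BUFFER_LIMIT_VALUE is not stated in this thread.
    static final int BUFFER_LIMIT_VALUE = 10000;
    static final String READ_LIMIT_KEY = "gora.buffer.read.limit";
    static final String WRITE_LIMIT_KEY = "gora.buffer.write.limit";

    // Returns the configured limit, falling back to the default when the
    // property is missing or malformed, and never going below the default.
    static int resolve(Properties props, String key) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return BUFFER_LIMIT_VALUE;
        }
        try {
            return Math.max(Integer.parseInt(raw.trim()), BUFFER_LIMIT_VALUE);
        } catch (NumberFormatException e) {
            return BUFFER_LIMIT_VALUE;
        }
    }
}

// Hypothetical counter in the spirit of GoraRecordCounter.
final class RecordCounter {
    private final int limit;
    private int count;

    RecordCounter(int limit) {
        this.limit = limit;
    }

    // Counts one record; returns true once the limit has been reached,
    // which is the point where the caller would flush or fetch again.
    boolean increment() {
        count++;
        return count >= limit;
    }

    void reset() {
        count = 0;
    }
}
```

With this shape, a reader would create a RecordCounter from BufferLimits.resolve(props, BufferLimits.READ_LIMIT_KEY) and a writer would do the same with WRITE_LIMIT_KEY, so the two limits no longer interfere.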

> Limit result set in store reads
> -------------------------------
>                 Key: GORA-23
>                 URL: https://issues.apache.org/jira/browse/GORA-23
>             Project: Gora
>          Issue Type: Bug
>          Components: storage
>         Environment: MySQL
>            Reporter: Alexis
>         Attachments: gora.patch, gora.patch, mapred-site.xml
> Once again, whatever the capacity of our system, we have a limited amount of RAM. Sooner
> or later, we will run out of memory.
> Please refer to http://techvineyard.blogspot.com/2010/12/build-nutch-20.html#Gora for
> the description of the issue:
> When using MySQL as the Gora backend, the parse command hangs and then crashes
> because it runs out of memory on this query:
> SELECT id,content,status,outlinks,baseUrl,typ,parseStatus,metadata,signature,markers
> FROM webpage;
> We are running into exactly the same issue as GORA-20, except that we are not writing
> to the store but reading from it. Currently the code loads the entire webpage table into memory.
> We want to set a limit on the call that pulls data from the database.
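
As an illustration of the bounded read requested in the description, the sketch below pulls rows in pages of at most a fixed size instead of loading the whole table at once. It is only one possible approach and not necessarily what the patch does. PagedReader, pageQuery and readAll are hypothetical names, and the fetchPage callback stands in for whatever actually runs the query against MySQL.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical sketch of a bounded read; not the attached patch.
final class PagedReader {
    // Appends MySQL's LIMIT ... OFFSET ... clause to a SELECT statement.
    // A real query would also need a stable ORDER BY for consistent pages.
    static String pageQuery(String baseSelect, int limit, long offset) {
        return baseSelect + " LIMIT " + limit + " OFFSET " + offset;
    }

    // Fetches pages of at most `limit` rows until a short page signals the
    // end, handing each row to `sink`. At most one page is held in memory.
    // Returns the total number of rows read.
    static <T> long readAll(BiFunction<Integer, Long, List<T>> fetchPage,
                            int limit, Consumer<T> sink) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        long offset = 0;
        while (true) {
            List<T> page = fetchPage.apply(limit, offset);
            for (T row : page) {
                sink.accept(row);
            }
            offset += page.size();
            if (page.size() < limit) {
                return offset;
            }
        }
    }
}
```

Here the limit passed to readAll would be the value of the read limit property, and fetchPage would execute the statement built by pageQuery for the given limit and offset.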

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
