jena-dev mailing list archives

From Andy Seaborne <a...@apache.org>
Subject Re: Consolidate "spillOnDiskSortingThreshold" and "spillOnDiskUpdateThreshold"?
Date Fri, 04 Nov 2011 14:55:41 GMT
On 01/11/11 21:08, Stephen Allen wrote:
> As a note, most database systems specify the memory size allowed on a per
> operator basis.
>
> PostgreSQL calls it "work_mem" [1], MySQL calls it "tmp_table_size" [2],
> and Oracle used to call it "sort_area_size", but now has a new setting
> called "pga_aggregate_target" [3].
>
> -Stephen
>
> [1] http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html
> [2] http://dev.mysql.com/doc/refman/5.6/en/internal-temporary-tables.html
> [3]
> http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/initparams232.htm
>
> On Tue, Nov 1, 2011 at 3:59 PM, Stephen Allen <sallen@apache.org> wrote:
>
>> All,
>>
>> I am working on JENA-119, and wanted to get some feedback on an external
>> user-facing change.
>>
>> I'd like to consolidate the "spillOnDiskSortingThreshold",
>> "spillOnDiskUpdateThreshold", and any potential future
>> "spillOnDisk*Threshold" parameters into a single variable.  Separate
>> symbols for each operator do not seem to scale well; we could potentially
>> have about 10 different operations that would require a setting.  Also, I
>> don't think that a user will really have a good notion of what to set it to.
>>
>> I propose the name "workCount" for the variable.  I picked this because it
>> captures the idea of storing that many items (mostly bindings) in memory as
>> a count.  In the future I think we would want something like "workMem" to
>> specify the amount of memory each operator can use rather than the count of
>> the items.  I have a mild aversion to "spillToDiskThreshold", as I think it
>> might focus too much on the implementation details, and does not indicate
>> what its units of measurement are (count vs. memory size).  But I want to
>> know your opinions.  Since this is a user-facing change, we want to make
>> sure to get it right the first time, as it will be hard to change later.
>>
>> So two questions:
>> 1) Should I consolidate the parameters?

+1

>> 2) Is "workCount" a good name?

+0.75

I have no strong feeling, but "spillCountThreshold" is more obvious to me
because it says "spill" rather than the general "work" (there is other work
in the system!).  But "spillToDiskThreshold" also works for me, and I'm
sure that's coloured by seeing its usage up to now.  The Javadoc
documentation is more important anyway.


Are you planning on having a different setting for each usage, each with
its own name, but defaulting to the setting of a common symbol?  I
don't think many users will want to control each one separately, but leaving
the possibility open (if it's not too much bother) would seem reasonable.
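
To make the fallback idea concrete, here is a minimal sketch, assuming a
plain map of settings rather than the real ARQ context.  The class
SpillThresholds and its methods are hypothetical, and the symbol names are
just the strings discussed in this thread, not a committed API.  A lookup
tries the per-operator symbol first, then the common symbol, then a
built-in default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Jena/ARQ.
final class SpillThresholds {
    // Common symbol that every operator falls back to (name is an assumption).
    static final String COMMON = "spillToDiskThreshold";
    // Per-operator symbols, using the names mentioned in this thread.
    static final String SORT = "spillOnDiskSortingThreshold";
    static final String UPDATE = "spillOnDiskUpdateThreshold";

    private final Map<String, Long> settings = new HashMap<String, Long>();

    // Record a value for a symbol; returns this so calls can be chained.
    SpillThresholds set(String symbol, long value) {
        settings.put(symbol, Long.valueOf(value));
        return this;
    }

    // Resolve the threshold for one operator:
    // 1. the operator's own symbol, if set;
    // 2. otherwise the common symbol, if set;
    // 3. otherwise the caller-supplied built-in default.
    long threshold(String operatorSymbol, long builtInDefault) {
        Long specific = settings.get(operatorSymbol);
        if (specific != null) {
            return specific.longValue();
        }
        Long common = settings.get(COMMON);
        return common != null ? common.longValue() : builtInDefault;
    }
}
```

With this shape, most users would set only the common symbol, and the
per-operator names would matter only to someone who sets them explicitly.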

	Andy

>>
>> -Stephen
>>
>

