jena-dev mailing list archives

From "Stephen Allen" <sal...@bbn.com>
Subject RE: Consolidate "spillOnDiskSortingThreshold" and "spillOnDiskUpdateThreshold"?
Date Fri, 04 Nov 2011 17:08:00 GMT


> -----Original Message-----
> From: Andy Seaborne [mailto:andy@apache.org]
> Sent: Friday, November 04, 2011 10:56 AM
> To: jena-dev@incubator.apache.org
> Subject: Re: Consolidate "spillOnDiskSortingThreshold" and
> "spillOnDiskUpdateThreshold"?
> 
> On 01/11/11 21:08, Stephen Allen wrote:
> > As a note, most database systems specify the memory size allowed on a
> > per operator basis.
> >
> > PostgreSQL calls it "work_mem" [1], MySQL calls it "tmp_table_size"
> > [2], and Oracle used to call it "sort_area_size", but now has a new
> > setting called "pga_aggregate_target" [3].
> >
> > -Stephen
> >
> > [1] http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html
> > [2] http://dev.mysql.com/doc/refman/5.6/en/internal-temporary-tables.html
> > [3] http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/initparams232.htm
> >
> > On Tue, Nov 1, 2011 at 3:59 PM, Stephen Allen <sallen@apache.org> wrote:
> >
> >> All,
> >>
> >> I am working on JENA-119, and wanted to get some feedback on an
> >> external user-facing change.
> >>
> >> I'd like to consolidate the "spillOnDiskSortingThreshold",
> >> "spillOnDiskUpdateThreshold", and any potential future
> >> "spillOnDisk*Threshold" parameters into a single variable.  Separate
> >> symbols for each operator do not seem to scale well; we could
> >> potentially have about 10 different operations that would require a
> >> setting.  Also, I don't think that a user will really have a good
> >> notion of what to set it to.
> >>
> >> I propose the name "workCount" for the variable.  I picked this
> >> because it captures the idea of storing that many items (mostly
> >> bindings) in memory as a count.  In the future I think we would want
> >> something like "workMem" to specify the amount of memory each
> >> operator can use rather than the count of the items.  I have a mild
> >> aversion to "spillToDiskThreshold", as I think it might focus too
> >> much on the implementation details, and does not indicate what its
> >> units of measurement are (count vs. memory size).  But I want to
> >> know your opinions.  Since this is a user-facing change, we want to
> >> make sure to get it right the first time, as it will be hard to
> >> change later.
> >>
> >> So two questions:
> >> 1) Should I consolidate the parameters?
> 
> +1
> 
> >> 2) Is "workCount" a good name?
> 
> +0.75
> 
> I have no strong feeling but "spillCountThreshold" is more obvious to
> me because it says "spill" rather than general "work" (there is other
> work in the system!).  But "spillToDiskThreshold" also works for me and
> I'm sure that's coloured by seeing its usage up to now.  The Javadoc
> documentation is more important anyway.


Ok, "spillToDiskThreshold" with good Javadocs looks like it!  You're right
about "work" being ambiguous.


> 
> Are you planning on having different settings for each usage, each with
> its own name, but defaulting to the setting of a common symbol?  I
> don't think many users will want to control each one separately but
> leaving the possibility open (if it's not too much bother) would seem
> reasonable.
> 

I was planning on a single symbol that would apply to all operators
(currently those are Sort, Update, and a new CONSTRUCT method that returns a
databag-backed Model).  I think I'll leave it to a future task to add
symbols for each individual operator.  If those settings turn out not to be
necessary, then omitting them simplifies the system for both developers and
users.
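
A minimal sketch of how per-operator symbols could fall back to the common
one, should they ever be added.  Everything here is hypothetical and is not
Jena's actual Context/Symbol API: the SpillSettings class, the dotted
per-operator key names, and the "never spill" default are assumptions made
only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical settings holder; NOT Jena's real Context/Symbol API. */
public class SpillSettings {
    /** Common key applied to every operator (name taken from this thread). */
    public static final String COMMON_KEY = "spillToDiskThreshold";

    /** Assumed default: never spill unless a threshold has been configured. */
    public static final long NO_SPILL = Long.MAX_VALUE;

    private final Map<String, Long> values = new HashMap<>();

    /** Stores a threshold, measured as a count of in-memory items. */
    public void set(String key, long threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        values.put(key, threshold);
    }

    /**
     * Looks up the threshold for one operator: an operator-specific value
     * (assumed key form "spillToDiskThreshold.<operator>") wins, then the
     * common value, then the no-spill default.
     */
    public long thresholdFor(String operator) {
        Long specific = values.get(COMMON_KEY + "." + operator);
        if (specific != null) {
            return specific;
        }
        Long common = values.get(COMMON_KEY);
        return common != null ? common : NO_SPILL;
    }

    public static void main(String[] args) {
        SpillSettings settings = new SpillSettings();
        settings.set(COMMON_KEY, 10_000);            // applies to all operators
        settings.set(COMMON_KEY + ".sort", 50_000);  // optional override for Sort

        System.out.println(settings.thresholdFor("sort"));   // operator-specific value
        System.out.println(settings.thresholdFor("update")); // falls back to common value
    }
}
```

With only the common key set, every operator sees the same threshold, which
is the single-symbol behaviour described above; the override path costs one
extra lookup and can be left unused.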

-Stephen



