commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Gregory <garydgreg...@gmail.com>
Subject Re: Immutable builder pattern for parsers?
Date Thu, 09 Feb 2017 04:58:50 GMT
On Wed, Feb 8, 2017 at 7:27 PM, Matt Sicker <boards@gmail.com> wrote:

> I've always considered builders to be mutable, thread unsafe objects used
> to create immutable, thread safe objects.


+1

Gary


> If these builders cause GC
> pressure to go too high, then I'd turn to object pooling or per-thread
> reusable objects depending on how the code is used.
>
> On 8 February 2017 at 20:38, Stian Soiland-Reyes <stain@apache.org> wrote:
>
> > Peter Ansell raised a valid question about why in Commons RDF I made the
> > RDFParser interface as an immutable factory-like builder pattern.
> >
> > https://commons.apache.org/proper/commons-rdf/apidocs/
> > org/apache/commons/rdf/experimental/RDFParser.html
> >
> >
> > Here is how it can be used today:
> >
> >  RDF rdf = new JenaRDF();
> >  Graph g1 = rdf.createGraph();
> >  Future future = new JenaRDFParser()
> >          .source(Paths.get("/tmp/graph.ttl"))
> >          .contentType(RDFSyntax.TURTLE)
> >          .target(g1).parse();
> >  future.get(30, TimeUnit.Seconds); // block
> >
> >
> > You may notice it is not a factory as-such, as you only get a Future
> > of the parser result, rather than the actual "parser". Also it is
> > currently an interface, with multiple implementations (one per RDF
> > implementation).
> >
> >
> >
> > Although the javadoc only suggests for it to be immutable and
> > threadsafe, the abstract class AbstractRDFParser
> > https://commons.apache.org/proper/commons-rdf/apidocs/
> > org/apache/commons/rdf/simple/experimental/AbstractRDFParser.html
> > https://github.com/apache/commons-rdf/blob/master/
> > simple/src/main/java/org/apache/commons/rdf/simple/experimental/
> > AbstractRDFParser.java
> > is both, and thus every "option setter" method therefore return a cloned
> > copy of the factory with the mutated fields set there. This simply uses
> > .clone() at the moment.
> >
> > This abstract class and immutability pattern is used by each of the
> > three RDF implementations of the interface.
> >
> >
> > This means it's safe to parse many files like this:
> >
> >  JenaRDFParser parser = new JenaRDFParser()
> >          .contentType(RDFSyntax.TURTLE)
> >          .target(g1);
> >
> >  // Parse multiple files in parallell
> >  Future future1 = parser.source(Paths.get("/tmp/graph1.ttl")).parse();
> >  Future future2 = parser.source(Paths.get("/tmp/graph2.ttl")).parse();
> >
> >  // Wait for both to finish
> >  future1.get(30, TimeUnit.Seconds);
> >  future2.get(30, TimeUnit.Seconds);
> >
> >
> > Above the second call to parser.source(..) is not a problem (even if
> > it happens in another thread) as both .source() calls return separate
> > cloned JenaRDFParser instances and don't mutate the "parser" instance.
> >
> >
> > Now a valid critique of such a pattern is that it's wasteful to make
> > clones that are immediately thrown away, for instance the above
> > future1/future2 example would cause 7 instantiations of JenaRDFParser()
> > (.parse() does some finalization in a final clone).
> >
> > I have not done any benchmarking, but envision that as parsing RDF files
> > generally creates many thousands of RDFTerm instances, a couple of
> > extra parser instantiations shouldn't make a big dent in the bill.
> >
> >
> > The typical builder pattern is rather to return "this" and mutate fields
> > directly. This is obviously not thread-safe but generally achives the
> > same. Multiple threads would however instead have to create a brand new
> > parser from start and set all the options themselves.  (Or they could a
> > similar .clone() method from a carefully locked down "proto" instance
> > that everyone has to be careful not to call any mutating methods on)
> >
> >
> > Now multiple threads is now very common because of Java 8 streams, and
> > for instance it would not be unthinkable that a parallelStream() is
> > mapped as a .source() or .target() on such a parser builder. The Futures
> > fit somewhat into this picture as well.   I think this would get
> > complicated with having to set everything again rather than just pass
> > "parser::source" as a method reference.
> >
> >
> >
> >
> > As this was just thrown together as an experiment (and we've not yet
> > tried writing the corresponding RDFWriter interface), I think we could
> > revisit how to do this pattern, hopefully drawing on the wider
> > experience of the Apache Commons developers on this list.
> >
> >
> >
> > Roughly here are my requirements:
> >
> > * Ability to set various options (e.g. path of file to parse, format,
> >   destination) in any order
> > * Avoid mega-constructors/multi-arg methods
> > * Some options take multiple types, e.g a source can be given as a Path,
> >   InputStream or IRI (URL) -- but would override each other
> >
> > Nice to have:
> >
> > * Reusable (e.g. set options that are common, but modify
> >   source() to parse multiple files)
> > * Thread-safe (e.g. immediate reuse of builder before a parsing session
> >   has started or finished)
> > * Easy to serialize / keep around (should not keep references to big
> >   objects in memory uneccessarily)
> > * Parsing in the background, e.g. return Future. (implementation manages
> >   threads/queues as it pleases)
> >
> > Unclear:
> >
> > * How to select between the multiple RDFParser instances?
> >   I was thinking the RDF interface could have a .makeParser() method,
> >   but not all parsers can take all formats, so possibly some kind of
> >   registry or "what can you handle" mechanism might be needed.
> >
> >
> > How do you think we should proceed? Mutable or immutable? How should the
> > settings be kept? Fields, map, or what?  Does it make sense with an
> > interface, abstract class (keeps settings) and implementations
> > (processess settings), or should we have a single ParserFactory class
> > and have a new internal interface below?
> >
> >
> > --
> > Stian Soiland-Reyes
> > University of Manchester
> > http://www.esciencelab.org.uk/
> > http://orcid.org/0000-0001-9842-9718
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
>
> --
> Matt Sicker <boards@gmail.com>
>



-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<https://www.amazon.com/gp/product/1617290459/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1617290459&linkCode=as2&tag=garygregory-20&linkId=cadb800f39946ec62ea2b1af9fe6a2b8>

<http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=am2&o=1&a=1617290459>
JUnit in Action, Second Edition
<https://www.amazon.com/gp/product/1935182021/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182021&linkCode=as2&tag=garygregory-20&linkId=31ecd1f6b6d1eaf8886ac902a24de418%22>

<http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=am2&o=1&a=1935182021>
Spring Batch in Action
<https://www.amazon.com/gp/product/1935182951/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1935182951&linkCode=%7B%7BlinkCode%7D%7D&tag=garygregory-20&linkId=%7B%7Blink_id%7D%7D%22%3ESpring+Batch+in+Action>
<http:////ir-na.amazon-adsystem.com/e/ir?t=garygregory-20&l=am2&o=1&a=1935182951>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message