calcite-dev mailing list archives

From Christian Beikov
Subject Materialization performance
Date Sun, 27 Aug 2017 13:20:02 GMT
Hey, I have been looking into how materialized views perform during
planning, prompted by a very long test run
(MaterializationTest#testJoinMaterializationUKFK6), and the current
state is problematic.

CalcitePrepareImpl#getMaterializations always reparses the SQL, and down
the line a lot of expensive work (e.g. predicate and lineage
determination) is done during planning that could easily be
pre-calculated and cached when the materialization is created.

There is also a thread safety problem with the current implementation.
Unless there is a safety mechanism that I don't see, sharing the
MaterializationService, and thus also the maps in MaterializationActor,
between multiple threads via a static instance is problematic.

Since I mentioned thread safety, how is Calcite supposed to be used in a
multi-threaded environment? Currently I use a connection pool that
initializes the schema on each new connection, but that is not really
nice. I suppose caches are also bound to the connection? A thread-safe
context that can be shared between connections would be nice, to avoid
all that repetitive work.

Are these known issues that you have already thought about how to fix,
or should I log JIRAs for them and fix them to the best of my knowledge?
I'd more or less keep the service shared but implement it using a
copy-on-write strategy, since I expect schema changes to be rare after
startup.
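
A minimal sketch of what I mean by copy-on-write, in case it helps the
discussion. The class name CopyOnWriteRegistry and its methods are
hypothetical and are not Calcite API; the real MaterializationService
and MaterializationActor hold more state than a single map. The idea is
that readers take an immutable snapshot without locking, while the rare
writers copy the map, modify the copy, and publish it atomically.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical copy-on-write registry; illustrative only, not Calcite code. */
final class CopyOnWriteRegistry<K, V> {
  // The current immutable snapshot; replaced wholesale on every write.
  private final AtomicReference<Map<K, V>> snapshot =
      new AtomicReference<>(Collections.<K, V>emptyMap());

  /** Lock-free read: returns the snapshot that is current at call time. */
  Map<K, V> view() {
    return snapshot.get();
  }

  V get(K key) {
    return snapshot.get().get(key);
  }

  /** Write path: copy the old map, add the entry, publish the copy. */
  void put(K key, V value) {
    snapshot.updateAndGet(old -> {
      Map<K, V> copy = new HashMap<>(old);
      copy.put(key, value);
      return Collections.unmodifiableMap(copy);
    });
  }

  /** Write path: copy the old map, drop the entry, publish the copy. */
  void remove(K key) {
    snapshot.updateAndGet(old -> {
      Map<K, V> copy = new HashMap<>(old);
      copy.remove(key);
      return Collections.unmodifiableMap(copy);
    });
  }
}
```

A planner thread that calls view() once keeps a consistent picture for
the whole planning run, even if another thread registers a
materialization in the meantime. The cost is a full map copy per write,
which seems acceptable if schema changes are rare after startup.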

Regarding the repetitive work that partly happens during planning, I'd
suggest doing it during materialization registration instead, as is
already mentioned in CalcitePrepareImpl#populateMaterializations. Would
that be ok?
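
Roughly, the shape I have in mind is sketched below. PreparedCache and
its methods are hypothetical names, and keying by the view SQL string is
a simplifying assumption; the prepare function stands in for the
expensive parsing and predicate/lineage derivation, which I am not
showing here. The call counter exists only to show that the work runs
once per materialization.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache of pre-computed materialization data; not Calcite code. */
final class PreparedCache<P> {
  private final ConcurrentMap<String, P> bySql = new ConcurrentHashMap<>();
  private final Function<String, P> prepare;
  private final AtomicInteger prepareCalls = new AtomicInteger();

  /** @param prepare stand-in for the expensive per-view work */
  PreparedCache(Function<String, P> prepare) {
    this.prepare = prepare;
  }

  /** Called at registration: does the expensive work at most once per SQL. */
  P register(String viewSql) {
    return bySql.computeIfAbsent(viewSql, sql -> {
      prepareCalls.incrementAndGet();
      return prepare.apply(sql);
    });
  }

  /** Called during planning: a plain lookup, no parsing or derivation. */
  P lookup(String viewSql) {
    return bySql.get(viewSql);
  }

  /** How many times the expensive function actually ran. */
  int prepareCalls() {
    return prepareCalls.get();
  }
}
```

With something like this, planning would only call lookup, and the cost
of preparing a view is paid once when it is registered.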


Kind regards,
*Christian Beikov*
