cocoon-users mailing list archives

From Grzegorz Kossakowski <>
Subject Re: Proper way to cache pipelines with SQL Transformer?
Date Sun, 13 Jan 2008 21:38:16 GMT
Peter Wyngaard wrote:
> Hello,
> One of my first goals with cocoon was to create a simple, read-only,
> RESTful interface to some of the objects in our database.  So, for
> example, I'd like to have a set of simple URLs like:
> http://localhost:8888/myblock/data/study.xml?study_name={study_name}
> Throwing this together took no time, after I got over the spring
> datasource issues and cocoon-databases-bridge issues in Cocoon 2.2.

Thanks for spreading positive words about Cocoon 2.2! :)

> I quickly found that our database operations are pretty expensive, and
> since our database contains a lot of read-only data, it would be nice to
> cache.  I understand why SQL Transformer isn't cacheable, and thought I
> might just subclass it and implement Cacheable for my application.  But
> before I dove in that deep, I thought it would be simpler to use
> ExpiresCachingProcessingPipeline.


I would like to explain something: I cut off your detailed explanations not because I don't like
them (because I do!) but because I don't think they need to be quoted. Such e-mails are a great
benefit to the community: they give others a chance to understand the background of your problems
and to learn much more, so keep sending them, Peter.

> Here were my requirements:
> * need to include request parameters as part of the cache key
> * need to sort the request parameters so that "…?a=1&b=2" and
> "…?b=2&a=1" are cached only once, not twice
> * need to exclude any special request parameters, like "purge=true" from
> the cache key
> I couldn't find a way to do this with the existing InputModules, so I
> created my own that has one attribute, "request-params", that helps with
> this.  Once added as an <input-module> named "cache-keygen", I was then
> able to do the following:
>       <map:parameter name="purge-cache" value="{request-param:purge}" />
>       <map:parameter name="cache-key"
> value="{request:sitemapURI}?{cache-keygen:request-params}" />
> This worked great.  But I can't help but think that I missed the proper
> cocoon way of doing this.  Did I?

I think you have done it properly. The fact that you could not find an InputModule meeting such
ordinary requirements shows that not everything in Cocoon is implemented already. :)

Actually, such an Input Module (or, better, an Expression Language implementation, see [1]) that
can generate cache keys like this would be a nice addition to Cocoon.
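
To make the three requirements above concrete, here is a minimal Java sketch of the key
normalization only. It is an illustration under assumptions, not Peter's module and not Cocoon
code: the class CacheKeyGen and the method requestParams are hypothetical names, and the sketch
works on a plain Map rather than on the Cocoon request object or the InputModule interface.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not a Cocoon class.
final class CacheKeyGen {

    // Builds a canonical query string: parameters sorted by name,
    // excluded names (for example "purge") left out.
    static String requestParams(Map<String, String[]> params, Set<String> excluded) {
        // TreeMap iterates in key order, so "a=1&b=2" and "b=2&a=1" normalize identically.
        TreeMap<String, String[]> sorted = new TreeMap<String, String[]>(params);
        List<String> pairs = new ArrayList<String>();
        for (Map.Entry<String, String[]> entry : sorted.entrySet()) {
            if (excluded.contains(entry.getKey())) {
                continue;
            }
            // Assumption: values of a repeated parameter keep their original order.
            for (String value : entry.getValue()) {
                pairs.add(encode(entry.getKey()) + "=" + encode(value));
            }
        }
        return String.join("&", pairs);
    }

    // Encoding keeps "&" or "=" inside a value from producing an ambiguous key.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<String, String[]>();
        params.put("b", new String[] {"2"});
        params.put("a", new String[] {"1"});
        params.put("purge", new String[] {"true"});
        // Prints: data/study?a=1&b=2
        System.out.println("data/study?" + requestParams(params, Collections.singleton("purge")));
    }
}
```

In a real module the map would come from the current request, and the excluded names would
presumably be configurable rather than hard-coded.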

> After working with this solution for a bit, I discovered two things:
> 1.  That IdentifierCacheKey puts a prefix of "IK:{true,false}:" on my
> cache-key, and that the true/false indicates whether it was an external
> pipeline call or not.
> 2.  The external pipelines that invoked these internal pipelines with a
> "cocoon:" URL passed their request parameters along.
> So issue #1 led me to decide to make all these SQL Transformer pipelines
> internal-only, so that the cache prefix is always "IK:false:", because I
> don't need to keep two copies of everything in the cache, one "true" and
> one "false".  So far, this hasn't been a problem for me.  I had
> initially made it external, as sometimes I am going to want to get these
> "raw" database objects, and other times other pipelines are going to
> aggregate and transform them into other objects.

There is another strong argument for making your cached pipelines internal-only that you probably
missed. Have you wondered what happens if a pipeline cached with your key-generation method is
called with additional parameters that the pipeline does not use at all? It generates duplicate
cache entries that differ only in their cache keys. If you made your pipelines external, you
would probably end up with an overloaded cache, because various crawlers add many meaningless
parameters for their own reasons.
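
The effect is easy to reproduce with a toy cache. The sketch below is only a model of the
problem, not of Cocoon's cache: ToyCache is a hypothetical class, the "utm_source" parameter is
a made-up example of crawler noise, and the cached value is a placeholder string standing in for
the expensive SQL result.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy cache for illustration; Cocoon's real cache works differently.
final class ToyCache {
    private final Map<String, String> entries = new HashMap<String, String>();
    private int misses = 0;

    // Returns the cached value for the key, running the "expensive query" on a miss.
    String get(String cacheKey) {
        String cached = entries.get(cacheKey);
        if (cached == null) {
            misses++;              // stands in for one expensive database round trip
            cached = "<study/>";   // placeholder for the real pipeline output
            entries.put(cacheKey, cached);
        }
        return cached;
    }

    int size() {
        return entries.size();
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache();
        cache.get("data/study?study_name=s1");
        // Same study, but a meaningless extra parameter changes the key.
        cache.get("data/study?study_name=s1&utm_source=crawler");
        // Prints: entries=2 misses=2
        System.out.println("entries=" + cache.size() + " misses=" + cache.misses());
    }
}
```

Both entries hold the same content, yet each one cost a database query and occupies its own
slot in the cache.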

> Issue #2 was also a surprise.  Let's say I create a pipeline that
> aggregates a study and some other stuff.  For example:
> http://localhost:8888/myblock/data/studyData.xml?study_name={study_name}&type={type}
> And this pipeline called an internal pipeline to get the study object using:
> cocoon:/data/study?study_name={study_name}
> Since all the request parameters of the "parent" are passed on to the
> "child" study pipeline, this results in a cache key:
> IK:false:data/study?study_name=…&type=…
> which is a shame, because I only need one cached version of the study. 
> So after some more digging in the source, I discovered the concept of
> "raw" cocoon requests.  As far as I can tell, the "raw" request just
> prevents the chaining of parameters and attributes up the call stack.
> So, this just meant I have to call all my internal pipelines with
> "cocoon:raw:" instead of "cocoon:".  This worked, and I haven't
> discovered any other side-effects of "raw" yet.
> And that's the story so far.
> So I'm happy to have made it this far, and things are working well. 
> These roadblocks, however, left me feeling that I must not be doing
> things the "cocoon way".

I think that your approach is fine, but you are trying to apply "new" ideas (REST) to an
architecture that was developed in an age when people wanted to have access to everything,
everywhere. That explains why the cocoon: protocol passes request parameters to sub-requests. As
Tobia already pointed out, sometimes it's even handy.
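
The difference between "cocoon:" and "cocoon:raw:" described in this thread can be pictured with
a small model. This is a deliberately simplified sketch and not how Cocoon implements
sub-requests: SubRequestModel and childParams are hypothetical names, and the rule that the
child's own parameters win over inherited ones is an assumption made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of parameter propagation; not Cocoon's implementation.
final class SubRequestModel {

    // Returns the parameters the child pipeline sees.
    // raw == false models "cocoon:", raw == true models "cocoon:raw:".
    static Map<String, String> childParams(Map<String, String> parent,
                                           Map<String, String> own,
                                           boolean raw) {
        Map<String, String> result = new TreeMap<String, String>();
        if (!raw) {
            result.putAll(parent); // the parent's request parameters are passed along
        }
        result.putAll(own);        // assumption: the child's own values win on conflict
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parent = new TreeMap<String, String>();
        parent.put("study_name", "s1");
        parent.put("type", "full");
        Map<String, String> own = new TreeMap<String, String>();
        own.put("study_name", "s1");

        // Prints: {study_name=s1, type=full}
        System.out.println(childParams(parent, own, false));
        // Prints: {study_name=s1}
        System.out.println(childParams(parent, own, true));
    }
}
```

With the non-raw call the unrelated "type" parameter ends up in the child's cache key, which is
exactly the extra cache entry Peter observed.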

Since you have figured out yourself that the "raw" syntax exists, your problem is solved.
However, I see you are developing your application from scratch using Cocoon 2.2, so I would
suggest using a newer replacement for the cocoon: protocol, called the servlet: protocol. It is
part of a bigger unit called the Servlet Service Framework (SSF), which is very interesting,
especially when a RESTful approach comes into play.

Documentation of SSF is in preparation right now, but some bits that are not officially published
yet are already worth reading, see [2].

Moreover, I think that this e-mail [3] may interest you as well. In short: I really recommend
using the servlet: protocol instead of cocoon: in Cocoon 2.2.

Getting back to the topic and your question: the debate about whether to pass request parameters
to internal requests has quite a long tradition, and it reappeared when the servlet: protocol was
implemented. I was a strong proponent of keeping the servlet: protocol "clean", but I had to back
down from that view. You can check the corresponding thread [4] yourself for details. Now it
turns out that we will need to reinvent an equivalent of "cocoon:raw:", which I'm not happy
about...

> For example, the idea that the query string parameters are not made part
> of the cache-key by default, was surprising.  Is the "cocoon way" to not
> put request parameters in the query string?  Using the query string is a
> fairly standard practice in designing RESTful interfaces.

As I said earlier, Cocoon is quite old, so RESTful ideas did not influence it from the beginning.
There is another side to this issue: if you put everything into the cache key, you will soon find
that this is not perfect either. Some tricky questions always remain open, such as which
parameters should be taken into account and which should not. Therefore Cocoon leaves the
responsibility for generating the cache key in the right hands: the developer's.

> Thanks!  I have more stories to come.

You're welcome, Peter.


Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
