Hi Daniel,
Please see my comments below.
Thanks in advance,
Siegfried Goeschl
> On 29.02.2020, at 21:02, Daniel Dekany <daniel.dekany@gmail.com> wrote:
>
>>
>> I try to provide a useful name even when the content is coming from a URL
>
>
> When is it recommended to rely on that, though? Relying on it means that
> renaming a data source file can break the process, even if you call
> freemarker-cli with the up-to-date file name. And whether that happens depends
> on what you (or another random colleague!) have buried inside the templates.
> So I guess we had better just not support this. Less code, and fewer things to
> document too.
>
Actually, it's not recommended, but we have only had named data sources for less than 24 hours.
>
>> I think we have a different understanding of what a "Document" / "Datasource
>> / DataSource" should do
>
>
> Thing is, eventually (most certainly pre-1.0, as it influences
> architecture), certain needs will have to be addressed, somehow. Then we will
> see what "things" we really need. For now I thought we need "things" that
> are much more than paths, and encapsulate the "how to load the data"
> aspect. I called them data sources, but maybe we should call them "data
> loaders" to free up "data source" for the more primitive thing. Some
> needs/doubts to address, *later*: Is it really the best approach for users
> to load/parse data sources programmatically (that code is written in FTL,
> inside the templates)? Also, is the template the right place for doing
> that? When multiple templates (or just multiple template *runs* of
> the same template, each generating a different output file) need common
> data, they shouldn't load it again and again. Also, a different topic: can we
> handle the case "transparently" enough when the data is not coming from a
> file?
This is a command-line tool where we have little idea what the user will do with it, or how they will abuse it.
* How does a "data loader" know that it is responsible for loading a file?
* What should a "CSV data loader" do: parse the file into a list of records, or stream them one by one?
* How do we handle the case where multiple data loaders could apply to a single file?
I'm leaning towards building blocks where the user controls the work to be done, even if it requires one or two extra lines of FTL code.
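To make the "building blocks" idea a bit more concrete, here is a rough Java sketch. `LazyDataSource` and its methods are hypothetical names, not the current freemarker-generator API, and UTF-8 is an assumed encoding. The data source stays dumb and lazy: it only knows its name, its group and how to fetch the raw bytes (file, URL or STDIN), loads them on first access, and leaves parsing to the caller, who decides whether to collect all records into a list or consume them one by one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Stream;

/**
 * Hypothetical "dumb" data source: it knows its name, its group and how to
 * fetch the raw bytes, but nothing about the format of the content.
 */
final class LazyDataSource {
    private final String name;
    private final String group;
    private final Supplier<byte[]> loader; // e.g. reads a file, a URL or STDIN
    private byte[] content;                // cached after the first load

    LazyDataSource(String name, String group, Supplier<byte[]> loader) {
        this.name = Objects.requireNonNull(name);
        this.group = Objects.requireNonNull(group);
        this.loader = Objects.requireNonNull(loader);
    }

    String getName() { return name; }

    String getGroup() { return group; }

    /** Loads the content on the first call only; later calls reuse the cached bytes. */
    synchronized byte[] getBytes() {
        if (content == null) {
            content = loader.get();
        }
        return content;
    }

    /** Decodes the content as UTF-8 (an assumption of this sketch). */
    String getText() {
        return new String(getBytes(), StandardCharsets.UTF_8);
    }

    /** Streams the lines one by one; the caller decides whether to collect them. */
    Stream<String> lines() {
        return getText().lines();
    }
}
```

A CSV tool called from FTL could then sit on top of `getText()` or `lines()`, so the template, not the data source, decides which parser applies to a given file.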
>
>> The joy of programming - I did not intend to use "name:group" together
>> with wildcards :-)
>
>
> For a CLI tool, I guess we agree that it should work. So maybe, like this
> (here logs and foos are meant to be "groups"):
> --data-source logs file1.log file2.log fileN.log
> --data-source foos foo1.csv foo2.csv fooN.csv
> --data-source bar bar.xlsx
>
> It so happens that here you don't really have good control over the
> number of files associated with the name, so maybe that's yet another reason
> not to differentiate names and groups.
>
>> I disagree here - I think names would be used more often. I added
>> the "group" as an afterthought since some grouping could be useful
>
>
> We do agree on that. What I said is that the *syntax* should put the
> group first. It's still optional. Like this:
> --data-source group:name /somewhere
> --data-source name /somewhere
That comes down to personal preference, e.g. chown uses "owner[:group]"
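For illustration, a minimal Java sketch of parsing the current `name[:group]=location` form, in the same spirit as chown's `owner[:group]`. `DataSourceSpec`, `parse` and `findByName` are hypothetical names, not existing freemarker-generator code. It assumes that a missing or empty group falls back to "default", that an empty name is rejected (every data source is named), that a bare location simply uses itself as its name (a simplification; the real tool tries to derive a more useful name), and that a lookup by exact name must produce exactly one hit.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical parsed form of a "--datasource name[:group]=location" argument. */
final class DataSourceSpec {
    static final String DEFAULT_GROUP = "default";

    final String name;
    final String group;
    final String location;

    private DataSourceSpec(String name, String group, String location) {
        this.name = name;
        this.group = group;
        this.location = location;
    }

    static DataSourceSpec parse(String arg) {
        Objects.requireNonNull(arg);
        // Split on the first '=' so locations like "env:///CONFIGURATION" stay intact.
        int eq = arg.indexOf('=');
        if (eq < 0) {
            // No "name=" prefix: use the location itself as the name (simplification).
            return new DataSourceSpec(arg, DEFAULT_GROUP, arg);
        }
        String key = arg.substring(0, eq);
        String location = arg.substring(eq + 1);
        int colon = key.indexOf(':');
        String name = colon < 0 ? key : key.substring(0, colon);
        String group = colon < 0 ? DEFAULT_GROUP : key.substring(colon + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Data source name must not be empty: " + arg);
        }
        if (group.isEmpty()) {
            group = DEFAULT_GROUP; // treat "name:=..." like "name=..."
        }
        return new DataSourceSpec(name, group, location);
    }

    /** Returns the single spec with the given name, failing on zero or several hits. */
    static DataSourceSpec findByName(List<DataSourceSpec> specs, String name) {
        List<DataSourceSpec> hits = specs.stream()
                .filter(s -> s.name.equals(name))
                .collect(Collectors.toList());
        if (hits.size() != 1) {
            throw new IllegalArgumentException(
                    "Expected exactly one data source named '" + name + "' but found " + hits.size());
        }
        return hits.get(0);
    }
}
```

Note that an unnamed location which itself contains `=` (e.g. a URL with query parameters) would be misread as `name=location`; this sketch does not handle that case.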
>
> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> See my comments below
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>>> On 29.02.2020, at 19:08, Daniel Dekany <daniel.dekany@gmail.com> wrote:
>>>
>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
>>> datasources
>>>
>>> So, I can do this to have both a name and a group associated with a data
>>> source:
>>> --datasource someName:someGroup=somewhere/something
>>
>> Correct
>>
>>> Or if I only want a name, but not a group (or rather a "" group -
>>> bug?), then:
>>> --datasource someName=somewhere/something
>>
>> Correct
>>
>>>
>>> Or if I only want a group but not a name (or rather a "" name), then:
>>> --datasource :someGroup=somewhere/something
>>
>> Mhmm, that would be unintended functionality on my side - the current
>> approach is that every "Document" / "Datasource / DataSource" is named
>>
>>>
>>> A name must identify exactly 1 data source, while a group identifies a
>>> list of data sources.
>>
>> No, every "Document" / "Datasource / DataSource" currently has a name, but
>> uniqueness is not enforced. Only if you want to get a "Document" /
>> "Datasource / DataSource" by its exact name do I check for exactly one
>> search hit, and throw an exception otherwise. I try to provide a useful name
>> even when the content is coming from a URL or STDIN (and I will probably add
>> environment variables as "Document" / "Datasource / DataSource", e.g.
>> configuration in the cloud passed as JSON content in an environment variable)
>>
>>>
>>> Does this idea, that a data source can be part of a group, and then
>>> is also possibly identifiable by a name, come from a use case? I mean,
>>> it's possibly important somewhere, but if so, then it's strange that you
>>> can put something into only a single group. If we need this kind of thing,
>>> then perhaps you should just be allowed to associate the data source with a
>>> list of names (kind of like tagging), and then when the template wants to
>>> get something by name, it will state there whether it expects exactly one or a
>>> list of data sources. Then you don't need to introduce two terms in the
>>> documentation either (names and groups). Again, that's if we want this at all,
>>> instead of just going with a data source that itself gives a list. (And if
>>> not, how will we handle a data source that loads from a non-file source?)
>>
>> I actually thought of implementing tagging but considered a "group"
>> sufficient.
>>
>> * If you don't define anything everything goes into the "default" group
>> * For individual documents you can define a name and an optional group
>>
>> I think we have a different understanding of what a "Document" / "Datasource
>> / DataSource" should do
>>
>> * It is dumb
>> * It is lazy since data is only loaded on demand
>> * There is no automagic like "oh, this is a JSON file, so let's go to the
>> JSON tool and create a map readily accessible in the data model"
>>
>>>
>>> Note that the current command line syntax doesn't work well with shell
>>> wildcard expansion. Like this:
>>> --datasource :someGroup=logs/*.log
>>> will try to expand ":someGroup=logs/*.log", and because it finds nothing
>>> (and because the rules of sh and the like are a mess), you will get the
>>> parameter value as is, without * expanded.
>>
>> The joy of programming - I did not intend to use "name:group" together
>> with wildcards :-)
>>
>>>
>>> Also, I think the syntax with colon should be flipped, because in other
>>> places foo:bar usually means that foo is the bigger unit (the container),
>>> and bar is the smaller unit (the child).
>>
>> I disagree here - I think names would be used more often. I added
>> the "group" as an afterthought since some grouping could be useful
>>
>>>
>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>>> siegfried.goeschl@gmail.com> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I'm an enterprise developer - bad habits die hard :-)
>>>>
>>>> So I closed the following tickets and merged the branches
>>>>
>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>>> "freemarker-generator"
>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>>> for datasources
>>>>
>>>> Thanks in advance,
>>>>
>>>> Siegfried Goeschl
>>>>
>>>>
>>>>> On 29.02.2020, at 12:19, Daniel Dekany <daniel.dekany@gmail.com> wrote:
>>>>>
>>>>> Yeah, and of course, you can merge that branch. You can even work on
>>>>> the master directly after all.
>>>>>
>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <daniel.dekany@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>>>> a common format/schema). Only, my idea is to push that complexity onto the
>>>>>> data source. The "data source" concept shields the rest of the application
>>>>>> from the details of how the data is stored or retrieved. So, a data source
>>>>>> might load a bunch of log files from a directory, and present them as a
>>>>>> single big table, or as a list of tables, etc. So I want to deal with the
>>>>>> cattle use case, but the question is what part of the architecture will
>>>>>> deal with this complication; in other words, how do you box things. The
>>>>>> reason my initial bet is to stuff that complication into the "data source"
>>>>>> implementation(s) is that data sources are inherently varied. Some
>>>>>> return a table-like thing, some have multiple named tables (worksheets in
>>>>>> Excel), some return a tree of nodes (XML), etc. So then, some might return
>>>>>> a list of lists of log records, or just a single list of log records (put
>>>>>> together from daily log files). That way cattle don't add to conceptual
>>>>>> complexity. Now, you might be aware of cases where the cattle concept
>>>>>> must be more exposed than this, and then we can't box things like this. But
>>>>>> this is what I tried to express.
>>>>>>
>>>>>> Regarding "output generators", and how that applies to the command
>>>>>> line: I think it's important that the common core between Maven and the
>>>>>> command line is as fat as possible. Ideally, they are just two syntaxes
>>>>>> to set up the same thing. Mostly, at least. So, if you specify a template
>>>>>> file to the CLI application, in a way that causes it to process that
>>>>>> template to generate a single output, then you have just defined an
>>>>>> "output generator" there (even if it wasn't explicitly called like that
>>>>>> on the command line). If you specify 3 csv files to the CLI application,
>>>>>> in a way that causes it to generate 3 output files, then you have just
>>>>>> defined 3 "output generators" there (there's at least one template
>>>>>> specified there too, but that wasn't an "output generator" itself, it was
>>>>>> just an attribute of the 3 output generators). If you specify 1 template
>>>>>> and 3 csv files, in a way that yields 4 output files (1 for the template,
>>>>>> 3 for the csv-s), then you have defined 4 output generators there. If you
>>>>>> have a data source that loads a list of 3 entities (say, 3 csv files, so
>>>>>> it's a list of tables then), and you have 2 templates, and you tell the
>>>>>> CLI to execute each template for each item in said data source, then you
>>>>>> have just defined 6 "output generators".
>>>>>>
>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> That all depends on your mental model and work you do, expectations,
>>>>>>> experience :-)
>>>>>>>
>>>>>>>
>>>>>>> __Document Handling__
>>>>>>>
>>>>>>> *"But I think actually we have no good use case for list of documents
>>>>>>> that's passed at once to a single template run, so, we can just ignore
>>>>>>> that complication"*
>>>>>>>
>>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>>>> hundred access logs across two staging sites to help track down some
>>>>>>> strange API gateway issues :-)
>>>>>>>
>>>>>>> My gut feeling is (borrowing from
>>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313)
>>>>>>>
>>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>>> `cattle`
>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>>>
>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1) since
>>>>>>> it is equally important and common.
>>>>>>>
>>>>>>>
>>>>>>> __Template And Document Processing Modes__
>>>>>>>
>>>>>>> IMHO it is important to answer the following question: "How many
>>>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>>>> three or six?"
>>>>>>>
>>>>>>> Your answer is influenced by your mental model / experience
>>>>>>>
>>>>>>> * When wading through tons of CSV files, access logs, etc. the answer is
>>>>>>> "2"
>>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>>> will encounter one
>>>>>>>
>>>>>>> __Template and document mode probably shouldn't exist__
>>>>>>>
>>>>>>> That's hard for me to fully understand - I definitely lack your insights
>>>>>>> & experience writing such tools :-)
>>>>>>>
>>>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>>>> plugin (and probably FMPP).
>>>>>>>
>>>>>>> I'm not sure if this applies to command lines, at least not in the way I
>>>>>>> use them (or would like to use them)
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Siegfried Goeschl
>>>>>>>
>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>>>
>>>>>>>
>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>>>
>>>>>>>> Yeah, "data source" is surely an overly popular name, but for a reason.
>>>>>>>> Does anyone have other ideas?
>>>>>>>>
>>>>>>>> As for naming data sources and such. One thing I was wondering about
>>>>>>>> back then is how to deal with a list of documents given to a template,
>>>>>>>> versus exactly 1 document given to a template. But I think actually we
>>>>>>>> have no good use case for list of documents that's passed at once to a
>>>>>>>> single template run, so, we can just ignore that complication. A document
>>>>>>>> has a name, and that's always just a single document, not a collection,
>>>>>>>> as far as the template is concerned. (We can have multiple documents per
>>>>>>>> run, but those normally yield separate output generators, so it's still
>>>>>>>> only one document per template.) However, we can have data source types
>>>>>>>> (document types with the old terminology) that collect together multiple
>>>>>>>> data files. So then that complexity is encapsulated into the data source
>>>>>>>> type, and doesn't complicate the overall architecture. That's another
>>>>>>>> case when a data source is not just a file. Like maybe there's a data
>>>>>>>> source type that loads all the CSV-s from a directory into a single big
>>>>>>>> table (I had such a case), or even into a list of tables. Or, as I
>>>>>>>> mentioned already, a data source is maybe an SQL query on a JDBC data
>>>>>>>> source (and we got the first term clash... JDBC also calls them data
>>>>>>>> sources).
>>>>>>>>
>>>>>>>> Template and document mode probably shouldn't exist from the user's
>>>>>>>> perspective either, at least not as a global option that must apply to
>>>>>>>> everything in a run. They could just give the files that define the
>>>>>>>> "output generators", and some of them will be templates, some of them
>>>>>>>> data files, in which case a template needs to be associated with them
>>>>>>>> (and there can be a couple of ways of doing that). And then again, there
>>>>>>>> are the cases where you want to create one output generator per entity
>>>>>>>> from some data source.
>>>>>>>>
>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>>>
>>>>>>>>> *Renaming Document To DataSource*
>>>>>>>>>
>>>>>>>>> Yes, makes sense. I tried to avoid it since I'm using javax.activation
>>>>>>>>> and its DataSource.
>>>>>>>>>
>>>>>>>>> *Template And Document Mode*
>>>>>>>>>
>>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it is not
>>>>>>>>> an implementation concept :-)
>>>>>>>>>
>>>>>>>>> *Document Without Symbolic Names*
>>>>>>>>>
>>>>>>>>> Also agreed, and it is going to change, but I have not yet made up my
>>>>>>>>> mind about what exactly to implement.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>>>
>>>>>>>>> A few quick thoughts on that:
>>>>>>>>>
>>>>>>>>> - We should replace the "document" term with something more descriptive.
>>>>>>>>> It doesn't tell that it's some kind of input. Also, most of these inputs
>>>>>>>>> aren't something that people typically call documents, like a csv file,
>>>>>>>>> or a database table, which is not even a file (OK, we don't support such
>>>>>>>>> a thing at the moment). I think maybe "data source" is a safe enough
>>>>>>>>> term. (It also rhymes with data model.)
>>>>>>>>> - You have separate "template" and "document" "modes", each applying to
>>>>>>>>> a whole run. I think such specialization won't be helpful. We could just
>>>>>>>>> say, on the conceptual level at least, that we need a set of "output
>>>>>>>>> generators". An output generator is an object (in the API) that
>>>>>>>>> specifies a template, a data-model (where the data-model is possibly
>>>>>>>>> populated with "documents"), and an output "sink" (a file path, or
>>>>>>>>> stdout), and can generate the output itself. A practical way of defining
>>>>>>>>> the output generators in a CLI application is via a bunch of files, each
>>>>>>>>> defining an output generator. Some of those files may be templates (that
>>>>>>>>> you can even detect from the file extension), others data files that we
>>>>>>>>> currently call "documents". They could freely mix inside the same run. I
>>>>>>>>> have also met the use case where you have a single table (single
>>>>>>>>> "document"), and each record in it yields an output file. That can also
>>>>>>>>> be described in some file format, or really in any other way, like
>>>>>>>>> directly in a command line argument, via the API, etc.
>>>>>>>>> - You have multiple documents without an associated symbolic name in
>>>>>>>>> some examples. Templates can't identify those then in a well-maintainable
>>>>>>>>> way. The actual file name is often not a good identifier: it can change
>>>>>>>>> over time, and you might not even have good control over it, like when
>>>>>>>>> you already receive it as a parameter from somewhere else, or someone
>>>>>>>>> moves/renames the files that you need to read. An index is also not very
>>>>>>>>> good, but I have written about that earlier.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> still wrapping my head around it, but I've assembled some thoughts here -
>>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany <ddekany@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>>>> initially, where templates drive things: they generate the output for
>>>>>>>>> themselves (even multiple output files if they wish). By default, the
>>>>>>>>> output file's name (and relative path) is deduced from the template
>>>>>>>>> name. There was also a global data-model, built in a configuration file
>>>>>>>>> (or equally, built via command line arguments, or both mixed), from which
>>>>>>>>> templates get whatever data they are interested in. Take a look at the
>>>>>>>>> figures here: http://fmpp.sourceforge.net/qtour.html. Later, this
>>>>>>>>> concept was generalized a bit more, because you could add XML files at
>>>>>>>>> the same place where you have the templates, and then you could associate
>>>>>>>>> transform templates with the XML files (based on path pattern and/or the
>>>>>>>>> XML document element). Now that's like what freemarker-generator had
>>>>>>>>> initially (data files drive the output, and the template is there to
>>>>>>>>> transform it).
>>>>>>>>>
>>>>>>>>> So I think the generic mental model would be like this:
>>>>>>>>>
>>>>>>>>> 1. You got files that drive the process, let's call them *generator
>>>>>>>>> files* for now. Usually, each generator file yields an output file (but
>>>>>>>>> maybe even multiple output files, as you might have seen in the last
>>>>>>>>> figure). These generator files can be of many types, like XML, JSON,
>>>>>>>>> XLSX (as in the original freemarker-generator), and even templates (as
>>>>>>>>> is the norm in FMPP). If the file is not a template, then you got a set
>>>>>>>>> of transformer templates (-t CLI option) in a separate directory, which
>>>>>>>>> can be associated with the generator files based on name patterns, and
>>>>>>>>> even based on content (schema usually). If the generator file is a
>>>>>>>>> template (so that's a positional @Parameter CLI argument that happens to
>>>>>>>>> be an *.ftl, and is not a template file specified after the "-t"
>>>>>>>>> option), then you just Template.process(...) it, and it prints what the
>>>>>>>>> output will be.
>>>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>>> contains commonly useful stuff, like what you now call parameters (CLI
>>>>>>>>> -Pname=value), but also maybe data loaded from JSON, XML, etc. Those
>>>>>>>>> data files aren't "generator files". Templates just use them if they
>>>>>>>>> need them.
>>>>>>>>>
>>>>>>>>> An important thing here is to reuse the same mechanism to read and
>>>>>>>>> parse those data files which was used in templates when transforming
>>>>>>>>> generator files. So we need a common format for specifying how to load
>>>>>>>>> data files. That's maybe just FTL that #assigns to the variables, or
>>>>>>>>> maybe a more declarative format.
>>>>>>>>>
>>>>>>>>> What I have described in the original post here was a less generic
>>>>>>>>> form of this, as I tried to stay true to the original approach. I
>>>>>>>>> thought the proposal would be drastic enough as it is... :) There, the
>>>>>>>>> "main" document is the "generator file" from point 1, the "-t" template
>>>>>>>>> is the transform template for the "main" document, and the other named
>>>>>>>>> documents ("users", "groups") are a poor man's shared data-model from
>>>>>>>>> point 2 (together with -PName=value).
>>>>>>>>>
>>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>>> list-of-documents (`DocuentList`, `NamedDocumentLists`) thing though.
>>>>>>>>> In the model above, as per point 1, if you list multiple data files,
>>>>>>>>> each will generate a separate output file. So, if you need to take in a
>>>>>>>>> list of files and transform them into a single output file (or at least
>>>>>>>>> with a single transform template execution), then you have to be
>>>>>>>>> explicit about that, as that's not the default behavior anymore. But
>>>>>>>>> it's still absolutely possible. Imagine it as if a "list of XLSX-es"
>>>>>>>>> were itself a file format. You need some CLI (and Maven config, etc.)
>>>>>>>>> syntax to express that, but that shouldn't be a big deal.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> Good timing - I was looking at a similar problem from a different angle
>>>>>>>>> yesterday (see below)
>>>>>>>>>
>>>>>>>>> I don't have enough time to answer your email in detail now - will do
>>>>>>>>> that tomorrow evening
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> === START
>>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>>> Currently we support the following combinations
>>>>>>>>>
>>>>>>>>> * Single template and no data files
>>>>>>>>> * Single template and one or more data files
>>>>>>>>>
>>>>>>>>> But we cannot support the following use case, which is quite typical
>>>>>>>>> in the cloud
>>>>>>>>>
>>>>>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>>>
>>>>>>>>> ## Implementation notes
>>>>>>>>> * When we copy a directory we can remove the `ftl` extension on the fly
>>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>>> * Initially resolve to a list of template files and process one after
>>>>>>>>> another
>>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>>>> * Do we need file versus directory filters?
>>>>>>>>>
>>>>>>>>> ### Command Line Options
>>>>>>>>> ```
>>>>>>>>> --input-encoding : Encoding of the documents
>>>>>>>>> --output-encoding : Encoding of the rendered template
>>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>>> --output : Output file or directory
>>>>>>>>> --include-document : Include pattern for documents
>>>>>>>>> --exclude-document : Exclude pattern for documents
>>>>>>>>> --include-template : Include pattern for templates
>>>>>>>>> --exclude-template : Exclude pattern for templates
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> ### Command Line Examples
>>>>>>>>> ```text
>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>>>>>>> # using the data from "config.json"
>>>>>>>>>
>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>>>
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>>>
>>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>>>
>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>>>
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>>>
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>>>
>>>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>>>
>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>>>
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>>>> ```
>>>>>>>>> === END
>>>>>>>>>
>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany <ddekany@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> Input documents are a fundamental concept in freemarker-generator, so
>>>>>>>>> we should think about that more, and probably refine/rework how it's
>>>>>>>>> done.
>>>>>>>>>
>>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>
>>>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>>>
>>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>>> ... process doc here
>>>>>>>>>
>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to a
>>>>>>>>> funny chain of coincidences: It returned the string "D", then
>>>>>>>>> CSVTool.parse(...) happily parsed that to a table with the single column
>>>>>>>>> "D", and 0 rows, and as there were 0 rows, the template never ran into
>>>>>>>>> the error that row.myExpectedColumn refers to a missing column, so the
>>>>>>>>> process finished with success. (: Pretty unlucky for sure. The root
>>>>>>>>> cause was unintentionally breaking a FreeMarker idiom though; eventually
>>>>>>>>> we will have to work on those too, but, different topic.)
>>>>>>>>>
>>>>>>>>> However, actually multiple input documents can be passed in:
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>>
>>>>>>>>> The above template will still work, though then it ignores all but the
>>>>>>>>> first document. So if you expect any number of input documents, you
>>>>>>>>> will probably have to do this:
>>>>>>>>>
>>>>>>>>> <#list Documents.list as doc>
>>>>>>>>> ... process doc here
>>>>>>>>> </#list>
>>>>>>>>>
>>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again,
>>>>>>>>> those we will work out in a different thread.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So, what would be better, in my opinion? I start out from what I think
>>>>>>>>> are the common use cases, in decreasing order of frequency. The goal is
>>>>>>>>> to make those less error-prone for the users, and simpler to express.
>>>>>>>>>
>>>>>>>>> USE CASE 1
>>>>>>>>>
>>>>>>>>> You have exactly 1 input document, which is therefore simply "the"
>>>>>>>>> document in the mind of the user. This is probably the typical use
>>>>>>>>> case, or at least the use case users typically start out from when
>>>>>>>>> starting the work.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>
>>>>>>>>> Then `Documents.get(0)` is not very fitting. Most importantly, it's
>>>>>>>>> error-prone, because if the user passed in more than 1 document (this
>>>>>>>>> can even happen totally accidentally, like if the user was lazy and used
>>>>>>>>> a wildcard that the shell expanded), the template will silently ignore
>>>>>>>>> the rest of the documents, and the single document processed will be
>>>>>>>>> practically picked at random. The user might not notice that and submit
>>>>>>>>> a bad report or such.
>>>>>>>>>
>>>>>>>>> I think that in this use case the document should simply be referred
>>>>>>>>> to as `Document` in the template. When you have multiple documents
>>>>>>>>> there, referring to `Document` should be an error, saying that the
>>>>>>>>> template was made to process a single document only.
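To make the proposed `Document` behavior concrete, here is a minimal sketch in Java of a strict single-document accessor. It is not freemarker-cli's actual code; the class name `SingleDocument` and the error message are hypothetical. Documents are modeled as plain file names for brevity. Unlike `Documents.get(0)`, it fails loudly when the count is not exactly 1.

```java
import java.util.List;

// Hypothetical sketch (not freemarker-cli's real implementation) of the
// strict "Document" accessor proposed for use case 1.
final class SingleDocument {

    // Returns the only element of the list; fails if there are 0 or 2+ elements.
    static <T> T document(List<T> documents) {
        if (documents.size() != 1) {
            throw new IllegalStateException(
                    "The template was made to process a single document only, but "
                    + documents.size() + " documents were passed in.");
        }
        return documents.get(0);
    }

    public static void main(String[] args) {
        // One input file: behaves like Documents.get(0).
        System.out.println(document(List.of("somewhere/foo-access-log.csv")));
        // Two input files (e.g. an unintended wildcard expansion): fails
        // instead of silently ignoring all but one of them.
        try {
            document(List.of("somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```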
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> USE CASE 2
>>>>>>>>>
>>>>>>>>> You have multiple input documents, but each has a different role
>>>>>>>>> (different schema, maybe different file type). Like, you pass in
>>>>>>>>> users.csv and groups.csv. Each has a different schema, and so you
>>>>>>>>> want to access them differently, but in the same template.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> [...]
>>>>>>>>> --named-document users somewhere/foo-users.csv
>>>>>>>>> --named-document groups somewhere/foo-groups.csv
>>>>>>>>>
>>>>>>>>> Then in the template you could refer to them as
>>>>>>>>> `NamedDocuments.users` and `NamedDocuments.groups`.
>>>>>>>>>
>>>>>>>>> Use Cases 1 and 2 can be unified into a coherent concept, where
>>>>>>>>> `Document` is just a shorthand for `NamedDocuments.main`. It's
>>>>>>>>> called "main" because that's "the" document the template is about,
>>>>>>>>> but you may also have added some helper documents, with symbolic
>>>>>>>>> names representing their role.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>> --document-name=groups somewhere/foo-groups.csv
>>>>>>>>>
>>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>>>>>> Picocli. Anyway, for now the point is the concept, which is not
>>>>>>>>> specific to the CLI.)
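One possible way to make `--document-name=main` optional, sketched by hand rather than with any Picocli feature: pre-group the raw arguments so that each file belongs to the most recent `--document-name=NAME`, and files before any such option belong to "main". The class `DocumentArgs` is hypothetical, and it assumes other options such as `-t access-report.ftl` have already been consumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, hand-rolled grouping (not a Picocli feature): file arguments
// are assigned to the name given by the most recent --document-name=NAME
// option; files that come before any such option are assigned to "main".
final class DocumentArgs {

    private static final String NAME_OPTION = "--document-name=";

    static Map<String, List<String>> group(List<String> args) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        String currentName = "main"; // implicit name, so --document-name=main may be omitted
        for (String arg : args) {
            if (arg.startsWith(NAME_OPTION)) {
                currentName = arg.substring(NAME_OPTION.length());
                if (currentName.isEmpty()) {
                    throw new IllegalArgumentException("Missing name in: " + arg);
                }
            } else {
                // A file argument: append it to the current name's list.
                byName.computeIfAbsent(currentName, k -> new ArrayList<>()).add(arg);
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        System.out.println(group(List.of(
                "somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv",
                "--document-name=users", "somewhere/foo-users.csv",
                "--document-name=groups", "somewhere/global-groups.csv")));
        // prints {main=[somewhere/foo-access-log.csv, somewhere/bar-access-log.csv], users=[somewhere/foo-users.csv], groups=[somewhere/global-groups.csv]}
    }
}
```

Because a name maps to a list, the same grouping also covers use case 3, where one name receives several files. Picocli's own argument groups might offer a cleaner route, but that is not assumed here.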
>>>>>>>>>
>>>>>>>>> USE CASE 3
>>>>>>>>>
>>>>>>>>> Here you have several documents of the same kind. That has a more
>>>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> --document-name=main somewhere/foo-access-log.csv
>>>>>>>>> somewhere/bar-access-log.csv
>>>>>>>>> --document-name=users somewhere/foo-users.csv
>>>>>>>>> somewhere/bar-users.csv
>>>>>>>>> --document-name=groups somewhere/global-groups.csv
>>>>>>>>>
>>>>>>>>> The template must be written with this use case in mind, as now it
>>>>>>>>> has to #list some of the documents. (I think in practice you hardly
>>>>>>>>> ever want to get a document by a hard-coded index. Either you don't
>>>>>>>>> know how many documents you have, so you can't use hard-coded
>>>>>>>>> indexes, or you do, and each index has a specific meaning, but then
>>>>>>>>> you should name the documents instead, as using indexes is
>>>>>>>>> error-prone and hard to read.)
>>>>>>>>> Accessing that list of documents in the template could maybe be done
>>>>>>>>> like this:
>>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>>> - For explicitly named documents, like "users":
>>>>>>>>>   `NamedDocumentLists.users`
>>>>>>>>>
>>>>>>>>> SUMMING UP
>>>>>>>>>
>>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>>>>>>   can achieve everything with it, using it requires your template to
>>>>>>>>>   handle the most generic case too. So, I think it would be rarely
>>>>>>>>>   used.
>>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>>>>   It's used if you only have one kind of document (single format and
>>>>>>>>>   schema), but potentially multiple of them.
>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1
>>>>>>>>>   document of the given name.
>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is
>>>>>>>>>   for the most natural/frequent use case.
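The four access forms summed up above can be read as views over a single "name to list of documents" map. The following Java sketch illustrates that reading under stated assumptions. `DocumentAccess` and its method names are hypothetical, not an existing freemarker-cli API. It also assumes an unknown name yields an empty list rather than an error; the single-document forms then fail on it anyway.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the four proposed access forms as views over one
// name -> list-of-documents map. Names are illustrative only.
final class DocumentAccess<T> {

    private static final String MAIN = "main";
    private final Map<String, List<T>> namedDocumentLists;

    DocumentAccess(Map<String, List<T>> namedDocumentLists) {
        this.namedDocumentLists = Map.copyOf(namedDocumentLists);
    }

    // NamedDocumentLists.<name>: most generic form, zero or more documents
    // (assumption: an unknown name gives an empty list).
    List<T> namedDocumentList(String name) {
        return namedDocumentLists.getOrDefault(name, List.of());
    }

    // DocumentList: shorthand for NamedDocumentLists.main.
    List<T> documentList() {
        return namedDocumentList(MAIN);
    }

    // NamedDocuments.<name>: exactly one document of the given name is expected.
    T namedDocument(String name) {
        List<T> docs = namedDocumentList(name);
        if (docs.size() != 1) {
            throw new IllegalStateException("Expected exactly 1 document named \"" + name
                    + "\", but got " + docs.size() + ".");
        }
        return docs.get(0);
    }

    // Document: shorthand for NamedDocuments.main.
    T document() {
        return namedDocument(MAIN);
    }

    public static void main(String[] args) {
        DocumentAccess<String> access = new DocumentAccess<>(Map.of(
                "main", List.of("somewhere/foo-access-log.csv", "somewhere/bar-access-log.csv"),
                "groups", List.of("somewhere/global-groups.csv")));
        System.out.println(access.documentList());          // both access logs
        System.out.println(access.namedDocument("groups")); // the single groups file
        try {
            access.document(); // fails: two "main" documents were given
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Deriving the shorthands from the generic form keeps a single source of truth. The strict forms then only add the "exactly 1" check that catches accidental extra inputs.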
>>>>>>>>>
>>>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>>>> trade-off for the sake of these:
>>>>>>>>> - Catching CLI (or Maven, etc.) input for which the template output
>>>>>>>>>   will likely be wrong. That's only possible if the user can
>>>>>>>>>   communicate their intent in the template.
>>>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>>>   concrete use case. Just start with the trivial, `Document`, and
>>>>>>>>>   later, if the need arises, generalize to named documents, document
>>>>>>>>>   lists, or both.
>>>>>>>>>
>>>>>>>>> What do you guys think?
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Daniel Dekany