Hi Daniel,

Please see my comments below.

Thanks in advance,

Siegfried Goeschl

> On 29.02.2020, at 21:02, Daniel Dekany wrote:
>
>> I try to provide a useful name even when the content is coming from a
>> URL
>
> When is it recommended to rely on that, though? Because utilizing that
> means that renaming a data source file can break the process, even if you
> call freemarker-cli with the up-to-date file name. And whether that happens
> depends on what you (or another random colleague!) have dug inside the
> templates. So I guess we'd better just not support this. Less code and
> fewer things to document, too.
>

Actually, it is not recommended, but we have had named data sources for less
than 24 hours.

>> I think we have a different understanding what a "Document" / "Datasource
>> / DataSource" should do
>
> Thing is, eventually (most certainly pre-1.0, as it influences
> architecture), certain needs will have to be addressed, somehow. Then we
> will see what "things" we really need. For now I thought we need "things"
> that are much more than paths, and encapsulate the "how to load the data"
> aspect. I called them data sources, but maybe we should call them "data
> loaders" to free up data sources for the more primitive thing. Some
> needs/doubts to address, *later*: Is it really the best approach for users
> to load/parse data sources programmatically (that code is written in FTL,
> inside the templates)? Also, is the template the right place for doing
> that? Because when multiple templates (or just multiple template *runs* of
> the same template, each generating a different output file) need common
> data, they shouldn't load it again and again. Also, different topic, can we
> handle the case "transparently" enough when the data is not coming from a
> file?

This is a command-line tool where we have little idea what the user will do
or abuse:

* How does a "data loader" know that it is responsible for loading a file?
* What should a "CSV data loader" do - parse the file into a list of records
  or stream them one by one?
* How do we handle the case where multiple potential data loaders exist for a
  single file?

I'm leaning towards building blocks where the user controls the work to be
done, even if it requires one or two extra lines of FTL code.

>> The joy of programming - I did not intend to use "name:group" together
>> with wildcards :-)
>
> For a CLI tool, I guess we agree that it should work. So maybe, like this
> (here logs and foos are meant to be "groups"):
> --data-source logs file1.log file2.log fileN.log --data-source foos
> foo1.csv foo2.csv fooN.csv --data-source bar bar.xlsx
>
> It so happens that here you don't really have good control over the
> number of files associated with the name, so maybe that's yet another
> reason not to differentiate names and groups.
>
>> I disagree here - I think using a name would be used more often. I added
>> the "group" as an afterthought since some grouping could be useful
>
> We do agree on that. What I said is that the *syntax* should be such that
> the group comes first. It's still optional. Like this:
> --data-source group:name /somewhere
> --data-source name /somewhere

That comes down to personal preference, e.g. chown uses "owner[:group]".

> On Sat, Feb 29, 2020 at 7:34 PM Siegfried Goeschl <
> siegfried.goeschl@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> See my comments below
>>
>> Thanks in advance,
>>
>> Siegfried Goeschl
>>
>>
>>> On 29.02.2020, at 19:08, Daniel Dekany wrote:
>>>
>>> FREEMARKER-135 freemarker-generator-cli: Support user-supplied names for
>>> datasources
>>>
>>> So, I can do this to have both a name and a group associated with a data
>>> source:
>>> --datasource someName:someGroup=somewhere/something
>>
>> Correct
>>
>>> Or if I only want a name, but not a group (or an "" group actually -
>>> bug?), then:
>>> --datasource someName=somewhere/something
>>
>> Correct
>>
>>>
>>> Or if only a group but not a name (or a "" name actually) then:
>>> --datasource :someGroup=somewhere/something
>>
>> Mhmm, that would be unintended functionality from my side - the current
>> approach is that every "Document" / "Datasource / DataSource" is named
>>
>>>
>>> A name must identify exactly 1 data source, while a group identifies a
>>> list of data sources.
>>
>> No, every "Document" / "Datasource / DataSource" has a name currently but
>> uniqueness is not enforced. Only if you want to get a "Document" /
>> "Datasource / DataSource" by its exact name do I check for exactly one
>> search hit and throw an exception otherwise. I try to provide a useful
>> name even when the content is coming from a URL or STDIN (and I will
>> probably add environment variables as "Document" / "Datasource /
>> DataSource", e.g. configuration in the cloud as JSON content passed as an
>> environment variable)
>>
>>>
>>> Does this idea, that a data source can be part of a group and then is
>>> also possibly identifiable by a name, come from a use case? I mean,
>>> it's possibly important somewhere, but if so, then it's strange that you
>>> can put something into only a single group.
>>> If we need this kind of thing, then perhaps you should just be allowed
>>> to associate the data source with a list of names (kind of like
>>> tagging), and then when the template wants to get something by name, it
>>> will tell there if it expects exactly one or a list of data sources.
>>> Then you don't need to introduce two terms in the documentation either
>>> (names and groups). Again, if we want this at all, instead of just going
>>> with a data source that itself gives a list. (And if not, how will we
>>> handle a data source that loads from a non-file source?)
>>
>> I actually thought of implementing tagging but considered a "group"
>> sufficient.
>>
>> * If you don't define anything, everything goes into the "default" group
>> * For individual documents you can define a name and an optional group
>>
>> I think we have a different understanding what a "Document" / "Datasource
>> / DataSource" should do
>>
>> * It is dumb
>> * It is lazy since data is only loaded on demand
>> * There is no automagic like "oh, this is a JSON file, so let's go to the
>> JSON tool and create a map readily accessible in the data model"
>>
>>>
>>> Note that the current command line syntax doesn't work well with shell
>>> wildcard expansion. Like this:
>>> --datasource :someGroup=logs/*.log
>>> will try to expand ":someGroup=logs/*.log", and because it finds nothing
>>> (and because the rules of sh and the like are a mess), you will get the
>>> parameter value as is, without * expanded.
>>
>> The joy of programming - I did not intend to use "name:group" together
>> with wildcards :-)
>>
>>>
>>> Also, I think the syntax with colon should be flipped, because in other
>>> places foo:bar usually means that foo is the bigger unit (the container),
>>> and bar is the smaller unit (the child).
>>
>> I disagree here - I think using a name would be used more often. I added
>> the "group" as an afterthought since some grouping could be useful
>>
>>>
>>> On Sat, Feb 29, 2020 at 5:03 PM Siegfried Goeschl <
>>> siegfried.goeschl@gmail.com> wrote:
>>>
>>>> Hi Daniel,
>>>>
>>>> I'm an enterprise developer - bad habits die hard :-)
>>>>
>>>> So I closed the following tickets and merged the branches
>>>>
>>>> 1) FREEMARKER-129 freemarker-generator: Merge "freemarker-cli" into
>>>> "freemarker-generator"
>>>> 2) FREEMARKER-134 freemarker-generator: Rename "Document" to "Datasource"
>>>> 3) FREEMARKER-135 freemarker-generator-cli: Support user-supplied names
>>>> for datasources
>>>>
>>>> Thanks in advance,
>>>>
>>>> Siegfried Goeschl
>>>>
>>>>
>>>>> On 29.02.2020, at 12:19, Daniel Dekany wrote:
>>>>>
>>>>> Yeah, and of course, you can merge that branch. You can even work on
>>>>> the master directly after all.
>>>>>
>>>>> On Sat, Feb 29, 2020 at 12:17 PM Daniel Dekany <
>>>>> daniel.dekany@gmail.com> wrote:
>>>>>
>>>>>> But, I do recognize the cattle use case (several "faceless" files with
>>>>>> common format/schema). Only, my idea is to push that complexity onto
>>>>>> the data source. The "data source" concept shields the rest of the
>>>>>> application from the details of how the data is stored or retrieved.
>>>>>> So, a data source might load a bunch of log files from a directory,
>>>>>> and present them as a single big table, or like a list of tables, etc.
>>>>>> So I want to deal with the cattle use case, but the question is what
>>>>>> part of the architecture will deal with this complication; in other
>>>>>> words, how do you box things. The reason my initial bet is to stuff
>>>>>> that complication into the "data source" implementation(s) is that
>>>>>> data sources are inherently varied. Some return a table-like thing,
>>>>>> some have multiple named tables (worksheets in Excel), some return a
>>>>>> tree of nodes (XML), etc. So then, some might return a
>>>>>> list-of-list-of log records, or just a single list of log records (put
>>>>>> together from daily log files). That way cattle don't add to
>>>>>> conceptual complexity. Now, you might be aware of cases where the
>>>>>> cattle concept must be more exposed than this, and then we can't box
>>>>>> things like this. But this is what I tried to express.
>>>>>>
>>>>>> Regarding "output generators", and how that applies on the command
>>>>>> line. I think it's important that the common core between Maven and
>>>>>> command-line is as fat as possible. Ideally, they are just two
>>>>>> syntaxes to set up the same thing. Mostly at least. So, if you specify
>>>>>> a template file to the CLI application, in a way so that it causes it
>>>>>> to process that template to generate a single output, then there you
>>>>>> have just defined an "output generator" (even if it wasn't explicitly
>>>>>> called that in the command line). If you specify 3 csv files to the
>>>>>> CLI application, in a way so that it causes it to generate 3 output
>>>>>> files, then you have just defined 3 "output generators" there (there's
>>>>>> at least one template specified there too, but that wasn't an "output
>>>>>> generator" itself, it was just an attribute of the 3 output
>>>>>> generators). If you specify 1 template, and 3 csv files, in a way so
>>>>>> that it will yield 4 output files (1 for the template, 3 for the
>>>>>> csv-s), then you have defined 4 output generators there. If you have a
>>>>>> data source that loads a list of 3 entities (say, 3 csv files, so it's
>>>>>> a list of tables then), and you have 2 templates, and you tell the CLI
>>>>>> to execute each template for each item in said data source, then you
>>>>>> have just defined 6 "output generators".
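The counting in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `OutputGenerator`, `per_item_generators` and `mixed_generators` are hypothetical names invented here, not part of freemarker-generator or its CLI.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class OutputGenerator:
    """Hypothetical stand-in: one template run that yields one output."""
    template: str
    data_item: Optional[str]  # None when the template drives its own output


def per_item_generators(templates, data_items):
    """Execute every template once for every data item."""
    return [OutputGenerator(t, d) for t, d in product(templates, data_items)]


def mixed_generators(driver_templates, data_files, transform_template):
    """Driver templates yield their own output; each data file is
    transformed by the single transform template."""
    own = [OutputGenerator(t, None) for t in driver_templates]
    transformed = [OutputGenerator(transform_template, d) for d in data_files]
    return own + transformed


csvs = ["a.csv", "b.csv", "c.csv"]
print(len(per_item_generators(["report.ftl"], csvs)))            # 3
print(len(mixed_generators(["index.ftl"], csvs, "report.ftl")))  # 4
print(len(per_item_generators(["html.ftl", "text.ftl"], csvs)))  # 6
```

The three calls correspond to the 3, 4 and 6 generator cases described above; the single-template, single-output case is simply one `OutputGenerator` with no data item.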
>>>>>>
>>>>>> On Fri, Feb 28, 2020 at 11:08 AM Siegfried Goeschl <
>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> That all depends on your mental model and the work you do,
>>>>>>> expectations, experience :-)
>>>>>>>
>>>>>>>
>>>>>>> __Document Handling__
>>>>>>>
>>>>>>> *"But I think actually we have no good use case for list of documents
>>>>>>> that's passed at once to a single template run, so, we can just
>>>>>>> ignore that complication"*
>>>>>>>
>>>>>>> In my case that's not a complication but my daily business - I'm
>>>>>>> regularly wading through access logs - yesterday probably a couple of
>>>>>>> hundred access logs across two staging sites to help track down some
>>>>>>> strange API gateway issues :-)
>>>>>>>
>>>>>>> My gut feeling is (borrowing from
>>>>>>> https://medium.com/@Joachim8675309/devops-concepts-pets-vs-cattle-2380b5aab313
>>>>>>> )
>>>>>>>
>>>>>>> 1. You have a few lovely named documents / templates - `pets`
>>>>>>> 2. You have tons of anonymous documents / templates to process -
>>>>>>> `cattle`
>>>>>>> 3. The "grey area" comes into play when mixing `pets & cattle`
>>>>>>>
>>>>>>> `freemarker-cli` was built with 2) in mind and I want to cover 1)
>>>>>>> since it is equally important and common.
>>>>>>>
>>>>>>>
>>>>>>> __Template And Document Processing Modes__
>>>>>>>
>>>>>>> IMHO it is important to answer the following question: "How many
>>>>>>> outputs do you get when rendering 2 templates and 3 datasources? Two,
>>>>>>> Three or Six?"
>>>>>>>
>>>>>>> Your answer is influenced by your mental model / experience
>>>>>>>
>>>>>>> * When wading through tons of CSV files, access logs, etc. the answer
>>>>>>> is "2"
>>>>>>> * When doing source code generation the obvious answer is "6"
>>>>>>> * Can't imagine a use case which results in "3" but I'm pretty sure we
>>>>>>> will encounter one
>>>>>>>
>>>>>>> __Template and document mode probably shouldn't exist__
>>>>>>>
>>>>>>> That's hard for me to fully understand - I definitely lack your
>>>>>>> insights & experience writing such tools :-)
>>>>>>>
>>>>>>> Defining the `Output Generator` is the underlying model for the Maven
>>>>>>> plugin (and probably FMPP).
>>>>>>>
>>>>>>> I'm not sure if this applies to command lines, at least not in the
>>>>>>> way I use them (or would like to use them)
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Siegfried Goeschl
>>>>>>>
>>>>>>> PS: Can/shall I merge the PR to bring in `freemarker-cli`?
>>>>>>>
>>>>>>>
>>>>>>> On 28 Feb 2020, at 9:14, Daniel Dekany wrote:
>>>>>>>
>>>>>>>> Yeah, "data source" is surely too popular a name, but for a reason.
>>>>>>>> Anyone has other ideas?
>>>>>>>>
>>>>>>>> As for naming data sources and such. One thing I was wondering about
>>>>>>>> back then is how to deal with a list of documents given to a
>>>>>>>> template, versus exactly 1 document given to a template. But I think
>>>>>>>> actually we have no good use case for a list of documents that's
>>>>>>>> passed at once to a single template run, so, we can just ignore that
>>>>>>>> complication. A document has a name, and that's always just a single
>>>>>>>> document, not a collection, as far as the template is concerned. (We
>>>>>>>> can have multiple documents per run, but those normally yield
>>>>>>>> separate output generators, so it's still only one document per
>>>>>>>> template.) However, we can have data source types (document types in
>>>>>>>> the old terminology) that collect together multiple data files. So
>>>>>>>> then that complexity is encapsulated into the data source type, and
>>>>>>>> doesn't complicate the overall architecture. That's another case
>>>>>>>> when a data source is not just a file. Like maybe there's a data
>>>>>>>> source type that loads all the CSV-s from a directory, into a single
>>>>>>>> big table (I had such a case), or even into a list of tables. Or, as
>>>>>>>> I mentioned already, a data source is maybe an SQL query on a JDBC
>>>>>>>> data source (and we got the first term clash... JDBC also calls them
>>>>>>>> data sources).
>>>>>>>>
>>>>>>>> Template and document mode probably shouldn't exist from the user
>>>>>>>> perspective either, at least not as a global option that must apply
>>>>>>>> to everything in a run. They could just give the files that define
>>>>>>>> the "output generators", and some of them will be templates, some of
>>>>>>>> them are data files, in which case a template needs to be associated
>>>>>>>> with them (and there can be a couple of ways of doing that). And
>>>>>>>> then again, there are the cases where you want to create one output
>>>>>>>> generator per entity from some data source.
>>>>>>>>
>>>>>>>> On Fri, Feb 28, 2020 at 8:23 AM Siegfried Goeschl <
>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> See my comments below - and thanks for your patience and input :-)
>>>>>>>>>
>>>>>>>>> *Renaming Document To DataSource*
>>>>>>>>>
>>>>>>>>> Yes, makes sense. I tried to avoid it since I'm using
>>>>>>>>> javax.activation and its DataSource.
>>>>>>>>>
>>>>>>>>> *Template And Document Mode*
>>>>>>>>>
>>>>>>>>> Agreed - I think it is a valuable abstraction for the user but it
>>>>>>>>> is not an implementation concept :-)
>>>>>>>>>
>>>>>>>>> *Document Without Symbolic Names*
>>>>>>>>>
>>>>>>>>> Also agreed and it is going to change but I have not made up my
>>>>>>>>> mind yet what exactly to implement.
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>> On 28 Feb 2020, at 1:05, Daniel Dekany wrote:
>>>>>>>>>
>>>>>>>>> A few quick thoughts on that:
>>>>>>>>>
>>>>>>>>> - We should replace the "document" term with something more
>>>>>>>>> telling. It doesn't tell that it's some kind of input. Also, most
>>>>>>>>> of these inputs aren't something that people typically call
>>>>>>>>> documents. Like a csv file, or a database table, which is not even
>>>>>>>>> a file (OK, we don't support such a thing at the moment). I think,
>>>>>>>>> maybe "data source" is a safe enough term. (It also rhymes with
>>>>>>>>> data model.)
>>>>>>>>> - You have separate "template" and "document" "modes", which apply
>>>>>>>>> to a whole run. I think such specialization won't be helpful. We
>>>>>>>>> could just say, on the conceptual level at least, that we need a
>>>>>>>>> set of "output generators". An output generator is an object (in
>>>>>>>>> the API) that specifies a template, a data-model (where the
>>>>>>>>> data-model is possibly populated with "documents"), and an output
>>>>>>>>> "sink" (a file path, or stdout), and can generate the output
>>>>>>>>> itself. A practical way of defining the output generators in a CLI
>>>>>>>>> application is via a bunch of files, each defining an output
>>>>>>>>> generator. Some of those files may be templates (which you can
>>>>>>>>> even detect from the file extension), or data files that we
>>>>>>>>> currently call "documents". They could freely mix inside the same
>>>>>>>>> run. I have also met the use case where you have a single table
>>>>>>>>> (single "document"), and each record in it yields an output file.
>>>>>>>>> That can also be described in some file format, or really in any
>>>>>>>>> other way, like directly in a command line argument, via API, etc.
>>>>>>>>> - You have multiple documents without an associated symbolic name
>>>>>>>>> in some examples. Templates can't identify those then in a
>>>>>>>>> maintainable way. The actual file name is often not a good
>>>>>>>>> identifier, can change over time, and you might not even have good
>>>>>>>>> control over it, like you already receive it as a parameter from
>>>>>>>>> somewhere else, or someone moves/renames the files that you need
>>>>>>>>> to read. Index is also not very good, but I have written about
>>>>>>>>> that earlier.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Feb 26, 2020 at 9:33 AM Siegfried Goeschl <
>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> still wrapping my mind around it but assembled some thoughts here -
>>>>>>>>> https://gist.github.com/sgoeschl/b09b343a761b31a6c790d882167ff449
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 23 Feb 2020, at 23:14, Daniel Dekany wrote:
>>>>>>>>>
>>>>>>>>> What you are describing is more like the angle that FMPP took
>>>>>>>>> initially, where templates drive things, they generate the output
>>>>>>>>> for themselves (even multiple output files if they wish). By
>>>>>>>>> default the output file name (and relative path) is deduced from
>>>>>>>>> the template name. There was also a global data-model, built in a
>>>>>>>>> configuration file (or equally, built via command line arguments,
>>>>>>>>> or both mixed), from which templates get whatever data they are
>>>>>>>>> interested in. Take a look at the figures here:
>>>>>>>>> http://fmpp.sourceforge.net/qtour.html. Later, this concept was
>>>>>>>>> generalized a bit more, because you could add XML files at the same
>>>>>>>>> place where you have the templates, and then you could associate
>>>>>>>>> transform templates with the XML files (based on path pattern
>>>>>>>>> and/or the XML document element). Now that's like what
>>>>>>>>> freemarker-generator had initially (data files drive output, and
>>>>>>>>> the template is there to transform it).
>>>>>>>>>
>>>>>>>>> So I think the generic mental model would be like this:
>>>>>>>>>
>>>>>>>>> 1. You have files that drive the process, let's call them
>>>>>>>>> *generator files* for now. Usually, each generator file yields an
>>>>>>>>> output file (but maybe even multiple output files, as you might
>>>>>>>>> have seen in the last figure). These generator files can be of many
>>>>>>>>> types, like XML, JSON, XLSX (as in the original
>>>>>>>>> freemarker-generator), and even templates (as is the norm in FMPP).
>>>>>>>>> If the file is not a template, then you have a set of transformer
>>>>>>>>> templates (-t CLI option) in a separate directory, which can be
>>>>>>>>> associated with the generator files based on name patterns, and
>>>>>>>>> even based on content (schema usually). If the generator file is a
>>>>>>>>> template (so that's a positional @Parameter CLI argument that
>>>>>>>>> happens to be an *.ftl, and is not a template file specified after
>>>>>>>>> the "-t" option), then you just Template.process(...) it, and it
>>>>>>>>> prints what the output will be.
>>>>>>>>> 2. You also have a set of variables, the global data-model, that
>>>>>>>>> contains commonly useful stuff, like what you now call parameters
>>>>>>>>> (CLI -Pname=value), but also maybe data loaded from JSON, XML, etc.
>>>>>>>>> Those data files aren't "generator files". Templates just use them
>>>>>>>>> if they need them. An important thing here is to reuse the same
>>>>>>>>> mechanism to read and parse those data files which was used in
>>>>>>>>> templates when transforming generator files. So we need a common
>>>>>>>>> format for specifying how to load data files. That's maybe just FTL
>>>>>>>>> that #assigns to the variables, or maybe a more declarative format.
>>>>>>>>>
>>>>>>>>> What I have described in the original post here was a less generic
>>>>>>>>> form of this, as I tried to be true to the original approach. I
>>>>>>>>> thought the proposal would be drastic enough as it is... :) There,
>>>>>>>>> the "main" document is the "generator file" from point 1, the "-t"
>>>>>>>>> template is the transform template for the "main" document, and the
>>>>>>>>> other named documents ("users", "groups") are a poor man's shared
>>>>>>>>> data-model from point 2 (together with -PName=value).
>>>>>>>>>
>>>>>>>>> There's a further, somewhat confusing thing to get right with the
>>>>>>>>> list-of-documents (`DocumentList`, `NamedDocumentLists`) thing
>>>>>>>>> though. In the model above, as per point 1, if you list multiple
>>>>>>>>> data files, each will generate a separate output file. So, if you
>>>>>>>>> need to take in a list of files to transform it to a single output
>>>>>>>>> file (or at least with a single transform template execution), then
>>>>>>>>> you have to be explicit about that, as that's not the default
>>>>>>>>> behavior anymore. But it's still absolutely possible. Imagine that
>>>>>>>>> a "list of XLSX-es" is itself like a file format. You need some CLI
>>>>>>>>> (and Maven config, etc.) syntax to express that, but that shouldn't
>>>>>>>>> be a big deal.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Feb 23, 2020 at 9:43 PM Siegfried Goeschl <
>>>>>>>>> siegfried.goeschl@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Daniel,
>>>>>>>>>
>>>>>>>>> Good timing - I was looking at a similar problem from a different
>>>>>>>>> angle yesterday (see below)
>>>>>>>>>
>>>>>>>>> Don't have enough time to answer your email in detail now - will do
>>>>>>>>> that tomorrow evening
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Siegfried Goeschl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> === START
>>>>>>>>> # FreeMarker CLI Improvement
>>>>>>>>> ## Support Of Multiple Template Files
>>>>>>>>> Currently we support the following combinations
>>>>>>>>>
>>>>>>>>> * Single template and no data files
>>>>>>>>> * Single template and one or more data files
>>>>>>>>>
>>>>>>>>> But we cannot support the following use case, which is quite
>>>>>>>>> typical in the cloud
>>>>>>>>>
>>>>>>>>> __Convert multiple templates with a single data file, e.g. copying a
>>>>>>>>> directory of configuration files using a JSON configuration file__
>>>>>>>>>
>>>>>>>>> ## Implementation notes
>>>>>>>>> * When we copy a directory we can remove the `ftl` extension on the
>>>>>>>>> fly
>>>>>>>>> * We might need an `exclude` filter for the copy operation
>>>>>>>>> * Initially resolve to a list of template files and process one
>>>>>>>>> after another
>>>>>>>>> * Need to calculate the output file location and extension
>>>>>>>>> * We need to rename the existing command line parameters (see below)
>>>>>>>>> * Do we need multiple include and exclude filters?
>>>>>>>>> * Do we need file versus directory filters?
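The notes on removing the `ftl` extension and calculating the output file location could look roughly like the following Python sketch. It is an illustration only: the function name `output_path` and the example file names are hypothetical, and nothing here reflects how freemarker-cli actually resolves its outputs.

```python
from pathlib import PurePosixPath


def output_path(template_dir, template_file, output_dir):
    """Map a template under template_dir to its target under output_dir,
    dropping a trailing .ftl extension (hypothetical helper)."""
    relative = PurePosixPath(template_file).relative_to(template_dir)
    if relative.suffix == ".ftl":
        # "server.conf.ftl" becomes "server.conf"
        relative = relative.with_suffix("")
    return PurePosixPath(output_dir) / relative


print(output_path("ext/config", "ext/config/nginx/server.conf.ftl", "/config"))
# /config/nginx/server.conf
print(output_path("ext/config", "ext/config/README.txt", "/config"))
# /config/README.txt
```

Keeping the path relative to the template directory preserves the sub-directory layout in the output directory; files without the `.ftl` extension keep their name unchanged.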
>>>>>>>>>
>>>>>>>>> ### Command Line Options
>>>>>>>>> ```
>>>>>>>>> --input-encoding    : Encoding of the documents
>>>>>>>>> --output-encoding   : Encoding of the rendered template
>>>>>>>>> --template-encoding : Encoding of the template
>>>>>>>>> --output            : Output file or directory
>>>>>>>>> --include-document  : Include pattern for documents
>>>>>>>>> --exclude-document  : Exclude pattern for documents
>>>>>>>>> --include-template  : Include pattern for templates
>>>>>>>>> --exclude-template  : Exclude pattern for templates
>>>>>>>>> ```
>>>>>>>>>
>>>>>>>>> ### Command Line Examples
>>>>>>>>> ```text
>>>>>>>>> # Copy all FTL templates found in "ext/config" to the "/config" directory
>>>>>>>>> # using the data from "config.json"
>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config config.json
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config config.json
>>>>>>>>>
>>>>>>>>> # Basically the same using a named document "configuration"
>>>>>>>>> # It might make sense to expose "conf" directly in the FreeMarker data model
>>>>>>>>> # It might make sense to allow URIs for loading documents
>>>>>>>>> freemarker-cli -t ./ext/config/*.ftl -o /config -d configuration=config.json
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=config.json
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=file:///config.json
>>>>>>>>>
>>>>>>>>> # Basically the same using an environment variable as named document
>>>>>>>>> freemarker-cli -t ./ext/config --include-template *.ftl -o /config -d configuration=env:///CONFIGURATION
>>>>>>>>> freemarker-cli --template ./ext/config --include-template *.ftl --output /config --document configuration=env:///CONFIGURATION
>>>>>>>>> ```
>>>>>>>>> === END
>>>>>>>>>
>>>>>>>>> On 23.02.2020, at 16:37, Daniel Dekany wrote:
>>>>>>>>>
>>>>>>>>> Input documents are a fundamental concept in freemarker-generator,
>>>>>>>>> so we should think about that more, and probably refine/rework how
>>>>>>>>> it's done.
>>>>>>>>>
>>>>>>>>> Currently it works like this, with CLI at least.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>> -t access-report.ftl
>>>>>>>>> somewhere/foo-access-log.csv
>>>>>>>>>
>>>>>>>>> Then in access-report.ftl you have to do something like this:
>>>>>>>>>
>>>>>>>>> <#assign doc = Documents.get(0)>
>>>>>>>>> ... process doc here
>>>>>>>>>
>>>>>>>>> (The more idiomatic Documents[0] won't work. Actually, that led to
>>>>>>>>> a funny chain of coincidences: It returned the string "D", then
>>>>>>>>> CSVTool.parse(...) happily parsed that to a table with the single
>>>>>>>>> column "D", and 0 rows, and as there were 0 rows, the template
>>>>>>>>> didn't run into an error over row.myExpectedColumn referring to a
>>>>>>>>> missing column either, so the process finished with success. (:
>>>>>>>>> Pretty unlucky for sure. The root cause was unintentionally
>>>>>>>>> breaking a FreeMarker idiom though; eventually we will have to work
>>>>>>>>> on those too, but, different topic.)
>>>>>>>>> >>>>>>>>> However, actually multiple input documents can be passed in: >>>>>>>>> >>>>>>>>> freemarker-cli >>>>>>>>> -t access-report.ftl >>>>>>>>> somewhere/foo-access-log.csv >>>>>>>>> somewhere/bar-access-log.csv >>>>>>>>> >>>>>>>>> Above template will still work, though then you ignored all but the >>>>>>>>> >>>>>>>>> first >>>>>>>>> >>>>>>>>> document. So if you expect any number of input documents, you >>>>>>>>> probably >>>>>>>>> >>>>>>>>> will >>>>>>>>> >>>>>>>>> have to do this: >>>>>>>>> >>>>>>>>> <#list Documents.list as doc> >>>>>>>>> ... process doc here >>>>>>>>> >>>>>>>>> >>>>>>>>> (The more idiomatic <#list Documents as doc> won't work; but again, >>>>>>>>> >>>>>>>>> those >>>>>>>>> >>>>>>>>> we will work out in a different thread.) >>>>>>>>> >>>>>>>>> >>>>>>>>> So, what would be better, in my opinion. I start out from what I >>>>>>>>> think >>>>>>>>> >>>>>>>>> are >>>>>>>>> >>>>>>>>> the common uses cases, in decreasing order of frequency. Goal is to >>>>>>>>> >>>>>>>>> make >>>>>>>>> >>>>>>>>> those less error prone for the users, and simpler to express. >>>>>>>>> >>>>>>>>> USE CASE 1 >>>>>>>>> >>>>>>>>> You have exactly 1 input documents, which is therefore simply "the" >>>>>>>>> document in the mind of the user. This is probably the typical use >>>>>>>>> >>>>>>>>> case, >>>>>>>>> >>>>>>>>> but at least the use case users typically start out from when >>>>>>>>> starting >>>>>>>>> >>>>>>>>> the >>>>>>>>> >>>>>>>>> work. >>>>>>>>> >>>>>>>>> freemarker-cli >>>>>>>>> -t access-report.ftl >>>>>>>>> somewhere/foo-access-log.csv >>>>>>>>> >>>>>>>>> Then `Documents.get(0)` is not very fitting. 
Most importantly it's >>>>>>>>> >>>>>>>>> error >>>>>>>>> >>>>>>>>> prone, because if the user passed in more than 1 documents (can >> even >>>>>>>>> >>>>>>>>> happen >>>>>>>>> >>>>>>>>> totally accidentally, like if the user was lazy and used a wildcard >>>>>>>>> >>>>>>>>> that >>>>>>>>> >>>>>>>>> the shell exploded), the template will silently ignore the rest of >>>>>>>>> the >>>>>>>>> documents, and the singe document processed will be practically >>>>>>>>> picked >>>>>>>>> randomly. The user might won't notice that and submits a bad report >>>>>>>>> or >>>>>>>>> >>>>>>>>> such. >>>>>>>>> >>>>>>>>> I think that in this use case the document should be simply >> referred >>>>>>>>> as >>>>>>>>> `Document` in the template. When you have multiple documents there, >>>>>>>>> referring to `Document` should be an error, saying that the >> template >>>>>>>>> >>>>>>>>> was >>>>>>>>> >>>>>>>>> made to process a single document only. >>>>>>>>> >>>>>>>>> >>>>>>>>> USE CASE 2 >>>>>>>>> >>>>>>>>> You have multiple input documents, but each has different role >>>>>>>>> >>>>>>>>> (different >>>>>>>>> >>>>>>>>> schema, maybe different file type). Like, you pass in users.csv and >>>>>>>>> groups.csv. Each has difference schema, and so you want to access >>>>>>>>> them >>>>>>>>> differently, but in the same template. >>>>>>>>> >>>>>>>>> freemarker-cli >>>>>>>>> [...] >>>>>>>>> --named-document users somewhere/foo-users.csv >>>>>>>>> --named-document groups somewhere/foo-groups.csv >>>>>>>>> >>>>>>>>> Then in the template you could refer to them as: >>>>>>>>> >>>>>>>>> `NamedDocuments.users`, >>>>>>>>> >>>>>>>>> and `NamedDocuments.groups`. >>>>>>>>> >>>>>>>>> Use Case 1, and 2 can be unified into a coherent concept, where >>>>>>>>> >>>>>>>>> `Document` >>>>>>>>> >>>>>>>>> is just a shorthand for `NamedDocuments.main`. 
>>>>>>>>> It's called "main" because that's "the" document the template is
>>>>>>>>> about, but then you have to add some helper documents, with symbolic
>>>>>>>>> names representing their role.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>>   -t access-report.ftl
>>>>>>>>>   --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>   --document-name=users somewhere/foo-users.csv
>>>>>>>>>   --document-name=groups somewhere/foo-groups.csv
>>>>>>>>>
>>>>>>>>> Here, `Document` still works in the template, and it refers to
>>>>>>>>> `somewhere/foo-access-log.csv`. (While omitting --document-name=main
>>>>>>>>> above would be cleaner, I couldn't figure out how to do that with
>>>>>>>>> Picocli. Anyway, for now the point is the concept, which is not
>>>>>>>>> specific to CLI.)
>>>>>>>>>
>>>>>>>>> USE CASE 3
>>>>>>>>>
>>>>>>>>> Here you have several of the same kind of documents. That has a more
>>>>>>>>> generic sub-use-case, when you have explicitly named documents (like
>>>>>>>>> "users" above), and for some you expect multiple input files.
>>>>>>>>>
>>>>>>>>> freemarker-cli
>>>>>>>>>   -t access-report.ftl
>>>>>>>>>   --document-name=main somewhere/foo-access-log.csv
>>>>>>>>>     somewhere/bar-access-log.csv
>>>>>>>>>   --document-name=users somewhere/foo-users.csv
>>>>>>>>>     somewhere/bar-users.csv
>>>>>>>>>   --document-name=groups somewhere/global-groups.csv
>>>>>>>>>
>>>>>>>>> The template must be written with this use case in mind, as now it has
>>>>>>>>> to #list some of the documents. (I think in practice you hardly ever
>>>>>>>>> want to get a document by hard-coded index.
>>>>>>>>> Either you don't know how many documents you have, so you can't use
>>>>>>>>> hard-coded indexes, or you do, and each index has a specific meaning,
>>>>>>>>> but then you should name the documents instead, as using indexes is
>>>>>>>>> error-prone and hard to read.)
>>>>>>>>> Accessing that list of documents in the template could maybe be done
>>>>>>>>> like this:
>>>>>>>>> - For the "main" documents: `DocumentList`
>>>>>>>>> - For explicitly named documents, like "users":
>>>>>>>>>   `NamedDocumentLists.users`
>>>>>>>>>
>>>>>>>>> SUMMING UP
>>>>>>>>>
>>>>>>>>> To unify all 3 use cases into a coherent concept:
>>>>>>>>> - `NamedDocumentLists.<name>` is the most generic form, and while you
>>>>>>>>>   can achieve everything with it, using it requires your template to
>>>>>>>>>   handle the most generic case too. So, I think it would be rarely used.
>>>>>>>>> - `DocumentList` is just a shorthand for `NamedDocumentLists.main`.
>>>>>>>>>   It's used if you only have one kind of document (single format and
>>>>>>>>>   schema), but potentially multiple of them.
>>>>>>>>> - `NamedDocuments.<name>` expresses that you expect exactly 1 document
>>>>>>>>>   of the given name.
>>>>>>>>> - `Document` is just a shorthand for `NamedDocuments.main`. This is for
>>>>>>>>>   the most natural/frequent use case.
>>>>>>>>>
>>>>>>>>> That's 4 possible ways of accessing your documents, which is a
>>>>>>>>> trade-off for the sake of these:
>>>>>>>>> - Catching CLI (or Maven, etc.) input where the template output will
>>>>>>>>>   likely be wrong.
>>>>>>>>>   That's only possible if the user can communicate their intent in the
>>>>>>>>>   template.
>>>>>>>>> - Users don't need to deal with concepts that are irrelevant in their
>>>>>>>>>   concrete use case. Just start with the trivial, `Document`, and later,
>>>>>>>>>   if the need arises, generalize to named documents, document lists, or
>>>>>>>>>   both.
>>>>>>>>>
>>>>>>>>> What do you guys think?
>>>>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Daniel Dekany
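
The lookup rules proposed in the quoted message (four accessors, with the single-document forms failing loudly when the count is not exactly 1) can be modelled in a few lines. This is only a sketch in Python, not freemarker-cli code: the `DocumentRegistry` class and all of its method names are hypothetical, and documents are represented as plain path strings for illustration.

```python
from collections import defaultdict


class DocumentRegistry:
    """Toy model of the four proposed accessors (hypothetical names)."""

    def __init__(self):
        # name -> list of document paths, in the order they were added
        self._by_name = defaultdict(list)

    def add(self, path, name="main"):
        # Documents given without an explicit name go under "main".
        self._by_name[name].append(path)

    def named_document_list(self, name):
        # Models NamedDocumentLists.<name>: any number of documents.
        return list(self._by_name.get(name, []))

    def named_document(self, name):
        # Models NamedDocuments.<name>: exactly one document, else an error.
        docs = self.named_document_list(name)
        if len(docs) != 1:
            raise ValueError(
                f"expected exactly 1 document named {name!r}, got {len(docs)}"
            )
        return docs[0]

    @property
    def document_list(self):
        # Models DocumentList: shorthand for NamedDocumentLists.main.
        return self.named_document_list("main")

    @property
    def document(self):
        # Models Document: shorthand for NamedDocuments.main.
        return self.named_document("main")


registry = DocumentRegistry()
registry.add("somewhere/foo-access-log.csv")
registry.add("somewhere/foo-users.csv", name="users")
registry.add("somewhere/bar-users.csv", name="users")

print(registry.document)                      # the single "main" document
print(registry.named_document_list("users"))  # both "users" documents
```

With this model, asking for `registry.named_document("users")` raises an error because two "users" documents were supplied, which is the "catch wrong input early" behaviour the proposal argues for.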