uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: progress in Nexus / Pom updates
Date Thu, 06 May 2010 21:29:14 GMT
uima-as now converted to new build (in branch).

Now on to the sandbox. -Marshall

On 5/5/2010 3:34 PM, Marshall Schor wrote:
> I think that the uimaj part is now done (at least I don't know of more
> bugs).
>
> The latest round was to get the distr and eclipse update site stuff working.
>
> The felix-bundle builds were altered to run the goal "manifest", which
> only produces the manifest.  The packaging type was changed to "jar",
> and the normal Apache jar release stuff, which copies in the required
> license and notice headers, now works.
>
> I also adopted the osgiVersion approach to naming the Eclipse plugins,
> while keeping the maven versions using the standard Maven conventions. 
> We'll know if that's the right approach eventually, when we try out the
> release plugin :-).
>
> Most of the inter-project dependencies are now eliminated - this means
> you should be able to check out one project, and do mvn install and it
> should build :-).
>
> Exceptions are the aggregating projects - because their <module>
> elements refer to the modules by relative path, and the -distr projects,
> because they find the things being distributed using relative paths.
>
> Next is uima-as - I'll work on a branch, again, for this.
>
> -Marshall
>
> On 4/30/2010 2:13 PM, Marshall Schor wrote:
>> The next chapter in this saga... (apologies for the long post). If you
>> won't be writing docbooks or your docbooks won't be cross-referenced to
>> any other docbooks in the uima bookshelf, then you can skip reading
>> this, unless you want to be entertained :-) .
>>
>> This is all about olinks. Olinks allow cross-referencing and
>> hyperlinking among documents, using extra saved information about the
>> target document being linked to (as contrasted with plain href style
>> links, which only have the link url). For instance, in PDFs, there's
>> extra info enabling the referring doc to say "page 123 in document abc".
>> For PDF and HTML, it allows the referring text to include a hyperlink
>> with the text being the target document's title, and maybe number (if it
>> has numbered items - such as our chapter / section numbers in the main
>> UIMA documentation). So you can get a link that looks like this:
>>
>> see Section 1.5.1, “Annotator Methods”
>> <http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.contract_for_annotator_methods>
>> for ...
>>
>> where the 1.5.1 was generated by docbook processing, and the "Annotator
>> Methods" was the title of that section.
>>
>> To make olinks work, each time a docbook is processed, an extra database
>> of info for that docbook is created, containing just the info needed for
>> this. This database, together with some other data about how the
>> multiple interlinking docbooks are arranged, is needed when processing a
>> docbook, to resolve these things.
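To make the role of that per-book database concrete, here is a minimal Java sketch of an olink lookup. It is only an illustration, not how the DocBook stylesheets store or resolve targets: the class names `OlinkTarget` and `OlinkDemo`, the field layout, and the "unresolved" fallback text are all assumptions made for the example. Only the target id and the section number/title come from the link shown above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of one entry in an olink target database.
final class OlinkTarget {
    final String number; // generated section number, e.g. "1.5.1"
    final String title;  // title of the target section
    final String href;   // where the target lives, relative to the referring book

    OlinkTarget(String number, String title, String href) {
        this.number = number;
        this.title = title;
        this.href = href;
    }
}

class OlinkDemo {
    // Builds the visible link text from the saved data about the target.
    static String linkText(Map<String, OlinkTarget> db, String targetId) {
        OlinkTarget t = db.get(targetId);
        if (t == null) {
            // No entry yet: the book that defines this id has not been processed.
            return "[unresolved olink: " + targetId + "]";
        }
        return "Section " + t.number + ", \u201c" + t.title + "\u201d";
    }

    public static void main(String[] args) {
        Map<String, OlinkTarget> db = new HashMap<>();
        String id = "ugr.tug.aae.contract_for_annotator_methods";
        db.put(id, new OlinkTarget("1.5.1", "Annotator Methods",
                "tutorials_and_users_guides.html#" + id));

        System.out.println(linkText(db, id));
        System.out.println(linkText(db, "no.such.id"));
    }
}
```

The point of the sketch is that the referring book cannot produce "Section 1.5.1" on its own; it needs data that only exists after the target book has been processed at least once.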
>>
>> So - where to store this information? We previously had stored this in
>> SVN. This was unsatisfactory because it caused interdependencies among
>> checked-out projects, where one project (having these databases) had to
>> be checked out into a specific, fixed directory layout with respect to
>> other (using) projects. The Maven way to get around this is to put these
>> things into the maven repository.
>>
>> Since there's one database per docbook, I thought it could best be stored
>> as an additional maven attached file for the project. Then you could
>> "depend" on the project, and download that artifact. This would place a
>> burden on docbook users - they would need to specify additional POM info
>> to get these things downloaded.
>>
>> So I tried that, and it worked fine for individual book processing. Then
>> I tried using an aggregator POM specifying the 4 main UIMA docbooks (now
>> moved to separate projects), and since these all refer (that is, olink)
>> to each other, this violated a maven principle of no circular dependency
>> relationships. These really are circular relationships, but they resolve
>> when you run docbook multiple times :-) .
>>
>> To fix this, I went to a scheme where there is just one additional
>> project (I'm calling it uima-docbook-olink-dbs) that will have just one
>> attached artifact, a zip file of all the needed docbook olink data for
>> all the docbooks in UIMA. (This could be a large set - besides the 4
>> main books, we have one for uima-as, and there are other books for many of
>> the sandbox projects, and one for some special tooling - like the
>> PearPackagingMavenPlugin).
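The difference between the two layouts can be sketched as a cycle check on a dependency graph. This is a plain depth-first search in Java, not anything Maven itself runs; the book names `book-a` and `book-b` are placeholders, and only `uima-docbook-olink-dbs` is a name from this post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DependencyCycleDemo {
    // Returns true if the directed graph (project -> projects it depends on) has a cycle.
    static boolean hasCycle(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : deps.keySet()) {
            if (visit(node, deps, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;  // reached a node already on the current path: cycle
        }
        if (done.contains(node)) {
            return false; // fully explored earlier, no cycle through it
        }
        onPath.add(node);
        for (String next : deps.getOrDefault(node, Collections.<String>emptyList())) {
            if (visit(next, deps, done, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        // Each book depends on the other book's attached olink data.
        Map<String, List<String>> perBook = new HashMap<>();
        perBook.put("book-a", Arrays.asList("book-b"));
        perBook.put("book-b", Arrays.asList("book-a"));
        System.out.println("per-book attachments cyclic: " + hasCycle(perBook));

        // Every book depends only on the single shared olink project.
        Map<String, List<String>> shared = new HashMap<>();
        shared.put("book-a", Arrays.asList("uima-docbook-olink-dbs"));
        shared.put("book-b", Arrays.asList("uima-docbook-olink-dbs"));
        System.out.println("shared olink project cyclic: " + hasCycle(shared));
    }
}
```

With one shared project, every book points at the same leaf, so the graph stays acyclic no matter how many books olink to each other.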
>>
>> This project is at the level 1-SNAPSHOT, and I think it will stay there.
>> This is because it's always being updated in part by each docbook
>> processing run, and we currently don't have a concept of needing any but
>> the latest versions of things. Note that releases will capture the
>> result of using the then current (at the time of the build) version of
>> these databases. I could imagine some fancy use cases that might not be
>> well supported - such as working on several versions at once, but I'll
>> let those use cases materialize first before trying to address them :-).
>>
>> Here's how this set of olink data will be used.
>>
>> 1) new users start by checking out and running a build which invokes our
>> docbook processing. This uses the dependency:unpack goal to find this
>> artifact in the maven snapshot repo (in the Apache infrastructure's
>> version of Nexus), where it lives - it will have the latest "deployed"
>> (that is, uploaded) set of olink data, for all docbooks that are using
>> olink.
>>
>> 2) The dependency:unpack will first download this zip to the local repo
>> if it isn't already there. If it is already there, it will check to see
>> if the snapshot in the repository is newer, and if so, will download
>> that. It then unpacks that to a spot where all projects being built on
>> this workstation for this user can find it.
>>
>> 3) The rest of the docbook build uses this olink data, and also, as a
>> side-effect of running on a particular document, adds or updates the
>> existing olink data for the current document being processed.
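The download decision in step 2 can be sketched roughly as below. This is a simplification written for illustration, assuming a plain timestamp comparison; it does not model Maven's real snapshot update policy or metadata handling, and the class and method names are made up for the example.

```java
import java.time.Instant;
import java.util.Optional;

class SnapshotRefreshDemo {
    // Decides whether the olink zip should be fetched from the snapshot repo.
    // localTimestamp is empty when the local repo has no copy yet.
    static boolean shouldDownload(Optional<Instant> localTimestamp, Instant remoteTimestamp) {
        if (!localTimestamp.isPresent()) {
            return true; // nothing in the local repo: always download
        }
        // Otherwise download only when the deployed snapshot is newer.
        return remoteTimestamp.isAfter(localTimestamp.get());
    }

    public static void main(String[] args) {
        Instant older = Instant.parse("2010-04-29T12:00:00Z");
        Instant newer = Instant.parse("2010-04-30T12:00:00Z");

        System.out.println(shouldDownload(Optional.<Instant>empty(), older));
        System.out.println(shouldDownload(Optional.of(older), newer));
        System.out.println(shouldDownload(Optional.of(newer), older));
    }
}
```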
>>
>> In thinking about where to store the unzipped form of the olink
>> databases, I hit upon the idea of storing it in the local .m2 repo, in
>> the uima-docbook-olink-dbs project, but as an additional directory
>> (called docbook-olink) which is *not* attached - so it won't be uploaded.
>>
>> This has a couple of nice side effects - once installed and unzipped,
>> unless someone else "deploys" an update of this data to the snapshot
>> repo, the download and unpack steps can mostly be skipped. And,
>> whenever someone doing some docbook builds is happy with their results,
>> they've as a side effect been creating additional or updated olink info
>> for one or more books, and to make these available to others, they just
>> need to "deploy" these back to the snapshot repository. (Note that the
>> deploy step runs a POM which first gets any updates made by someone else
>> for other docbooks, that might have happened in the meanwhile, so what's
>> uploaded is the latest version of all docbooks, except for collisions
>> where two people have checked out and are processing the very same docbook
>> - in which case the last one wins...)
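The "last one wins" behavior can be sketched as a map merge, keyed by book. This is an assumed model for illustration only: the real mechanism is files in a zip, and the book names and the "by alice" / "by bob" labels are invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

class OlinkDeployMergeDemo {
    // Combines the olink data fetched from the snapshot repo with the data
    // produced by local docbook runs. For a book present in both, the local
    // entry replaces the remote one, so whoever deploys last wins.
    static Map<String, String> merge(Map<String, String> remote, Map<String, String> local) {
        Map<String, String> merged = new TreeMap<>(remote);
        merged.putAll(local);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> remote = new TreeMap<>();
        remote.put("book-a", "olink data built by alice");
        remote.put("book-b", "olink data built by alice");

        Map<String, String> local = new TreeMap<>();
        local.put("book-b", "olink data built by bob");
        local.put("book-c", "olink data built by bob");

        // book-a is kept from the repo, book-b is overwritten, book-c is added.
        System.out.println(merge(remote, local));
    }
}
```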
>>
>> Testing revealed that this seems to work, with one exception - when I
>> ran the deploy from within m2eclipse, it nicely uploaded the POM , but
>> gave a message about Failed to Upload [400]. After much googling that
>> didn't identify the issue, I tried this from the command line, and it
>> worked. A few more tries isolated this to an apparent issue in the
>> "built-in" version of maven that m2eclipse 0.0.10 uses, which is a
>> version of maven 3.0-alpha-6. I found that maven 2.2.1 and 3.0-beta-1
>> both work, even when run from m2eclipse. So if you are using m2eclipse,
>> I recommend you
>> 1) use the maven preferences to install a link to 3.0-beta-1, and set it
>> as the default
>> 2) if previous use of m2eclipse created any run configurations, you have
>> to manually update each one of those - there's a menu pull-down at the
>> bottom of the main run configuration page for each one, labeled "Maven
>> Runtime", where you can switch this.
>>
>> Next steps will be verifying that the overwrite-if-newer is working for
>> using dependency:unpack for individual unpacked files, then I'll
>> probably go and check a bunch of this in :-)
>>
>> I've started writing a new web page for our site describing how to do
>> docbooks, the uima bookshelf concept, etc., which I'll need to update...
>>
>> -Marshall
>>
>> On 4/26/2010 11:17 PM, Marshall Schor wrote:
>>> Docbook story:  Most of the afternoon was spent tracking down a bug,
>>> which turned out to be formerly hidden by Maven 2.2.1, but which Maven
>>> 3.0 exposed (I'm trying Maven 3.0 beta 1 - it seems to run faster/better
>>> :-) ).  The symptom was a report that the "catalog file" could not be found.
>>>
>>> The bug is that if you ask in a plugin to load a resource at the top
>>> level, using the string "/xxx.xml" for instance, it fails.  This is
>>> because that leading "/" makes the Java classloader.getResource(aString)
>>> fail.  To fix, just drop the leading "/". 
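A small Java sketch of the leading-slash behavior follows. It is not the docbkx plugin's code; it uses `java/lang/Object.class` as a stand-in resource because that is always available, and the helper name `normalize` is made up for the example. `ClassLoader.getResource` expects a name without a leading "/", whereas `Class.getResource` treats a leading "/" as "absolute" and strips it before delegating.

```java
import java.net.URL;

class ResourceLookupDemo {
    // Strips leading slashes so the name is suitable for ClassLoader.getResource.
    static String normalize(String name) {
        int i = 0;
        while (i < name.length() && name.charAt(i) == '/') {
            i++;
        }
        return name.substring(i);
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        String withSlash = "/java/lang/Object.class";

        // ClassLoader lookup with the leading slash: typically not found (null).
        URL viaLoaderWithSlash = loader.getResource(withSlash);
        // Same lookup after dropping the slash.
        URL viaLoaderNormalized = loader.getResource(normalize(withSlash));
        // Class.getResource accepts the leading slash as an absolute name.
        URL viaClass = Object.class.getResource(withSlash);

        System.out.println("ClassLoader, leading slash found: " + (viaLoaderWithSlash != null));
        System.out.println("ClassLoader, normalized found:    " + (viaLoaderNormalized != null));
        System.out.println("Class.getResource found:          " + (viaClass != null));
    }
}
```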
>>>
>>> I've reported this along with the fix to the docbkx project - they use
>>> this to load the "catalog.xml" file that comes with docbook 4.x and 5.0
>>> distributions. 
>>>
>>> So, now, after all that, I'm starting to get docbook building again,
>>> this time with fully factored parent plugins.  The olink stuff I'm going
>>> to try to do by using maven "attachments", and going for a strategy of
>>> only 1 docbook per project (I've split the uima-docbooks project, which
>>> held 4 docbooks, into 4 projects, each holding one docbook). 
>>>
>>> This aligns the approach with the way Sandbox projects are doing
>>> documentation - they have the project produce the 1 main artifact (a
>>> jar), and now it will also produce (when I'm finished :-) ) an
>>> additional "attached" artifact - the olink data for the pdf and html
>>> versions. 
>>>
>>> This will allow other docbooks which want to hyperlink to a reference in
>>> the first docbook to be able to do so. (OLinking is like normal
>>> hyperlinking, except that information about the target is known, so for
>>> PDFs, the link includes the "book" + page number in the book, and it
>>> includes locating the other book via a relative directory path.)
>>>
>>> It looks like I'll be able to put all the gorp (that's a technical term
>>> :-) ) for docbook formatting, like boiler plate, title pages, things to
>>> enable xInclude, fonts, css stuff, customization xsl layers, etc. into a
>>> shared "resource bundle" that projects will be able to fetch (from their
>>> local .m2 repository, or from the big repo in the sky).
>>>
>>> -Marshall
>>>
>>> On 4/22/2010 4:03 PM, Marshall Schor wrote:
>>>> progress -
>>>>
>>>> the uimaj/branches/mavenAlign branch should now build all of the Java
>>>> components.  There are 2 new aggregate (only) POMs for this, to build in
>>>> batch, called aggregate-pom-uimaj and aggregate-pom-uimaj-eclipse-plugins.
>>>>
>>>> More checking to do to verify the build is ok.
>>>>
>>>> Next to tackle: docbooks, then the assemblies.
>>>>
>>>> -Marshall
>>>>
>>>> On 4/19/2010 5:16 PM, Marshall Schor wrote:
>>>>> Progress - created a common eclipse-plugin parent pom, and got the
>>>>> ep-configurator eclipse project to build.
>>>>>
>>>>> I noticed as a side effect of checking things that our 2.3.0 build for
>>>>> these artifacts is missing the License, Notice, etc. in the Jar
>>>>> manifest.  The new structure of parent poms corrects this in a uniform
>>>>> way :-)
>>>>>
>>>>> -Marshall
>>>>>
>>>>> On 4/19/2010 10:42 AM, Marshall Schor wrote:
>>>>>> Progress -
>>>>>>
>>>>>> To handle the many Jars that need the extra bit in their Notice file(s),
>>>>>> I made a version of the remote-resource "bundle" that includes a
>>>>>> placeholder for additional text following the standard NOTICE
>>>>>> boilerplate.
>>>>>>
>>>>>> Then I made a version of the parent pom for uimaj (uimaj-ibm-notice)
>>>>>> which uses this extra remote resource, and sets the additional text to
>>>>>> the required boilerplate for those jars which were originally coming
>>>>>> from IBM.
>>>>>>
>>>>>> Now, JVinci has the right notice file...
>>>>>>
>>>>>> Next problems I'm working on for JVinci: the implementation url is
>>>>>> incorrect (it's for the parent-pom), and the project title META-INF,
>>>>>> which we used to have, is missing.
>>>>>>
>>>>>> -Marshall
>>>>>>
>>>>>> On 4/15/2010 5:17 PM, Marshall Schor wrote:
>>>>>>> Progress -
>>>>>>>
>>>>>>> I made a new top-level node in the uima tree called "build" - for
>>>>>>> artifacts that we won't normally be including in assemblies, but which
>>>>>>> are instead build things.
>>>>>>>
>>>>>>> In there, I put a folder called "parent-poms" - the intent is to keep
>>>>>>> these organized in one place.
>>>>>>>
>>>>>>> I made a top level pom for the whole project, which inherits from the
>>>>>>> common Apache pom version 7.  The common Apache pom connects the deploy
>>>>>>> / release process with the Nexus repository.
>>>>>>>
>>>>>>> I also made a top level pom for just the main UIMA Java SDK -
>>>>>>> corresponding sort of to the former uimaj pom, except it doesn't have
>>>>>>> any aggregation stuff.
>>>>>>>
>>>>>>> BTW, in fiddling with the poms, I'm following the recommended ordering
>>>>>>> for elements in the POM, listed here:
>>>>>>> http://maven.apache.org/developers/conventions/code.html  (scroll 3/4 of
>>>>>>> the way toward the bottom)
>>>>>>>
>>>>>>> After fiddling with my .m2/settings.xml files per the instructions on
>>>>>>> migrating to Nexus, both install and deploy worked (deploy was for a
>>>>>>> SNAPSHOT - no real releases :-) ).
>>>>>>>
>>>>>>> You can see the deployed artifacts on repository.apache.org in the
>>>>>>> Snapshots area.
>>>>>>>
>>>>>>> I'm now trying to see how to set up projects whose poms inherit from
>>>>>>> uimaj.  First trying jVinci.  I'm comparing what gets built to what was
>>>>>>> built for 2.3.0-incubating.
>>>>>>> One difference - a bunch of our components have slightly different
>>>>>>> Notices needed, so I'll fix that.
>>>>>>>
>>>>>>> Another thing to fix: thinking about when to run RAT.  Some projects put
>>>>>>> it into a profile - so you can run it when you want to.  It could also
>>>>>>> be in the apache-release profile - so it's always run when doing a
>>>>>>> release candidate.  Unless there's a better idea, I'll add this.
>>>>>>>
>>>>>>> -Marshall