uima-dev mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: progress in Nexus / Pom updates
Date Wed, 05 May 2010 19:34:53 GMT
I think that the uimaj part is now done (at least I don't know of more
bugs).

The latest round was to get the distr and eclipse update site stuff working.

The felix-bundle builds were altered to run the goal "manifest", which
only produces the manifest.  The packaging type was changed to "jar",
and the normal Apache jar release stuff, which copies in the required
license and notice headers, now works.

I also adopted the osgiVersion approach to naming the Eclipse plugins,
while keeping the maven versions using the standard Maven conventions. 
We'll know if that's the right approach eventually, when we try out the
release plugin :-).

Most of the inter-project dependencies are now eliminated - this means
you should be able to check out one project, and do mvn install and it
should build :-).

Exceptions are the aggregating projects - because their <module>
elements refer to the modules by relative path, and the -distr projects,
because they find the things being distributed using relative paths.

Next is uima-as - I'll work on a branch, again, for this.

-Marshall

On 4/30/2010 2:13 PM, Marshall Schor wrote:
> The next chapter in this saga... (apologies for the long post). If you
> won't be writing docbooks or your docbooks won't be cross-referenced to
> any other docbooks in the uima bookshelf, then you can skip reading
> this, unless you want to be entertained :-) .
>
> This is all about olinks. Olinks allow cross-referencing and
> hyperlinking among documents, using extra saved information about the
> target document being linked to (as contrasted with plain href style
> links, which only have the link url). For instance, in PDFs, there's
> extra info enabling the referring doc to say "page 123 in document abc".
> For PDF and HTML, it allows the referring text to include a hyperlink
> with the text being the target document's title, and maybe number (if it
> has numbered items - such as our chapter / section numbers in the main
> UIMA documentation). So you can get a link that looks like this:
>
> see Section 1.5.1, “Annotator Methods”
> <http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.contract_for_annotator_methods>
> for ...
>
> where the 1.5.1 was generated by docbook processing, and the "Annotator
> Methods" was the title of that section.
>
> To make olinks work, each time a docbook is processed, an extra database
> of info for that docbook is created, containing just the info needed for
> this. This database, together with some other data about how the
> multiple interlinking docbooks are arranged, is needed when processing a
> docbook, to resolve these things.
>
> So - where to store this information? We previously had stored this in
> SVN. This was unsatisfactory because it caused interdependencies among
> checked-out projects, where one project (having these databases) had to
> be checked out into a specific, fixed directory layout with respect to
> other (using) projects. The Maven way to get around this is to put these
> things into the maven repository.
>
> Since there's one database per docbook, I thought it could best be stored
> as an additional maven attached file for the project. Then you could
> "depend" on the project, and download that artifact. This would place a
> burden on docbook users - they would need to specify additional POM info
> to get these things downloaded.
>
> So I tried that, and it worked fine for individual book processing. Then
> I tried using an aggregator POM specifying the 4 main UIMA docbooks (now
> moved to separate projects), and since these all refer (that is, olink )
> to each other, this violated a maven principle of no circular dependency
> relationships. These really are circular relationships, but they resolve
> when you run docbook multiple times :-) .
>
> To fix this, I went to a scheme where there is just one additional
> project (I'm calling it uima-docbook-olink-dbs) that will have just one
> attached artifact, a zip file of all the needed docbook olink data for
> all the docbooks in UIMA. (This could be a large set - besides the 4
> main books, we have one for uima-as, and there's other books for many of
> the sandbox projects, and one for some special tooling - like the
> PearPackagingMavenPlugin).
>
> This project is at the level 1-SNAPSHOT, and I think it will stay there.
> This is because it's always being updated in part by each docbook
> processing run, and we currently don't have a concept of needing any but
> the latest versions of things. Note that releases will capture the
> result of using the then current (at the time of the build) version of
> these databases. I could imagine some fancy use cases that might not be
> well supported - such as working on several versions at once, but I'll
> let those use cases materialize first before trying to address them :-) .
>
> Here's how this set of olink data will be used.
>
> 1) new users start by checking out and running a build which invokes our
> docbook processing. This uses the dependency:unpack goal to find this
> artifact in the maven snapshot repo (in the Apache infrastructure's
> version of Nexus), where it lives - it will have the latest "deployed"
> (that is, uploaded) set of olink data, for all docbooks that are using
> olink.
>
> 2) The dependency:unpack will first download this zip to the local repo
> if it isn't already there. If it is already there, it will check to see
> if the snapshot in the repository is newer, and if so, will download
> that. It then unpacks that to a spot where all projects being built on
> this workstation for this user can find it.
>
> 3) The rest of the docbook build uses this olink data, and also, as a
> side-effect of running on a particular document, adds or updates the
> existing olink data for the current document being processed.
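
The download decision in step 2 can be sketched in Java as follows. This only illustrates the rule described above (download when there is no local copy, or when the snapshot in the repository is newer). It is not how dependency:unpack is implemented, and the names SnapshotCheck and shouldDownload, as well as the timestamps, are made up for the example.

```java
import java.time.Instant;
import java.util.Optional;

public class SnapshotCheck {

    /**
     * Hypothetical helper: decides whether the olink zip should be fetched.
     * localCopy is empty when nothing is in the local repo yet.
     */
    public static boolean shouldDownload(Optional<Instant> localCopy, Instant remoteSnapshot) {
        // No local copy: always download.
        // Otherwise download only when the remote snapshot is strictly newer.
        return localCopy.map(remoteSnapshot::isAfter).orElse(true);
    }

    public static void main(String[] args) {
        Instant deployed = Instant.parse("2010-04-30T12:00:00Z");
        Instant older = Instant.parse("2010-04-26T12:00:00Z");

        System.out.println(shouldDownload(Optional.empty(), deployed));      // true
        System.out.println(shouldDownload(Optional.of(deployed), deployed)); // false
        System.out.println(shouldDownload(Optional.of(older), deployed));    // true
    }
}
```

After a download, the zip would then be unpacked to the shared location mentioned in step 2.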
>
> In thinking about where to store the unzipped form of the olink
> databases, I hit upon the idea of storing it in the local .m2 repo, in
> the uima-docbook-olink-dbs project, but as an additional directory
> (called docbook-olink) which is *not* attached - so it won't be uploaded.
>
> This has a couple of nice side effects - once installed and unzipped,
> unless someone else "deploys" an update of this data to the snapshot
> repo, the download and unpack steps can be somewhat skipped. And,
> whenever someone doing some docbook builds is happy with their results,
> they've as a side effect been creating additional or updated olink info
> for one or more books, and to make these available to others, they just
> need to "deploy" these back to the snapshot repository. (Note that that
> deploy step runs a POM which first gets any updates made by someone else
> for other docbooks, that might have happened in the meanwhile, so what's
> uploaded is the latest version of all docbooks (except for collisions
> where two people have checked out and are processing the very same
> docbook - in which case the last one wins...).)
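
The "fetch others' updates, then deploy" behavior can be pictured as a map merge. The Java sketch below is an assumption-laden illustration, not the actual POM logic: it models the olink data as one entry per docbook, with made-up names (OlinkDbMerge, merge, book-a, ...) and strings standing in for the real database files. Locally processed docbooks overwrite what was fetched, which is how "last one wins" comes about when two people process the same docbook.

```java
import java.util.Map;
import java.util.TreeMap;

public class OlinkDbMerge {

    /**
     * Hypothetical merge: start from the data fetched from the snapshot repo,
     * then overlay the entries produced by local docbook runs.
     */
    public static Map<String, String> merge(Map<String, String> deployed,
                                            Map<String, String> local) {
        Map<String, String> merged = new TreeMap<>(deployed); // copy, inputs stay untouched
        merged.putAll(local);                                 // local entries replace fetched ones
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> deployed = new TreeMap<>();
        deployed.put("book-a", "a v1");
        deployed.put("book-b", "b v1");

        Map<String, String> local = new TreeMap<>();
        local.put("book-b", "b v2 (local)");
        local.put("book-c", "c v1");

        // book-a comes from the repo, book-b and book-c from the local run.
        System.out.println(merge(deployed, local));
    }
}
```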
>
> Testing revealed that this seems to work, with one exception - when I
> ran the deploy from within m2eclipse, it nicely uploaded the POM, but
> gave a message about Failed to Upload [400]. After much googling that
> didn't identify the issue, I tried this from the command line, and it
> worked. A few more tries isolated this to an apparent issue in the
> "built-in" version of maven that m2eclipse 0.0.10 uses, which is a
> version of maven 3.0-alpha-6. I found that maven 2.2.1 and 3.0-beta-1
> both work, even when run from m2eclipse. So if you are using m2eclipse,
> I recommend you
> 1) use the maven preferences to install a link to 3.0-beta-1, and set it
> as the default
> 2) if previous use of m2eclipse created any run configurations, you have
> to manually update each one of those - there's a menu pull-down at the
> bottom of the main run configuration page for each one, labeled "Maven
> Runtime", where you can switch this.
>
> Next steps will be verifying that overwrite-if-newer works when
> dependency:unpack is used for individual unpacked files, then I'll
> probably go and check a bunch of this in :-)
>
> I've started writing a new web page for our site describing how to do
> docbooks, the uima bookshelf concept, etc., which I'll need to update...
>
> -Marshall
>
> On 4/26/2010 11:17 PM, Marshall Schor wrote:
>   
>> Docbook story:  Most of the afternoon was spent tracking down a bug,
>> which turned out to be formerly hidden by Maven 2.2.1, but which Maven
>> 3.0 exposed (I'm trying Maven 3.0 beta 1 - it seems to run faster/better
>> :-) ).  The symptom was a report that the "catalog file" could not be found.
>>
>> The bug is that if you ask in a plugin to load a resource at the top
>> level, using the string "/xxx.xml" for instance, it fails.  This is
>> because that leading "/" makes Java's ClassLoader.getResource(aString)
>> fail.  To fix, just drop the leading "/". 
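
A minimal Java sketch of the leading-slash issue and the fix. It assumes the system class loader and uses a JDK class file as a stand-in resource (the real case was the "catalog.xml" lookup inside a plugin). The names ResourceLookup, stripLeadingSlash and find are invented for the example.

```java
import java.net.URL;

public class ResourceLookup {

    /** Drops one leading "/" so the name suits ClassLoader.getResource. */
    public static String stripLeadingSlash(String name) {
        return name.startsWith("/") ? name.substring(1) : name;
    }

    /** Looks the resource up via the system class loader, after normalizing the name. */
    public static URL find(String name) {
        return ClassLoader.getSystemClassLoader().getResource(stripLeadingSlash(name));
    }

    public static void main(String[] args) {
        // Stand-in resource that exists in any JDK; a real plugin would ask for its own file.
        String name = "/java/lang/Object.class";

        // ClassLoader.getResource expects a name without a leading "/".
        URL raw = ClassLoader.getSystemClassLoader().getResource(name);
        URL normalized = find(name);

        System.out.println("with leading slash:    " + raw);
        System.out.println("without leading slash: " + normalized);
    }
}
```

Note that Class.getResource (as opposed to ClassLoader.getResource) does accept a leading "/", treating it as an absolute name, which is an easy way to mix the two up.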
>>
>> I've reported this along with the fix to the docbkx project - they use
>> this to load the "catalog.xml" file that comes with docbook 4.x and 5.0
>> distributions. 
>>
>> So, now, after all that, I'm starting to get docbook building again,
>> this time with fully factored parent plugins.  The olink stuff I'm going
>> to try to do by using maven "attachments", and going for a strategy of
>> only 1 docbook per project (I've split the uima-docbooks project, which
>> held 4 docbooks, into 4 projects, each holding one docbook). 
>>
>> This aligns the approach with the way Sandbox projects are doing
>> documentation - they have the project produce the 1 main artifact (a
>> jar), and now it will also produce (when I'm finished :-) ) an
>> additional "attached" artifact - the olink data for the pdf and html
>> versions. 
>>
>> This will allow other docbooks which want to hyperlink to a reference in
>> the first docbook to be able to do so. (OLinking is like normal
>> hyperlinking, except that information about the target is known, so for
>> PDFs, the link includes the "book" + page number in the book, and it
>> includes locating the other book via a relative directory path.).
>>
>> It looks like I'll be able to put all the gorp (that's a technical term
>> :-) ) for docbook formatting, like boiler plate, title pages, things to
>> enable xInclude, fonts, css stuff, customization xsl layers, etc. into
>> a shared "resource bundle" that projects will be able to fetch (from
>> their local .m2 repository, or from the big repo in the sky).
>>
>> -Marshall
>>
>> On 4/22/2010 4:03 PM, Marshall Schor wrote:
>>> progress -
>>>
>>> the uimaj/branches/mavenAlign branch should now build all of the Java
>>> components.  There are 2 new aggregate (only) POMs for this, to build in
>>> batch, called aggregate-pom-uimaj and aggregate-pom-uimaj-eclipse-plugins.
>>>
>>> More checking to do to verify the build is ok.
>>>
>>> Next to tackle: docbooks, then the assemblies.
>>>
>>> -Marshall
>>>
>>> On 4/19/2010 5:16 PM, Marshall Schor wrote:
>>>> Progress - created a common eclipse-plugin parent pom, and got the
>>>> ep-configurator eclipse project to build.
>>>>
>>>> I noticed as a side effect of checking things that our 2.3.0 build for
>>>> these artifacts is missing the License, Notice, etc. in the Jar
>>>> manifest.  The new structure of parent poms corrects this in a uniform
>>>> way :-)
>>>>
>>>> -Marshall
>>>>
>>>> On 4/19/2010 10:42 AM, Marshall Schor wrote:
>>>>> Progress -
>>>>>
>>>>> To handle the many Jars that need the extra bit in their Notice file(s),
>>>>> I made a version of the remote-resource "bundle" that includes a
>>>>> placeholder for additional text following the standard NOTICE
>>>>> boilerplate.
>>>>>
>>>>> Then I made a version of the parent pom for uimaj (uimaj-ibm-notice)
>>>>> which uses this extra remote resource, and sets the additional text to
>>>>> the required boilerplate for those jars which were originally coming
>>>>> from IBM. 
>>>>>
>>>>> Now, JVinci has the right notice file...
>>>>>
>>>>> Next problems I'm working on for JVinci: The implementation url is
>>>>> incorrect (it's for the parent-pom), and the project title META-INF,
>>>>> which we used to have, is missing.
>>>>>
>>>>> -Marshall
>>>>>
>>>>> On 4/15/2010 5:17 PM, Marshall Schor wrote:
>>>>>> Progress -
>>>>>>
>>>>>> I made a new top-level node in the uima tree called "build" - for
>>>>>> artifacts that we won't normally be including in assemblies, but
>>>>>> which are instead build things.
>>>>>>
>>>>>> In there, I put a folder called "parent-poms" - the intent is to
>>>>>> keep these organized in one place.
>>>>>>
>>>>>> I made a top level pom for the whole project, which inherits from
>>>>>> the common Apache pom version 7.  The common Apache pom connects the
>>>>>> deploy / release process with the Nexus repository.
>>>>>>
>>>>>> I also made a top level pom for just the main UIMA Java SDK -
>>>>>> corresponding sort of to the former uimaj pom, except it doesn't
>>>>>> have any aggregation stuff.
>>>>>>
>>>>>> BTW, in fiddling with the poms, I'm following the recommended ordering
>>>>>> for elements in the POM, listed here:
>>>>>> http://maven.apache.org/developers/conventions/code.html  (scroll
>>>>>> 3/4 of the way toward the bottom)
>>>>>>
>>>>>> After fiddling with my .m2/settings.xml files per the instructions
>>>>>> on migrating to Nexus, both install and deploy worked (deploy was for
>>>>>> a SNAPSHOT - no real releases :-) ).
>>>>>>
>>>>>> You can see the deployed artifacts on repository.apache.org in the
>>>>>> Snapshots area.
>>>>>>
>>>>>> I'm now trying to see how to set up projects whose poms inherit from
>>>>>> uimaj.  First trying jVinci.  I'm comparing what gets built to what
>>>>>> was built for 2.3.0-incubating.
>>>>>> One difference - a bunch of our components have slightly different
>>>>>> Notices needed, so I'll fix that.
>>>>>>
>>>>>> Another thing to fix: thinking about when to run RAT.  Some projects
>>>>>> put it into a profile - so you can run it when you want to.  It could
>>>>>> also be in the apache-release profile - so it's always run when doing
>>>>>> a release candidate.  Unless there's a better idea, I'll add this.
>>>>>>
>>>>>> -Marshall