airavata-issues mailing list archives

From "Marcus Christie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AIRAVATA-2741) Ideas for better way to deal with arbitrary output files than ARCHIVE
Date Fri, 06 Apr 2018 17:40:00 GMT

     [ https://issues.apache.org/jira/browse/AIRAVATA-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Christie updated AIRAVATA-2741:
--------------------------------------
    Description: 
Just want to capture some details of recent conversations with [~eroma_a] and [~spamidig]
on how to improve Airavata capabilities so we can move beyond using ARCHIVE.  The ARCHIVE
capability is a bit of a hack and causes some issues for us. Just briefly, here are some of
the problems:
* pulls back absolutely every file but some aren't needed and some intermediate files are
very large. For some applications it isn't even practical to use ARCHIVE
* pulls back duplicates of Application Output files, further filling gateway data storage
* these files are basically opaque to Airavata, so there is a limit on what can be done in
a programmatic way for some of these files

Here are some potential improvements:
* improve wildcard support: allow specifying a wildcard that can match a single or multiple
files. For multiple files these can all be registered as a URI_COLLECTION type data output.
(Side note: I'm not sure what the existing wildcard support currently covers; need to
investigate.)
* Show all of the job files in the portal, including ones that aren't defined as Application
Outputs and haven't actually been staged back to the portal, and allow the user to request
pulling back one of these other files. This would be nice because there are certainly going
to be cases where a file is generated that wasn't anticipated (either lack of configuration
or just something truly not anticipatable). Would mean needing to register every file in the
job directory, not just the Application Outputs (not sure where, replica catalog?). Would
also mean we need backend task execution support for fetching these files as needed.
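
To make the two ideas above a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions, not how Airavata implements anything: the function names resolve_output and unstaged_files, the dict layout, and the "URI" type label for a single match are all hypothetical; only URI_COLLECTION comes from the description above. It assumes we already have a flat list of file names from the job directory.

```python
from fnmatch import fnmatchcase


def resolve_output(pattern, job_files):
    """Match one wildcard pattern against the job directory's file names.

    Hypothetical helper: a single match is reported as a "URI" output,
    several matches as one "URI_COLLECTION" output, no match as None.
    """
    matches = sorted(name for name in job_files if fnmatchcase(name, pattern))
    if not matches:
        return None
    if len(matches) == 1:
        return {"type": "URI", "value": matches[0]}
    return {"type": "URI_COLLECTION", "value": matches}


def unstaged_files(patterns, job_files):
    """List job files that no Application Output pattern matches.

    Hypothetical helper: these are the files a portal could display and
    let the user request on demand instead of pulling everything back.
    """
    return sorted(
        name
        for name in job_files
        if not any(fnmatchcase(name, p) for p in patterns)
    )


if __name__ == "__main__":
    files = ["run.log", "part-1.dat", "part-2.dat", "scratch.tmp"]
    print(resolve_output("part-*.dat", files))
    print(unstaged_files(["*.log", "part-*.dat"], files))
```

The sketch uses fnmatchcase so that matching is case-sensitive on every platform. Where the full list of job files would be registered (replica catalog or elsewhere) and how a requested file would actually be fetched are left open, as in the description.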


  was:
Just want to capture some details of recent conversations with [~eroma_a] and [~spamidig]
on how to improve Airavata capabilities so we can move beyond using ARCHIVE.  The ARCHIVE
capability is a bit of a hack and causes some issues for us. Just briefly, here are some of
the problems:
* pulls back absolutely every file but some aren't needed and some intermediate files are
very large. For some applications it isn't even practical to use ARCHIVE
* pulls back duplicates of Application Output files, further filling gateway data storage
* these files are basically opaque to Airavata, so there is a limit on what can be done in
a programmatic way for some of these files

Here are some potential improvements:
* improve wildcard support: allow specifying a wildcard that can match a single or multiple
files. For multiple files these can all be registered as a URI_COLLECTION type data output.
* Show all of the job files in the portal, including ones that aren't defined as Application
Outputs and haven't actually been staged back to the portal, and allow the user to request
pulling back one of these other files. This would be nice because there are certainly going
to be cases where a file is generated that wasn't anticipated (either lack of configuration
or just something truly not anticipatable). Would mean needing to register every file in the
job directory, not just the Application Outputs (not sure where, replica catalog?). Would
also mean we need backend task execution support for fetching these files as needed.



> Ideas for better way to deal with arbitrary output files than ARCHIVE
> ---------------------------------------------------------------------
>
>                 Key: AIRAVATA-2741
>                 URL: https://issues.apache.org/jira/browse/AIRAVATA-2741
>             Project: Airavata
>          Issue Type: Improvement
>            Reporter: Marcus Christie
>            Assignee: Marcus Christie
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
