falcon-dev mailing list archives

From "Sowmya Ramesh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FALCON-594) Process lineage information for Retention policies
Date Wed, 13 Aug 2014 18:18:12 GMT

    [ https://issues.apache.org/jira/browse/FALCON-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095846#comment-14095846
] 

Sowmya Ramesh edited comment on FALCON-594 at 8/13/14 6:17 PM:
---------------------------------------------------------------

Several approaches have been identified for adding lineage information for the eviction policy.

*Approach 1:*

On execution of the eviction policy, delete the identified feed instance vertices from the graph. For
completeness, the associated entity vertices should also be deleted, i.e. a cascade delete.

Pros:
- Because the identified feed instance vertices are deleted, the graph DB won't keep growing, so
there are no storage space issues.

Cons:
- Since eviction history is not preserved, this information cannot be retrieved at a later point
in time.
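
As a rough illustration of Approach 1, here is a minimal sketch on a toy in-memory graph. The vertex
ids, the "instance-of" edge label and the function name are hypothetical, and "cascade" is read
narrowly as removing the evicted vertices together with the edges that touch them; this is not
Falcon's graph API.

```python
# Toy in-memory graph; vertex ids and the "instance-of" label are hypothetical.
vertices = {
    "feed:clicks": {"type": "feed-entity"},
    "clicks/2014-08-01": {"type": "feed-instance"},
    "clicks/2014-08-02": {"type": "feed-instance"},
}
edges = {
    ("clicks/2014-08-01", "instance-of", "feed:clicks"),
    ("clicks/2014-08-02", "instance-of", "feed:clicks"),
}


def evict_with_delete(vertices, edges, evicted_ids):
    """Return copies of the graph without the evicted vertices and their edges."""
    evicted = set(evicted_ids)
    kept_vertices = {vid: props for vid, props in vertices.items() if vid not in evicted}
    kept_edges = {
        (src, label, dst)
        for (src, label, dst) in edges
        if src not in evicted and dst not in evicted
    }
    return kept_vertices, kept_edges


vertices_after, edges_after = evict_with_delete(vertices, edges, ["clicks/2014-08-01"])
# The graph shrinks, but nothing records that clicks/2014-08-01 was ever evicted.
print(sorted(vertices_after))
```

After the call the graph holds no trace of the evicted instance, which is the con listed above.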

*Approach 2:*

- On execution of the eviction policy, delete the identified feed instance vertices [cascade delete].
- Create a common Evicted vertex and, for each identified feed entity vertex, add an edge to it with
label "evicted". Add a property identifying the evicted feed instance vertex [fi], the timestamp
of eviction [ti], and the WF id [wi]. Instead of creating a new common vertex, a self-loop on the
feed entity vertex can be added.

Pros:
- Because the identified feed instance vertices are deleted, the graph DB won't keep growing, so
there are no storage space issues.
- Some details about each eviction are stored in the graph DB, so they can be retrieved later.

Cons:
- Requires more storage than Approach 1, since some eviction details are stored.
- A property [fi, ti, wi] is added for each evicted instance. Retrieving the eviction details means
parsing this property, which leads to performance issues.
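
Approach 2 could look roughly like the sketch below, again on a toy in-memory graph. It assumes the
[fi, ti, wi] property is stored on the "evicted" edge, which the description above does not pin
down. The vertex ids, the workflow id and the function names are hypothetical.

```python
EVICTED = "Evicted"  # hypothetical id of the common vertex


def evict_and_record(graph, feed_id, instance_ids, timestamp, workflow_id):
    """Delete instance vertices, leaving one 'evicted' edge per instance on the feed."""
    graph["vertices"].setdefault(EVICTED, {"type": "evicted-marker"})
    gone = set(instance_ids)
    for instance_id in gone:
        graph["vertices"].pop(instance_id, None)
    # Cascade: drop every edge that touched a deleted vertex.
    graph["edges"] = [
        e for e in graph["edges"] if e["src"] not in gone and e["dst"] not in gone
    ]
    for instance_id in instance_ids:
        graph["edges"].append({
            "src": feed_id,
            "label": "evicted",
            "dst": EVICTED,
            # Assumption: [fi, ti, wi] lives on the edge.
            "props": {"fi": instance_id, "ti": timestamp, "wi": workflow_id},
        })


def eviction_details(graph, feed_id):
    """Scan the feed's 'evicted' edges and collect their properties."""
    return [
        e["props"]
        for e in graph["edges"]
        if e["src"] == feed_id and e["label"] == "evicted"
    ]


graph = {
    "vertices": {"feed:clicks": {}, "clicks/2014-08-01": {}, "clicks/2014-08-02": {}},
    "edges": [
        {"src": "clicks/2014-08-01", "label": "instance-of", "dst": "feed:clicks", "props": {}},
        {"src": "clicks/2014-08-02", "label": "instance-of", "dst": "feed:clicks", "props": {}},
    ],
}
evict_and_record(graph, "feed:clicks", ["clicks/2014-08-01"], "2014-08-13T18:17Z", "wf-0001")
details = eviction_details(graph, "feed:clicks")
```

The scan in `eviction_details` is where the lookup cost mentioned in the cons shows up: every
eviction adds one more property set to read through.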

*Approach 3:*
Create a common Evicted vertex and, on execution of the eviction policy, add an edge with label "evicted"
from each identified feed instance vertex to it.

Pros:
- Simple to implement
- Retains all the details of evicted feed instances for historical queries

Cons:
- Storage and performance issues as the graph DB keeps growing
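
A minimal sketch of Approach 3 on a toy in-memory graph follows. Vertex ids, the "instance-of"
label and the function names are hypothetical. Nothing is deleted, so the graph only grows.

```python
EVICTED = "Evicted"  # hypothetical id of the common vertex


def mark_evicted(vertices, edges, instance_ids):
    """Keep every vertex; add one 'evicted' edge per instance to the common vertex."""
    vertices.setdefault(EVICTED, {"type": "evicted-marker"})
    for instance_id in instance_ids:
        edges.add((instance_id, "evicted", EVICTED))


def evicted_instances(edges):
    """List the instances that have an 'evicted' edge into the common vertex."""
    return sorted(
        src for (src, label, dst) in edges if label == "evicted" and dst == EVICTED
    )


vertices = {
    "feed:clicks": {"type": "feed-entity"},
    "clicks/2014-08-01": {"type": "feed-instance"},
    "clicks/2014-08-02": {"type": "feed-instance"},
}
edges = {
    ("clicks/2014-08-01", "instance-of", "feed:clicks"),
    ("clicks/2014-08-02", "instance-of", "feed:clicks"),
}
size_before = len(vertices) + len(edges)
mark_evicted(vertices, edges, ["clicks/2014-08-01"])
size_after = len(vertices) + len(edges)
print(evicted_instances(edges), size_before, size_after)
```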

*Approach 4:*
On execution of the retention policy, add an "evicted" property to each identified feed instance vertex.
Do cleanup based on a time limit, which ought to be available
[FALCON-335|https://issues.apache.org/jira/browse/FALCON-335], to keep the graph DB from growing and
causing storage/performance related issues.

Pros:
- Retains all the details of evicted feed instances for historical queries

Cons:
- Storage and performance issues as the graph DB keeps growing
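
The marking step of Approach 4 might be sketched as below, on a toy dictionary of vertices. The
property value is assumed here to be the eviction time, which the description does not specify.
Vertex ids and function names are hypothetical.

```python
from datetime import datetime


def mark_evicted(vertices, instance_ids, evicted_at):
    """Set an 'evicted' property on each identified feed instance vertex."""
    for instance_id in instance_ids:
        # Assumption: the property value is the eviction time.
        vertices[instance_id]["evicted"] = evicted_at


def evicted_instances(vertices):
    """List vertices that carry the 'evicted' property."""
    return sorted(vid for vid, props in vertices.items() if "evicted" in props)


vertices = {
    "clicks/2014-08-01": {"type": "feed-instance"},
    "clicks/2014-08-02": {"type": "feed-instance"},
}
mark_evicted(vertices, ["clicks/2014-08-01"], datetime(2014, 8, 13, 18, 17))
print(evicted_instances(vertices))
```

No vertex or edge is added or removed, so existing lineage edges stay intact and historical queries
only need to check for the property.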

In addition, the decision to purge the vertices can be based on user input on whether to preserve the
history. In this case multiple approaches have to be implemented.
Instead of deleting vertices right away, there can be a time limit after which DB cleanup is done.
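
The time-limited cleanup and the user choice could be combined along these lines. This is only a
sketch: the `keep_for` and `preserve_history` parameters are hypothetical names, the "evicted"
property is assumed to hold the eviction time, and edge cleanup is left out for brevity.

```python
from datetime import datetime, timedelta


def purge_evicted(vertices, now, keep_for, preserve_history=False):
    """Drop vertices evicted longer ago than keep_for; return the removed ids."""
    if preserve_history:
        # The user asked to keep eviction history, so nothing is purged.
        return []
    cutoff = now - keep_for
    expired = sorted(
        vid
        for vid, props in vertices.items()
        if "evicted" in props and props["evicted"] < cutoff
    )
    for vid in expired:
        del vertices[vid]
    return expired


vertices = {
    "clicks/2014-06-01": {"evicted": datetime(2014, 6, 8)},
    "clicks/2014-08-01": {"evicted": datetime(2014, 8, 8)},
    "clicks/2014-08-12": {},  # not evicted
}
removed = purge_evicted(vertices, now=datetime(2014, 8, 13), keep_for=timedelta(days=30))
print(removed, sorted(vertices))
```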

Approach 4 has been identified as a feasible solution. Please comment if you have any concerns or
inputs.

Thanks!







> Process lineage information for Retention policies
> --------------------------------------------------
>
>                 Key: FALCON-594
>                 URL: https://issues.apache.org/jira/browse/FALCON-594
>             Project: Falcon
>          Issue Type: Sub-task
>            Reporter: Sowmya Ramesh
>            Assignee: Sowmya Ramesh
>
> Falcon currently addresses process executions and not data lifecycle policies. This task
> should address adding this information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
