trafodion-codereview mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From zellerh <...@git.apache.org>
Subject [GitHub] incubator-trafodion pull request #683: new spj of startwith and connectby
Date Fri, 02 Sep 2016 01:38:52 GMT
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/683#discussion_r77282407
  
    --- Diff: core/sql/lib_mgmt/src/main/java/org/trafodion/libmgmt/FileMgmt.java ---
    @@ -547,6 +558,64 @@ private static void checkFile(String fname, int dataSize) throws
SQLException {
     			throw new SQLException("Jar file size is over the threshold[100Mb]");
     		}
     	}
    +	
    +	public static void connectBy(String columnName, String originSql, String startWithSql,
String startWithKey,ResultSet[] rs) throws SQLException{
    +		String[] out =new String[1];
    +		getStartWith(startWithSql, startWithKey, out); 
    +		String[] s = out[0].replaceAll("^\\[|\\]$", "").split("\\s*,\\s*");
    +		String clause = " ";
    +		for(int i = 0;i<s.length;i++){
    +			if(i == s.length -1){
    +				clause += columnName + "=\'" + s[i] + "\'";
    +			}else{
    +				clause += columnName + "=\'" + s[i] + "\' or ";
    +			}
    +		}
    +		String connectSql = String.format(originSql, clause);
    +		Connection conn = getConn();
    +		PreparedStatement ps = conn.prepareStatement(connectSql);
    +		try{
    +			rs[0] = ps.executeQuery();
    +		} catch(SQLException e){
    +			LOG.error(originSql,e);
    +			throw e;
    +		} 
    +	}
    +	
    +	
    +	public static void getStartWith(String sql, String startWithKey, String[] out) throws
SQLException{
    +		String sqlTest = String.format(sql, startWithKey);
    +		int searchLevel = 0;
    +		ArrayList<String> ids = new ArrayList<String>();
    +		Connection conn = getConn();
    +		PreparedStatement ps = conn.prepareStatement(sqlTest);
    +		try{
    +			ids.addAll(bfsSearch(sql,startWithKey,searchLevel,ps));
    +			out[0]= Arrays.toString(ids.toArray());
    +		} catch(SQLException e){
    +			LOG.error(sql,e);
    +			throw e;
    +		} 
    +	}
    +	
    +	private static ArrayList<String> bfsSearch(String sql, String startWithKey,int
searchLevel, PreparedStatement ps) throws SQLException{
    --- End diff --
    
    Now I understand the comment people are making about the lack of parallel execution for
this approach.
    
    This is a recursive method that returns an array of strings I see several disadvantages
with this approach:
    
    1. We process every key serially, instead of using a parallel query for all keys of a
given level.
    2. We may get very many strings back, millions, maybe hundreds of millions. This will
not scale.
    3. Having to convert the start with key to a string is not ideal, what if we have multiple
columns?
    4. It's a recursive call and therefore could overflow the stack.
    5. Cycle detection would require searching the string arrays.
    
    Here would be another solution:
    
    1. Create a volatile table to hold the ids (parent, child, level). Note: This step may
be a problem, I remember that there are issues with creating volatile tables in SPJs, but
maybe we can do this in the master executor. If we need to produce an error if we encounter
a cycle, make the "parent" column(s) a unique key.
    2. Delete any existing data from the volatile table.
    3. Process the "start with" query by inserting the first set of data into the volatile
table.
    4. In a loop, populate the volatile table with the data for the subsequent levels.
    5. The loop ends when the insert statement into the volatile table inserts 0 rows, or
when we exceed the maximum number of levels.
    6. The caller of the SPJ can now retrieve the result in the volatile table.
    
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message