hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HADOOP-17281) Implement FileSystem.listStatusIterator() in S3AFileSystem
Date Mon, 05 Oct 2020 16:32:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-17281?focusedWorklogId=495428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-495428
]

ASF GitHub Bot logged work on HADOOP-17281:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Oct/20 16:31
            Start Date: 05/Oct/20 16:31
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request #2354:
URL: https://github.com/apache/hadoop/pull/2354#discussion_r499720582



##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/ContractTestUtils.java
##########
@@ -1453,6 +1453,52 @@ public static TreeScanResults treeWalk(FileSystem fs, Path path)
     return list;
   }
 
+  /**
+   * Convert a remote iterator over file status results into a list.
+   * The utility equivalents in commons collection and guava cannot be
+   * used here, as this is a different interface, one whose operators
+   * can throw IOEs.
+   * @param iterator input iterator
+   * @return the file status entries as a list.
+   * @throws IOException
+   */
+  public static List<? extends FileStatus> iteratorToList(
+          RemoteIterator<? extends FileStatus> iterator) throws IOException {
+    ArrayList<FileStatus> list = new ArrayList<>();
+    while (iterator.hasNext()) {
+      list.add(iterator.next());
+    }
+    return list;
+  }
+
+
+  /**
+   * Convert a remote iterator over file status results into a list.
+   * This uses {@link RemoteIterator#next()} calls only, expecting
+   * a raised {@link NoSuchElementException} exception to indicate that
+   * the end of the listing has been reached. This iteration strategy is
+   * designed to verify that the implementation of the remote iterator
+   * generates results and terminates consistently with the {@code hasNext/next}
+   * iteration. More succinctly "verifies that the {@code next()} operator
+   * isn't relying on {@code hasNext()} to always be called during an iteration.
+   * @param iterator input iterator
+   * @return the status entries as a list.
+   * @throws IOException IO problems
+   */
+  @SuppressWarnings("InfiniteLoopStatement")
+  public static List<? extends FileStatus> iteratorToListThroughNextCallsAlone(
+          RemoteIterator<? extends FileStatus> iterator) throws IOException {
+    ArrayList<FileStatus> list = new ArrayList<>();

Review comment:
       same: `List` as variable type
   
   Actually, it may make sense to type this method 
   ```
   RemoteIterator<T extends FileStatus>
   ```
   and have 
   ```
   List<T>
   ```
   and
   ```
   RemoteIterator<T>
   ```
    where appropriate

##########
File path: hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md
##########
@@ -294,6 +294,20 @@ any optimizations.
 The atomicity and consistency constraints are as for
 `listStatus(Path, PathFilter)`.
 
+### `RemoteIterator<FileStatus> listStatusIterator(Path p)`
+Return an iterator enumerating the `FileStatus` entries under
+a path. This is similar to `listStatus(Path)` except the fact that
+rather than returning an entire list, an iterator is returned.
+The result is exactly the same as listStatus, provided no other caller 

Review comment:
       use `listStatus()` with backticks

##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractRootDirectoryTest.java
##########
@@ -242,6 +243,13 @@ public void testSimpleRootListing() throws IOException {
             + "listStatus = " + listStatusResult
             + "listFiles = " + listFilesResult,
         fileList.size() <= statuses.length);
+    List<FileStatus> statusList = (List<FileStatus>) iteratorToList(
+            fs.listStatusIterator(root));
+    String listStatusItrRes = join(statusList, "\n");

Review comment:
       if you can have both sets of listing as collections, use Assertions.assertThat() as
you've done earlier

##########
File path: hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractGetFileStatusTest.java
##########
@@ -322,6 +355,22 @@ public void testListStatusFile() throws Throwable {
     verifyStatusArrayMatchesFile(f, getFileSystem().listStatus(f));
   }
 
+  @Test
+  public void testListStatusIteratorFile() throws Throwable {
+    describe("test the listStatusIterator(path) on a file");
+    Path f = touchf("listStItrFile");
+    List<FileStatus> statusList = (List<FileStatus>) iteratorToList(
+            getFileSystem().listStatusIterator(f));
+    assertEquals("size of file list returned", 1, statusList.size());
+    assertIsNamedFile(f, statusList.get(0));
+    List<FileStatus> statusList2 =
+            (List<FileStatus>) iteratorToListThroughNextCallsAlone(
+                    getFileSystem().listStatusIterator(f));
+    assertEquals("size of file list returned through next() calls",

Review comment:
       AssertionsAssertThat to assert that the list size == 1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 495428)
    Time Spent: 1h  (was: 50m)

> Implement FileSystem.listStatusIterator() in S3AFileSystem
> ----------------------------------------------------------
>
>                 Key: HADOOP-17281
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17281
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently S3AFileSystem only implements listStatus() api which returns an array. Once
we implement the listStatusIterator(), clients can benefit from the async listing done recently 
> https://issues.apache.org/jira/browse/HADOOP-17074  by performing some tasks on files
while iterating them.
>  
> CC [~stevel]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message