ctakes-dev mailing list archives

From Tim Miller <timothy.mil...@childrens.harvard.edu>
Subject Re: files vs strings in collection reader
Date Tue, 07 May 2013 19:49:48 GMT
This sounds like a job for... science! I'll try some experiments and see 
if it makes a difference.

On 05/07/2013 03:42 PM, Masanz, James J. wrote:
> Do you have any numbers on what sort of impact this will actually have? It is not clear to
> me where the savings would come from, since we are instantiating objects either way. Should
> we just be initializing the ArrayList to something other than the default size?
> -- James
>> -----Original Message-----
>> From: dev-return-1580-Masanz.James=mayo.edu@ctakes.apache.org [mailto:dev-
>> return-1580-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf Of Tim
>> Miller
>> Sent: Tuesday, May 07, 2013 2:18 PM
>> To: dev@ctakes.apache.org
>> Subject: files vs strings in collection reader
>> The FilesInDirectoryCollectionReader creates an ArrayList of java.io.File
>> objects when it is initialized. For large datasets (~50k files) this is a
>> substantial time overhead, and probably a memory overhead as well.
>> It seems like it would be more efficient to store Strings instead of Files
>> there and only create the File object when getNext() is called. It is pretty
>> easy to implement. Is there any downside to making this switch?
>> Tim
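
For reference, the change proposed above could look roughly like the following. This is a minimal sketch, not the actual FilesInDirectoryCollectionReader code: the class name LazyFileList and its methods are hypothetical, and it ignores the UIMA CollectionReader interface entirely. It keeps only file names in memory, sizes the ArrayList up front (per James's question), and builds each java.io.File on demand. Whether this saves a measurable amount of time or memory is exactly the open question in this thread, since a java.io.File is itself a thin wrapper around a path string.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical illustration only; not the cTAKES reader.
 * Stores entry names as Strings and creates File objects lazily.
 */
class LazyFileList implements Iterator<File> {
    private final File directory;
    private final List<String> names;
    private int index = 0;

    LazyFileList(File directory) {
        this.directory = directory;
        // File.list() returns null if the path is not a directory or on an I/O error.
        String[] listed = directory.list();
        if (listed == null) {
            throw new IllegalArgumentException("Not a readable directory: " + directory);
        }
        Arrays.sort(listed); // deterministic order; File.list() guarantees none
        // Pre-size the list so it never has to grow from the default capacity.
        this.names = new ArrayList<>(listed.length);
        this.names.addAll(Arrays.asList(listed));
    }

    int size() {
        return names.size();
    }

    @Override
    public boolean hasNext() {
        return index < names.size();
    }

    /** Plays the role of getNext(): the File is only created here. */
    @Override
    public File next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return new File(directory, names.get(index++));
    }
}
```

Note that this sketch does not filter out subdirectories; checking isFile() for every entry would need a File per entry again, at least temporarily.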
