hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: BSP Task Input/InputSplit Filename
Date Wed, 22 May 2013 08:19:38 GMT
Good luck.

BTW, if you have to manage a lot of documents, I think you need to
merge them into a MapFile or SequenceFile (document ID as key,
document contents as value) on HDFS. Apache Nutch will be helpful.
Then you can create an inverted index MR program by editing a few
lines of the word-count MR example.
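Below is a minimal in-memory sketch of that word-count-to-inverted-index change, in Java since Hama and Hadoop are Java projects. It assumes the documents have already been merged into (document ID, text) pairs. The names `InvertedIndex`, `Posting`, `map`, `reduce` and `buildIndex` are hypothetical and not part of the Hadoop or Hama API, and the shuffle is only simulated with a `TreeMap` rather than run on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch, not Hadoop/Hama API: "map" emits (term, docId)
// the way word count emits (word, 1); "reduce" folds those pairs into a
// postings list of (docId, term frequency).
public class InvertedIndex {

    /** One postings-list entry: document ID plus payload (term frequency). */
    public record Posting(String docId, int termFrequency) {}

    /** Map step: tokenize one document and emit (term, docId) per token. */
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // split can yield a leading empty token
                out.add(Map.entry(token, docId));
            }
        }
        return out;
    }

    /** Reduce step: count occurrences of each docId for a single term. */
    static List<Posting> reduce(List<String> docIds) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by docId
        for (String id : docIds) {
            counts.merge(id, 1, Integer::sum);
        }
        List<Posting> postings = new ArrayList<>();
        counts.forEach((id, tf) -> postings.add(new Posting(id, tf)));
        return postings;
    }

    /** Driver: run map over all docs, group by term (the "shuffle"), then reduce. */
    public static Map<String, List<Posting>> buildIndex(Map<String, String> docs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        docs.forEach((docId, text) -> {
            for (Map.Entry<String, String> kv : map(docId, text)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        });
        Map<String, List<Posting>> index = new TreeMap<>();
        grouped.forEach((term, ids) -> index.put(term, reduce(ids)));
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "the cat sat on the mat");
        docs.put("doc2", "the dog chased the cat");
        buildIndex(docs).forEach((term, postings) -> System.out.println(term + " -> " + postings));
    }
}
```

The only real change from word count is the value the map step emits: the document ID instead of the constant 1, so the reducer can count occurrences per document instead of producing one global total.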

On Wed, May 22, 2013 at 4:42 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
> For a project I'm trying to implement an inverted indexing algorithm, which
> has a 'term' and a 'postings list', in which the postings list consists of a
> 'document id' and a 'payload' (in my case term frequency per document).
> I was thinking of inserting multiple different documents and taking the
> filename as documentID, hence the necessity.
> But I've found a way to work around this problem by using a different
> input, which does not require the filename to be retrievable in a BSP task.
> If I end up needing it later on in my project and start working on it, I'll
> let you know.
> Thanks for the help thus far!
> On Wed, May 22, 2013 at 1:16 AM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> Hi,
>> Short answer is no, we don't provide API for what you are trying to do.
>> However, it can be added easily. See BSPPeerImpl.initInput() method,
>> InputSplit interface and FileSplit classes.
>> Why do you need that function? If there's a reasonable need, let's
>> add it together.
>> On Tue, May 21, 2013 at 7:04 PM, Steven van Beelen <smcvbeelen@gmail.com> wrote:
>> > Hi all,
>> >
>> > The title says it: is there a way to retrieve the filename of the
>> > input/inputsplit a BSP Task is working on? I've been looking for some
>> > time in the docs and source files, but cannot seem to find if one is
>> > able to retrieve the filename/pathname from the input used.
>> >
>> > Cheers
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon

Best Regards, Edward J. Yoon
