manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: schedule information
Date Tue, 23 Dec 2014 12:36:28 GMT
Hi Jitu,

This is nothing like what I recommended for you to do.  I said to look in
WorkerThread.  Inside the ProcessActivity class, you will have access to
both the RepositoryDocument object and the IJobDescription object for that
job.

Karl
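
A minimal sketch of the re-crawl behavior described in the quoted message below, assuming the ingester compares a version string that folds in the forced metadata. The class name, method names, and version-string layout here are hypothetical illustrations and are not ManifoldCF's actual code; only the key "JOB_INSTANCE_ID" comes from the thread.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; these are not ManifoldCF classes.
class ForcedMetadataVersionSketch {

    // Assumption: the stored version combines the document's own version
    // with every forced metadata key/value pair.
    public static String versionString(String documentVersion, Map<String, String> forcedMetadata) {
        StringBuilder sb = new StringBuilder(documentVersion);
        // Sort keys so that equal inputs always produce equal strings.
        for (Map.Entry<String, String> e : new TreeMap<>(forcedMetadata).entrySet()) {
            sb.append('|').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // A document is re-indexed whenever the two version strings differ.
    public static boolean needsReindex(String previousVersion, String currentVersion) {
        return !previousVersion.equals(currentVersion);
    }

    public static void main(String[] args) {
        // Same document content, but a different instance id on each run.
        String run1 = versionString("doc-v7", Map.of("JOB_INSTANCE_ID", "1001"));
        String run2 = versionString("doc-v7", Map.of("JOB_INSTANCE_ID", "1002"));
        System.out.println(needsReindex(run1, run1)); // prints "false"
        System.out.println(needsReindex(run1, run2)); // prints "true"
    }
}
```

Under that assumption, any value that changes on every run and is stored as forced metadata makes every document look updated, which matches the full re-crawl reported in the thread.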


On Tue, Dec 23, 2014 at 7:31 AM, Jitu <abjitu@gmail.com> wrote:

> Hi Karl,
>
> I checked the source code. At line 555 of IncrementalIngester.java, in the
> checkFetchDocument() method, the forced metadata of the previous run is
> compared with that of the current run. If there is a change, the file is
> considered updated. So please advise on how to send a parameter that changes
> on every job execution from the StartupThread class to the output connector.
>
> Thanks,
> Jitu
>
> On Tue, Dec 23, 2014 at 5:32 PM, Jitu <abjitu@gmail.com> wrote:
>
>> Hi Karl,
>>
>> Thanks for your support. Here is what I tried. In StartupThread.java,
>> inside the run method, I create a unique id called InstanceId and store it
>> as part of the forced metadata, which is sent to the output connector. It
>> all works fine, but when I re-run the same job again and again, all files
>> get crawled again. Is this because the forced metadata is changing? Is
>> forced metadata used to check whether the file is updated or not?
>>
>> code snippet:
>>
>>   final String instanceId = IDFactory.make(threadContext);
>>   // Only now record the fact that we are trying to start the job.
>>   connectionMgr.recordHistory(jobDescription.getConnectionName(),
>>     null,connectionMgr.ACTIVITY_JOBSTART,null,
>>     jobID.toString()+"("+jobDescription.getDescription()+")",null,instanceId,null);
>>   jobDescription.clearForcedMetadata();
>>   jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
>>   jobManager.save(jobDescription);
>>
>>
>> Thanks,
>> Jitu
>>
>> On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Jitu,
>>>
>>> Your client's needs seem rather unusual, and will potentially be
>>> somewhat expensive performance-wise.  So unless I hear from others as well
>>> that this is a key feature, there's no point in contributing a patch.
>>>
>>> You will of course need to keep track of whatever changes you develop so
>>> that you can later upgrade to newer versions of MCF.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Mon, Dec 22, 2014 at 8:14 AM, Jitu <abjitu@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Thanks for the quick reply and support. This is exactly what I was
>>>> looking for. Thank you so much. If I modify WorkerThread.java, do I need to
>>>> submit a patch for it?
>>>>
>>>> Thanks,
>>>> Jitu
>>>>
>>>> On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Jitu,
>>>>>
>>>>> I'm sorry for the miscommunication.  What I meant is that without any
>>>>> modifications, you can add the job's name as metadata for all documents
>>>>> indexed with the job.
>>>>>
>>>>> If you need to index hard-wired metadata for every job run, you will
>>>>> need to modify WorkerThread.java.  The IJobDescription object is readily
>>>>> available there, but you will also need to write a SQL query to obtain
>>>>> the job's start time.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Dec 22, 2014 at 4:33 AM, Jitu <abjitu@gmail.com> wrote:
>>>>>
>>>>>> Hi Karl,
>>>>>> Thanks for the quick reply and support. I have gone through the source
>>>>>> code of "ForcedMetadataConnector.java" as well as the end-user
>>>>>> documentation at
>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster
>>>>>> It says we can add a string constant for every job run. But my client
>>>>>> wants to know which files were crawled in each run of the job. To
>>>>>> search for that, I need to send a unique id for every job run as part
>>>>>> of the metadata. This unique id changes for every job run, so I cannot
>>>>>> use ForcedMetadataConnector. You advised, "It's certainly possible to
>>>>>> add the current job's start time field as hard-wired metadata." Please
>>>>>> let me know how to achieve it.
>>>>>>
>>>>>> Thanks,
>>>>>> Jitu
>>>>>>
>>>>>> On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Jitu,
>>>>>>>
>>>>>>> You can certainly add a unique string associated with a job to every
>>>>>>> document using the Metadata Adjuster transformation connector (which of
>>>>>>> course can be the job name).  The time of indexing is already sent as a
>>>>>>> metadata field (can't remember which one off the top of my head, but I'm
>>>>>>> sure you can find it).  What you can't get, mainly because it basically
>>>>>>> has little meaning in MCF, is the time the job was started.  It's
>>>>>>> certainly possible to add the current job's start time field as
>>>>>>> hard-wired metadata, but I bet your client would prefer the actual time
>>>>>>> of indexing of the document anyhow.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Dec 19, 2014 at 2:30 AM, Jitu <abjitu@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Karl,
>>>>>>>> Thanks for all your support. One of our customers needs job schedule
>>>>>>>> information to be sent through the output connector. Basically, my
>>>>>>>> customer wants to find out, using Solr search, which files were
>>>>>>>> indexed in a given job run.
>>>>>>>>
>>>>>>>> For example, if my job ran on 17th Dec 2014 at 11:23 AM, then I will
>>>>>>>> send a unique string, say "JobName 17-12-2014 11:23", as part of the
>>>>>>>> file metadata to the Solr output connector. A Solr search would then
>>>>>>>> use this string to find all files indexed in that job run.
>>>>>>>>
>>>>>>>> Please correct me if I am wrong, or suggest how to achieve it.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jitu
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
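
The per-run tag proposed in the first message of the thread ("JobName 17-12-2014 11:23") could be built along the following lines. This is only a sketch: the class and method names are hypothetical, and the job start time is passed in as a plain argument because, as noted in the thread, obtaining it inside ManifoldCF would require a separate lookup.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; not part of ManifoldCF.
class JobRunTagSketch {

    // Day-month-year and 24-hour time, matching the example in the thread.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm");

    // Combines the job name with the formatted start time of the run.
    public static String runTag(String jobName, LocalDateTime jobStart) {
        return jobName + " " + jobStart.format(FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2014, 12, 17, 11, 23);
        System.out.println(runTag("JobName", start)); // prints "JobName 17-12-2014 11:23"
    }
}
```

A tag like this is constant for the duration of one run, so every document indexed in that run would carry the same searchable value.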
