manifoldcf-user mailing list archives

From Jitu <abj...@gmail.com>
Subject Re: schedule information
Date Tue, 23 Dec 2014 12:02:46 GMT
Hi Karl,

Thanks for your support. Here is what I tried. In the run() method of
StartupThread.java, I create a unique ID called InstanceId and store it as
part of the forced metadata, which is sent to the output connector. That
all works fine. But when I re-run the same job again and again, all files
are crawled again. Is this because the forced metadata changes? Is forced
metadata used to check whether a file has been updated or not?

code snippet:

    final String instanceId = IDFactory.make(threadContext);
    // Only now record the fact that we are trying to start the job.
    connectionMgr.recordHistory(jobDescription.getConnectionName(),
      null, connectionMgr.ACTIVITY_JOBSTART, null,
      jobID.toString() + "(" + jobDescription.getDescription() + ")",
      null, instanceId, null);
    // Clear the job's forced metadata, add this run's ID as the only
    // forced value, and persist the modified job description.
    jobDescription.clearForcedMetadata();
    jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
    jobManager.save(jobDescription);
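Jitu's question about re-crawling is not answered in this part of the thread. If, as he suspects, forced metadata contributes to the value ManifoldCF compares between runs to decide whether a document changed, then a per-run ID would make every document look changed on every run. The sketch below is a hypothetical model of that effect under that assumption, not ManifoldCF's actual change-detection code; `VersionCheckSketch`, `versionString`, and `needsReindex` are illustrative names.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, NOT ManifoldCF code: assumes a document is re-sent to the
// output only when its version string differs from the one recorded last run,
// and assumes forced metadata is folded into that version string.
final class VersionCheckSketch {

    // Combine the document's own version with its forced metadata.
    // TreeMap sorts the keys so the result is deterministic.
    static String versionString(String docVersion, Map<String, String> forcedMetadata) {
        StringBuilder sb = new StringBuilder(docVersion);
        for (Map.Entry<String, String> e : new TreeMap<>(forcedMetadata).entrySet()) {
            sb.append('|').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // True when the document would be re-indexed under this model.
    static boolean needsReindex(String previousVersion, String currentVersion) {
        return previousVersion == null || !previousVersion.equals(currentVersion);
    }

    public static void main(String[] args) {
        String fileVersion = "lastModified=1419336166000";
        // Same unchanged file, but a different per-run ID in each run.
        String run1 = versionString(fileVersion, Map.of("JOB_INSTANCE_ID", "1419336166001"));
        String run2 = versionString(fileVersion, Map.of("JOB_INSTANCE_ID", "1419336200002"));
        System.out.println(needsReindex(run1, run2)); // true
        // A constant forced value (e.g. the job name) is identical across runs.
        String a = versionString(fileVersion, Map.of("JOB_NAME", "nightly"));
        String b = versionString(fileVersion, Map.of("JOB_NAME", "nightly"));
        System.out.println(needsReindex(a, b)); // false
    }
}
```

Under this model, a constant forced value such as the job name never triggers re-indexing, while any value that changes per run does.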


Thanks,
Jitu

On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Jitu,
>
> Your client's needs seem rather unusual, and will potentially be somewhat
> expensive performance-wise.  So unless I hear from others as well that this
> is a key feature, there's no point in contributing a patch.
>
> You will of course need to keep track of whatever changes you develop so
> that you can later upgrade to newer versions of MCF.
>
> Thanks,
> Karl
>
>
> On Mon, Dec 22, 2014 at 8:14 AM, Jitu <abjitu@gmail.com> wrote:
>
>> Hi Karl,
>>
>> Thanks for the quick reply and support. This is exactly what I was
>> looking for. Thank you so much. If I modify WorkerThread.java, do I need
>> to submit a patch for it?
>>
>> Thanks,
>> Jitu
>>
>> On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Jitu,
>>>
>>> I'm sorry for the miscommunication.  What I meant is that without any
>>> modifications, you can add the job's name as metadata for all documents
>>> indexed with the job.
>>>
>>> If you need to index hard-wired metadata for every job run, you will
>>> need to modify WorkerThread.java.  The IJobDescription object is readily
>>> available there, but you will also need to write a SQL query to obtain the
>>> job's start time.
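As a rough illustration of the WorkerThread change Karl describes, the helper below adds a job name and a job start time to a document's metadata. It is a hypothetical sketch: the class, the `JOB_NAME` and `JOB_START_TIME` field names, and the idea of passing in a start time already fetched by the SQL query are assumptions, and it calls no ManifoldCF API.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of ManifoldCF: the per-run values a modified
// WorkerThread might attach to every document processed in one job run.
final class RunMetadataSketch {
    private final String jobName;       // e.g. IJobDescription.getDescription()
    private final long startTimeMillis; // assumed: the job start time read via SQL

    RunMetadataSketch(String jobName, long startTimeMillis) {
        this.jobName = jobName;
        this.startTimeMillis = startTimeMillis;
    }

    // Returns a copy of the document's metadata with two per-run fields added;
    // the input map is left untouched.
    Map<String, List<String>> decorate(Map<String, List<String>> docMetadata) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : docMetadata.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        out.put("JOB_NAME", List.of(jobName));
        // ISO-8601 in UTC (e.g. 2014-12-17T11:23:00Z) sorts and parses cleanly downstream.
        out.put("JOB_START_TIME", List.of(Instant.ofEpochMilli(startTimeMillis).toString()));
        return out;
    }

    public static void main(String[] args) {
        long start = Instant.parse("2014-12-17T11:23:00Z").toEpochMilli();
        RunMetadataSketch run = new RunMetadataSketch("JobName", start);
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("title", List.of("report.pdf"));
        System.out.println(run.decorate(doc));
    }
}
```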
>>>
>>> Karl
>>>
>>>
>>> On Mon, Dec 22, 2014 at 4:33 AM, Jitu <abjitu@gmail.com> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> Thanks for the quick reply and support. I have gone through the source
>>>> code of "ForcedMetadataConnector.java" as well as the end-user
>>>> documentation at
>>>> http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster.
>>>> It says we can add a string constant for every job run, but my client
>>>> wants to know which files were crawled in each run of the job. To search
>>>> for that, I need to send a unique ID for every job run as part of the
>>>> metadata. Since this ID changes with every run, I cannot use
>>>> ForcedMetadataConnector. You advised: "It's certainly possible to add the
>>>> current job's start time field as hard-wired metadata." Please let me
>>>> know how to achieve it.
>>>>
>>>> Thanks,
>>>> Jitu
>>>>
>>>> On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Hi Jitu,
>>>>>
>>>>> You can certainly add a unique string associated with a job to every
>>>>> document using the Metadata Adjuster transformation connector (which of
>>>>> course can be the job name).  The time of indexing is already sent as a
>>>>> metadata field (can't remember which one off the top of my head, but I'm
>>>>> sure you can find it).  What you can't get, mainly because it basically has
>>>>> little meaning in MCF, is the time the job was started.  It's certainly
>>>>> possible to add the current job's start time field as hard-wired metadata,
>>>>> but I bet your client would prefer the actual time of indexing of the
>>>>> document anyhow.
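Karl's suggestion to rely on the indexing time can be turned into a Solr date range query covering one run's time window. The field name is a placeholder, since the thread does not say which metadata field carries the indexing time, and `IndexTimeQuerySketch` is a hypothetical helper. The sketch assumes the run's start and end times are known and that runs do not overlap.

```java
import java.time.Instant;

// Hypothetical helper: builds a Solr range query selecting documents whose
// indexing timestamp falls inside one job run's time window. The field name is
// supplied by the caller because the actual metadata field is not named here.
final class IndexTimeQuerySketch {

    static String runWindowQuery(String field, Instant runStart, Instant runEnd) {
        if (runEnd.isBefore(runStart)) {
            throw new IllegalArgumentException("runEnd precedes runStart");
        }
        // Solr range syntax field:[start TO end] is inclusive on both ends;
        // Instant.toString() yields the ISO-8601 UTC form Solr date fields expect.
        return field + ":[" + runStart + " TO " + runEnd + "]";
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2014-12-17T11:23:00Z");
        Instant end = Instant.parse("2014-12-17T12:10:00Z");
        System.out.println(runWindowQuery("indexed_time", start, end));
        // prints indexed_time:[2014-12-17T11:23:00Z TO 2014-12-17T12:10:00Z]
    }
}
```

One caveat, assuming unchanged files are not re-sent to Solr: such files keep their older indexing time, so the window returns the files actually (re)indexed in that run rather than every file the run examined.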
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Dec 19, 2014 at 2:30 AM, Jitu <abjitu@gmail.com> wrote:
>>>>>>
>>>>>> Hi Karl,
>>>>>>
>>>>>> Thanks for all your support. One of our customers needs job schedule
>>>>>> information to be sent through the output connector. Basically, my
>>>>>> customer wants to use Solr search to find out which files were indexed
>>>>>> in one job run.
>>>>>>
>>>>>> For example, if my job ran on 17 Dec 2014 at 11:23 AM, I would send a
>>>>>> unique string, say "JobName 17-12-2014 11:23", as part of each file's
>>>>>> metadata to the Solr output connector. A Solr search on this string
>>>>>> would then return all files indexed in that job run.
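A minimal sketch of the label described above, plus a Solr phrase query that would find documents carrying it. The field name `job_run` and the class `RunLabelSketch` are hypothetical, and the date pattern simply mirrors the example string.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch: formats a run label such as "JobName 17-12-2014 11:23"
// and builds a Solr phrase query for it. The metadata field name is an
// assumption that depends on how the Solr schema maps the value.
final class RunLabelSketch {
    private static final DateTimeFormatter LABEL_TIME =
            DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm");

    static String label(String jobName, LocalDateTime runStart) {
        return jobName + " " + LABEL_TIME.format(runStart);
    }

    // Wraps the label in double quotes so Solr treats it as one phrase;
    // backslashes and double quotes inside the label are escaped first.
    static String phraseQuery(String field, String label) {
        String escaped = label.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String l = label("JobName", LocalDateTime.of(2014, 12, 17, 11, 23));
        System.out.println(l);                         // JobName 17-12-2014 11:23
        System.out.println(phraseQuery("job_run", l)); // job_run:"JobName 17-12-2014 11:23"
    }
}
```

With minute resolution, two runs of the same job started within the same minute would share a label.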
>>>>>>
>>>>>> Please correct me if I am wrong, or suggest how to achieve this.
>>>>>>
>>>>>> Thanks,
>>>>>> Jitu
>>>>>>
>>>>>
>>>>
>>>
>>
>
