lucene-solr-user mailing list archives

From Mark <javam...@gmail.com>
Subject Re: SimplePostTool with extracted Outlook messages
Date Tue, 27 Jan 2015 20:55:45 GMT
In the end I didn't find a way to add a new file/MIME type when recursing a
folder.

So I added msg to the static string and the MIME map.

private static final String DEFAULT_FILE_TYPES =
"xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log,msg";

mimeMap.put("msg", "application/vnd.ms-outlook");
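For anyone following along, here is a rough standalone sketch of the kind of lookup that change affects. It is not the actual SimplePostTool source: the class name MsgTypeSketch and the method guessType are made up for illustration, and only a few MIME entries are shown. The idea is simply that an extension has to be both in the accepted list and in the map before a file gets a content type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the relevant part of SimplePostTool (illustration only).
class MsgTypeSketch {
    // Same list as above, with "msg" appended at the end.
    static final String DEFAULT_FILE_TYPES =
        "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log,msg";

    static final Set<String> FILE_TYPES =
        new HashSet<>(Arrays.asList(DEFAULT_FILE_TYPES.split(",")));

    // Only a handful of entries; the real map covers every default type.
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("pdf", "application/pdf");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("msg", "application/vnd.ms-outlook"); // the new entry
    }

    // Returns the MIME type for a file name, or null if the extension is unsupported.
    static String guessType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null; // no extension at all
        }
        String suffix = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!FILE_TYPES.contains(suffix)) {
            return null; // not in the accepted list, so it would be skipped
        }
        return MIME_MAP.get(suffix);
    }

    public static void main(String[] args) {
        // prints application/vnd.ms-outlook
        System.out.println(guessType("000000006252671B765A1748992DF1A6403BDF81A4A02A00.msg"));
        // prints null (eml is not in the list)
        System.out.println(guessType("notes.eml"));
    }
}
```

With both changes in place the .msg files are no longer rejected as an unsupported type in auto mode.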

Regards

Mark


On 27 January 2015 at 18:39, Mark <javamark@gmail.com> wrote:

> Hi Alex,
>
> On an individual file basis that would work, since you could set the ID on
> an individual basis.
>
> However, when recursing a folder it doesn't work, and worse still the server
> complains, unless on the server side you can use the UpdateRequestProcessor
> chains with UUID generator as you suggested.
>
> Thanks for everyone's suggestions.
>
> Regards
>
> Mark
>
> On 27 January 2015 at 18:01, Alexandre Rafalovitch <arafalov@gmail.com>
> wrote:
>
>> Your IDs seem to be the file names, which you are probably also getting
>> from your parsing the file. Can't you just set (or copyField) that as an
>> ID on the Solr side?
>>
>> Alternatively, if you don't actually have good IDs, you could look into
>> UpdateRequestProcessor chains with UUID generator.
>>
>> Regards,
>>
>>    Alex.
>> On 27/01/2015 12:24 pm, "Mark" <javamark@gmail.com> wrote:
>>
>> > Thanks Eric
>> >
>> > However
>> >
>> > java -classpath dist/solr-core-4.10.3.jar -Dauto=true
>> > org.apache.solr.util.SimplePostTool C:/temp/samplemsg/*.msg
>> >
>> > Fails with:
>> >
>> > Posting files to base url http://localhost:8983/solr/update..
>> > Entering auto mode. File endings considered are
>> > xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
>> > SimplePostTool: WARNING: Skipping
>> > 000000006252671B765A1748992DF1A6403BDF81A4A02A00.msg. Unsupported file
>> > type for auto mode.
>> > SimplePostTool: WARNING: Skipping
>> > 000000006252671B765A1748992DF1A6403BDF81A4A02B00.msg. Unsupported file
>> > type for auto mode.
>> > SimplePostTool: WARNING: Skipping
>> > 000000006252671B765A1748992DF1A6403BDF81A4A02C00.msg. Unsupported file
>> > type for auto mode.
>> >
>> > That's where I started looking into extending or adding support for
>> > additional types.
>> >
>> > Looking into the code as it stands, passing your own URL as well as
>> > asking it to recurse a folder means that it requires an ID strategy,
>> > which I believe is lacking.
>> >
>> > Regards
>> >
>> > Mark
>> >
>> >
>>
>
>
