lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Problem with the Content Field during Solr Indexing
Date Sun, 01 Nov 2015 01:46:16 GMT
Hi Shruti,

From what I understand, the /update/extract handler is for indexing
rich-text documents, and does not support ".png" files.

It only supports the following file formats: pdf, doc, docx, ppt, pptx,
xls, xlsx, odt, odp, ods, ott, otp, ots, rtf, htm, html, txt, log.
If you use the default post.jar, I believe files in other formats will get
filtered out.
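
To see ahead of time which files would likely be skipped, the extension
check can be sketched as below. This is only an illustration: the set is
copied from the list in this message, not read from post.jar, and the helper
names is_supported and split_by_support are made up for this example.

```python
from pathlib import Path

# Extensions copied from the list in this message (an assumption, not
# something read out of post.jar or the Solr configuration).
SUPPORTED_EXTENSIONS = {
    "pdf", "doc", "docx", "ppt", "pptx", "xls", "xlsx",
    "odt", "odp", "ods", "ott", "otp", "ots",
    "rtf", "htm", "html", "txt", "log",
}


def is_supported(path):
    """Return True if the file extension is in the list above (case-insensitive)."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Split paths into (supported, skipped), preserving the input order."""
    supported = [p for p in paths if is_supported(p)]
    skipped = [p for p in paths if not is_supported(p)]
    return supported, skipped
```

For example, split_by_support(["report.docx", "SolrImageExtract.png"]) puts
the ".png" file in the skipped list.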

When I tried to index a ".png" file with my custom handler, it indexed only
"<p> <p>" as the content.
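
A quick way to spot documents that ended up like this is to check whether
the content field holds any text once tags and whitespace are removed. The
sketch below is an assumption on my part, not part of Solr: has_text is a
hypothetical helper, and the regular expression is a crude tag stripper, not
a real HTML parser.

```python
import re

# Crude pattern for anything that looks like a markup tag, e.g. "<p>".
TAG_PATTERN = re.compile(r"<[^>]+>")


def has_text(content):
    """Return True if content holds anything besides tags and whitespace."""
    without_tags = TAG_PATTERN.sub("", content)
    return bool(without_tags.strip())
```

With this check, a content value made up only of "<p>" tags or of newline
characters counts as empty.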

Regards,
Edwin



On 31 October 2015 at 09:35, Shruti Mundra <mundra@usc.edu> wrote:

> Hi Edwin,
>
> The file extension of the image file is ".png", and we are following this
> URL for indexing:
>
> http://blog.thedigitalgroup.com/vijaym/wp-content/uploads/sites/11/2015/07/SolrImageExtract.png
>
> Thanks and Regards,
> Shruti Mundra
>
> On Thu, Oct 29, 2015 at 8:33 PM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>
> > The "\n" actually means new line as decoded by Solr from the indexed
> > document.
> >
> > What is the file extension of your image file, and which method are you
> > using to do the indexing?
> >
> > Regards,
> > Edwin
> >
> >
> > On 30 October 2015 at 04:38, Shruti Mundra <mundra@usc.edu> wrote:
> >
> > > Hi,
> > >
> > > When I try to index an image file directly into Solr, the content
> > > attribute consists of a trail of "\n"s and not the data.
> > > We are able to get the metadata for that image.
> > >
> > > Can anyone help us work out how we could get the content along with the
> > > metadata?
> > >
> > > Thanks!
> > >
> > > - Shruti Mundra
> > >
> >
>
