lucene-solr-user mailing list archives

From "Anatharaman, Srinatha (Contractor)" <Srinatha_Ananthara...@comcast.com>
Subject RE: Issues with Solr Morphline reading RFC822 files
Date Tue, 14 Feb 2017 15:17:49 GMT
In the original email, the lines below are not indexed. They are metadata that appears
before the actual email.

> Received: from resqmta-po-08v.sys.XXXX.net ([196.114.154.167])
>        by csp-imta02.westchester.pa.bo.XXXX.net with bizsmtp
>        id EClZ1u0013cy81c01E9enp; Wed, 30 Nov 2016 14:09:38 +0000
> Received: from resimta-po-14v.sys. XXXX.net ([96.114.154.142])
>        by resqmta-po-08v.sys.XXXX.net with SMTP
>        id C5ZqcRB3e2dNjC5ZqcQvHl; Wed, 30 Nov 2016 14:09:38 +0000
> Received: from outgoingemail1.digitalrightscorp.com ([69.36.73.150])
>        by resimta-po-14v.sys.XXXX.net with SMTP
>        id C5ZNcJfg9npCYC5Zcceh9K; Wed, 30 Nov 2016 14:09:25 +0000
> X-Xfinity-Message-Heuristics: IPv6:N;TLS=0;SPF=0;DMARC=
> Received: from outgoingemail1-69-150 (localhost [127.0.0.1])
>        by outgoingemail1. XXXXXRightsCorp.com (Postfix) with ESMTP id 15EB7100419
>        for <dmca@XXXX.net>; Wed, 30 Nov 2016 06:05:52 -0800 (PST)
> From: APMC@XXXXXRightsCorp.com
> To: dmca@XXXX.net
> Message-ID: 
> <551271522.6.1480514752082.JavaMail.root@outgoingemail1-69-150>
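For illustration only, here is a minimal sketch of one way to confirm which header lines a message actually contains before comparing against what reaches Solr. It uses Python's standard `email` module rather than Tika or Morphlines, so it is not the pipeline discussed in this thread. The sample message, the `headers_to_fields` helper, and the lowercase field names are all assumptions made up for this example.

```python
from email import message_from_string

# Hypothetical sample message; hosts and IDs are placeholders, not real data.
RAW = """\
Received: from a.example.net ([192.0.2.1])
        by b.example.net with SMTP
        id ABC123; Wed, 30 Nov 2016 14:09:38 +0000
Received: from c.example.net ([192.0.2.2])
        by a.example.net with SMTP
        id DEF456; Wed, 30 Nov 2016 14:09:25 +0000
From: sender@example.com
To: rcpt@example.net
Subject: Test notice
Content-Type: text/plain; charset=us-ascii

Body text here.
"""


def headers_to_fields(raw):
    """Collect every header, including repeated ones, into lists keyed by name."""
    msg = message_from_string(raw)
    fields = {}
    for name, value in msg.items():
        # Folded headers span several lines; collapse the whitespace to one line.
        unfolded = " ".join(value.split())
        fields.setdefault(name.lower(), []).append(unfolded)
    # Assumes a single-part text message, so the payload is a plain string.
    fields["body"] = [msg.get_payload().strip()]
    return fields


if __name__ == "__main__":
    for key, values in headers_to_fields(RAW).items():
        for v in values:
            print(f"{key}: {v}")
```

Keeping each header as a list preserves repeated `Received:` lines, which a single-valued field would silently overwrite.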



-----Original Message-----
From: Dave [mailto:hastings.recursive@gmail.com] 
Sent: Monday, February 13, 2017 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Issues with Solr Morphline reading RFC822 files

Can't see what's color-coded in the email.

> On Feb 13, 2017, at 5:35 PM, Anatharaman, Srinatha (Contractor) <Srinatha_Anantharaman@comcast.com> wrote:
> 
> Hi,
> 
> I am loading email files in RFC822 format into SolrCloud using Flume,
> but some of the emails' metadata is not getting loaded into Solr.
> Please find a sample email below; the text colored in bold red is
> ignored by Solr. I can read these files ONLY with
> org.apache.tika.parser.mail.RFC822Parser. If I try to read them with
> TXTParser, Solr ignores the files with the error "No supported MIME
> type found for _attachment_mimetype=message/rfc822".
> 
> How do I overcome this issue? I want to read the email files without
> losing a single word from them.
> 
> Received: from resqmta-po-08v.sys.XXXX.net ([196.114.154.167])
>        by csp-imta02.westchester.pa.bo.XXXX.net with bizsmtp
>        id EClZ1u0013cy81c01E9enp; Wed, 30 Nov 2016 14:09:38 +0000
> Received: from resimta-po-14v.sys. XXXX.net ([96.114.154.142])
>        by resqmta-po-08v.sys.XXXX.net with SMTP
>        id C5ZqcRB3e2dNjC5ZqcQvHl; Wed, 30 Nov 2016 14:09:38 +0000
> Received: from outgoingemail1.digitalrightscorp.com ([69.36.73.150])
>        by resimta-po-14v.sys.XXXX.net with SMTP
>        id C5ZNcJfg9npCYC5Zcceh9K; Wed, 30 Nov 2016 14:09:25 +0000
> X-Xfinity-Message-Heuristics: IPv6:N;TLS=0;SPF=0;DMARC=
> Received: from outgoingemail1-69-150 (localhost [127.0.0.1])
>        by outgoingemail1. XXXXXRightsCorp.com (Postfix) with ESMTP id 15EB7100419
>        for <dmca@XXXX.net>; Wed, 30 Nov 2016 06:05:52 -0800 (PST)
> From: APMC@XXXXXRightsCorp.com
> To: dmca@XXXX.net
> Message-ID: 
> <551271522.6.1480514752082.JavaMail.root@outgoingemail1-69-150>
> Subject: Unauthorized Use of Copyrights RE:
> TC-cc0ae97d-8918-4a4b-8515-749ff9303bc0
> MIME-Version: 1.0
> Content-Type: text/plain; charset=us-ascii
> Content-Transfer-Encoding: 7bit
> Date: Wed, 30 Nov 2016 06:05:52 -0800 (PST)
> X-CMAE-Envelope: 
> MS4wfAIoEnMl1VVV7nPS/7pis5Gr/ijSjTNaioaGiZVCAo4cXRoeTl9Z1Nt8SYSY4kX7Rp
> DlZuxzGbzyeRDJIorfdeodi9fzNtQETs56Or8SwlysmgQQQt4R
> kKDdiZaRx3Q0be579K6C4XZGyRC6JMDzDi1X6bXgBL8KYDFFA/aEyOBd+2Zrz1YpOi2aTj
> zyRc4d4MXJwaIGivtlXtZc6R5KypOhVP6eX1kx/qV9OwVzXAz6
> 
> **NOTE TO ISP: PLEASE FORWARD THE ENTIRE NOTICE***
> 
> Re: Unauthorized Use of Copyrights Owned Exclusively by The Bicycle 
> Music Company
> 
> Reference#: ZBP96D4  IP Address: 73.166.122.44
> 
> Dear Sir or Madam:
> .
> .
> .
> .
> .
> .
> 
> 
> Regards,
> ~Sri

