manifoldcf-user mailing list archives

From Kamil Żyta <kamil.z...@pwr.edu.pl>
Subject Re: ElastiSearch missing doc
Date Thu, 18 Dec 2014 07:51:33 GMT
Hi,
any tips on how to solve the problem?

K

On Tue, Dec 16, 2014 at 05:06:33PM +0100, Kamil Żyta wrote:
> All *.7z files cause a problem. Example in attachment.
> 
> K
> 
> On Tue, Dec 16, 2014 at 10:20:24AM -0500, Karl Wright wrote:
> > Hi Kamil,
> > 
> > If it happens again, see if you can find an archive file that it happens
> > on.  It's easy to do that: you just want to drop the file you suspect down
> > in the file system somewhere, and set up a file system job to crawl that
> > one file, making sure you send it through the Tika transformer of course.
> > You can use a null output connection.
> > 
> > If you can reproduce the problem with just that one file, then if you send
> > me the file I can work with it here and determine whether the problem is
> > local to your system or is a more general issue.
> > 
> > Thanks,
> > Karl
> > 
> > 
> > On Tue, Dec 16, 2014 at 9:55 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > >
> > > err, only a few jobs cause a problem (the rest probably do not have
> > > archives).
> > > I don't know which files you are asking about.
> > >
> > > K
> > >
> > >
> > > On Tue, Dec 16, 2014 at 03:15:16PM +0100, Kamil Żyta wrote:
> > > > Ok, I rebuilt mcf and the problem was still there, so I restarted all jobs
> > > > and now there is no problem.
> > > > Thx Karl for your time.
> > > >
> > > > K
> > > >
> > > > On Tue, Dec 16, 2014 at 07:28:34AM -0500, Karl Wright wrote:
> > > > > The commons-compress code just makes a simple reference to the Coders class,
> > > > > with no reflection or anything suspicious going on.
> > > > >
> > > > > If you can send me the binary file that causes this issue, I can verify
> > > > > whether it happens here or not.  (If this seems to happen for you on ALL
> > > > > files, then I can already assure you that it does not happen here, and
> > > > > you've got something very special happening in your environment.)  If this
> > > > > is not reproducible here then you probably have a corrupt commons-compress
> > > > > jar and need to download it again.  If it *is* reproducible, then probably
> > > > > we will need to create an Oracle Java bug ticket.
> > > > >
> > > > > Karl
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 16, 2014 at 7:17 AM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > >
> > > > > > Ok, then I don't understand it.  It may be a bug in commons-compress, or
> > > > > > maybe even a bad jar.  I'll have a look at their code and see if I can
> > > > > > figure out why that class won't load.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > >
> > > > > > On Tue, Dec 16, 2014 at 7:06 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >>
> > > > > >> I use multiprocess-zk-example, an external pgsql db and the split war.
> > > > > >>
> > > > > >> K
> > > > > >>
> > > > > >> On Tue, Dec 16, 2014 at 07:02:47AM -0500, Karl Wright wrote:
> > > > > >> > Hi Kamil,
> > > > > >> >
> > > > > >> > Which example are you using?  Is this with the combined war, or is it one
> > > > > >> > of the multiprocess examples, or is it the single-process quick start?
> > > > > >> >
> > > > > >> > I really don't have any idea why a class that IS found in a particular jar
> > > > > >> > cannot in turn find another class in the same jar, so I'll need as many
> > > > > >> > details as possible.
> > > > > >> >
> > > > > >> > Karl
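
One way to gather the details asked for above is to check, from inside the affected JVM, which jar each class is actually loaded from. The following is a minimal sketch, not part of ManifoldCF: the class `ClassOriginProbe` and its `describe` method are hypothetical names, and the two default class names are simply the ones that appear in the stack trace quoted in this thread. It would need to run with the same classpath as the failing process to be meaningful.

```java
import java.security.CodeSource;

class ClassOriginProbe {
    /** Returns where the named class was loaded from, or the error raised while loading it. */
    static String describe(String className) {
        try {
            // initialize=true also runs static initializers, so failures there are reported too.
            Class<?> c = Class.forName(className, true, ClassOriginProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            String where = (src == null || src.getLocation() == null)
                    ? "the JDK itself (no code source)"
                    : src.getLocation().toString();
            return className + " loaded from " + where;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and ExceptionInInitializerError.
            return className + " failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Default names are taken from the stack trace in this thread; pass others as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.commons.compress.archivers.sevenz.SevenZFile",
            "org.apache.commons.compress.archivers.sevenz.Coders"
        };
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

If both classes report the same jar location, a second copy of the jar on the classpath becomes a less likely explanation.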
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Dec 16, 2014 at 6:54 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > >
> > > > > >> > > > find . -iname 'commons-compress*'
> > > > > >> > > ./lib/commons-compress-1.8.1.jar
> > > > > >> > > ./dist/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/dist/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/crawler-ui-proprietary/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/combined-service-proprietary/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/authority-service/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/crawler-ui/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/api-service/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/combined-service/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/authority-service-proprietary/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > > ./framework/build/webapp/api-service-proprietary/WEB-INF/lib/commons-compress-1.8.1.jar
> > > > > >> > >
> > > > > >> > > I follow
> > > > > >> > > https://manifoldcf.apache.org/release/trunk/en_US/how-to-build-and-deploy.html#Building+the+framework+and+the+connectors+using+Apache+Ant
> > > > > >> > > There isn't anything about clean-core-deps there. I checked out a fresh
> > > > > >> > > release-1.8-branch.
> > > > > >> > >
> > > > > >> > > K
> > > > > >> > >
> > > > > >> > > On Tue, Dec 16, 2014 at 06:38:18AM -0500, Karl Wright wrote:
> > > > > >> > > > Hi Kamil,
> > > > > >> > > >
> > > > > >> > > > I've confirmed that this should not be a classloader issue.  The class in
> > > > > >> > > > question is in commons-compress.jar at the root level (under dist/lib).
> > > > > >> > > > The only way this would not be loadable is if you had TWO commons-compress
> > > > > >> > > > jars in your lib area.  This is possible if you upgraded to mcf 1.8 and did
> > > > > >> > > > not do a make clean-core-deps before you did a make-core-deps, because now
> > > > > >> > > > all jars have versions attached to their names.
> > > > > >> > > >
> > > > > >> > > > Please confirm you do not have duplicate jars in this directory.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Karl
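
The duplicate-jar check described above can be automated. This is a rough sketch under stated assumptions, not a ManifoldCF tool: `DuplicateJarFinder` is a hypothetical class, the default directory `dist/lib` is only the location mentioned in the message above, and the rule "a trailing `-<digit>...` before `.jar` is the version" is a naming heuristic that may misjudge unusual file names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateJarFinder {
    // Heuristic (an assumption): a trailing "-<digit>..." before ".jar" is the version suffix.
    private static final Pattern JAR_NAME =
            Pattern.compile("^(.+?)(?:-\\d[\\w.\\-]*)?\\.jar$");

    /** Strips the version suffix, e.g. "commons-compress-1.8.1.jar" becomes "commons-compress". */
    static String artifactName(String fileName) {
        Matcher m = JAR_NAME.matcher(fileName);
        return m.matches() ? m.group(1) : fileName;
    }

    /** Groups jar file names by artifact and keeps only artifacts present more than once. */
    static Map<String, List<String>> findDuplicates(List<String> jarFileNames) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String name : jarFileNames) {
            byArtifact.computeIfAbsent(artifactName(name), k -> new ArrayList<>()).add(name);
        }
        byArtifact.values().removeIf(files -> files.size() < 2);
        return byArtifact;
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; "dist/lib" is only a default taken from the thread.
        Path libDir = Paths.get(args.length > 0 ? args[0] : "dist/lib");
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p.getFileName().toString());
            }
        }
        findDuplicates(jars).forEach(
                (artifact, files) -> System.out.println(artifact + ": " + files));
    }
}
```

An unversioned `commons-compress.jar` left next to `commons-compress-1.8.1.jar` would be reported as one artifact with two files; no output means no duplicates were found by this heuristic.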
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, Dec 16, 2014 at 6:31 AM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > >> > > > >
> > > > > >> > > > > Hi Kamil,
> > > > > >> > > > >
> > > > > >> > > > > Your problem looks like a potential classloader issue.  Let me do some
> > > > > >> > > > > research and get back to you.
> > > > > >> > > > >
> > > > > >> > > > > Karl
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > On Tue, Dec 16, 2014 at 5:34 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > > > >>
> > > > > >> > > > >> thx Karl but now I have new issue:
> > > > > >> > > > >>
> > > > > >> > > > >> FATAL 2014-12-16 11:12:58,496 (Worker thread '47') - Error tossed: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> > > > > >> > > > >> java.lang.NoClassDefFoundError: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> > > > > >> > > > >>         at org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:279)
> > > > > >> > > > >>         at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:191)
> > > > > >> > > > >>         at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:95)
> > > > > >> > > > >>         at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:117)
> > > > > >> > > > >>         at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:130)
> > > > > >> > > > >>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> > > > > >> > > > >>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
> > > > > >> > > > >>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:121)
> > > > > >> > > > >>         at org.apache.manifoldcf.agents.transformation.tika.TikaExtractor.addOrReplaceDocumentWithException(TikaExtractor.java:230)
> > > > > >> > > > >>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddEntryPoint.addOrReplaceDocumentWithException(IncrementalIngester.java:3257)
> > > > > >> > > > >>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineAddFanout.sendDocument(IncrementalIngester.java:3108)
> > > > > >> > > > >>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester$PipelineObjectWithVersions.addOrReplaceDocumentWithException(IncrementalIngester.java:2739)
> > > > > >> > > > >>         at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.documentIngest(IncrementalIngester.java:792)
> > > > > >> > > > >>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1610)
> > > > > >> > > > >>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.ingestDocumentWithException(WorkerThread.java:1558)
> > > > > >> > > > >>         at org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:911)
> > > > > >> > > > >>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:383)
> > > > > >> > > > >>
> > > > > >> > > > >> And another question: I use Solr 4.10 with Tika 1.5. MCF 1.8 has Tika
> > > > > >> > > > >> 1.6. How does this affect document parsing?
> > > > > >> > > > >>
> > > > > >> > > > >> K
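
As general JVM background to the trace above: "NoClassDefFoundError: Could not initialize class X" is what Java reports when a class whose static initializer has already failed is used again. The first failure is thrown as `ExceptionInInitializerError`, which usually carries the root cause, so an earlier log entry may be more informative than this one. The sketch below only demonstrates that behaviour with a deliberately broken, hypothetical class (`InitFailureDemo.Broken`); it does not claim to reproduce what happened inside commons-compress.

```java
import java.util.ArrayList;
import java.util.List;

class InitFailureDemo {
    /** Hypothetical class whose static initializer always fails. */
    static class Broken {
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("simulated failure in static initializer");
        }
    }

    /** Touches Broken twice and records what each attempt throws. */
    static List<String> touchTwice() {
        List<String> seen = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                // Reading the static field triggers class initialization.
                System.out.println(Broken.VALUE);
            } catch (Throwable t) {
                seen.add(t.toString());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // First entry: the initializer failure. Second entry: "Could not initialize class ...".
        for (String s : touchTwice()) {
            System.out.println(s);
        }
    }
}
```

In a long-running worker process, only the first thread to touch the class sees the original exception; every later document that needs the class gets the shorter "Could not initialize class" form until the JVM is restarted.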
> > > > > >> > > > >>
> > > > > >> > > > >> On Mon, Dec 15, 2014 at 08:45:31AM -0500, Karl Wright wrote:
> > > > > >> > > > >> > If you changed this file, you would need to rerun initialize.sh in order to
> > > > > >> > > > >> > register the connector.
> > > > > >> > > > >> >
> > > > > >> > > > >> > Karl
> > > > > >> > > > >> >
> > > > > >> > > > >> >
> > > > > >> > > > >> > On Mon, Dec 15, 2014 at 8:42 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > the same as connectors.xml:
> > > > > >> > > > >> > > (...)
> > > > > >> > > > >> > > <repositoryconnector name="Windows shares"
> > > > > >> > > > >> > > class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>
> > > > > >> > > > >> > > (...)
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > K
> > > > > >> > > > >> > >
> > > > > >> > > > >> > > On Mon, Dec 15, 2014 at 08:39:07AM -0500, Karl Wright wrote:
> > > > > >> > > > >> > > > Hi Kamil,
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > What does connectors-proprietary.xml say about the jcifs connector?
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > Karl
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > >
> > > > > >> > > > >> > > > On Mon, Dec 15, 2014 at 8:35 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > Right, thx. Another problem:
> > > > > >> > > > >> > > > > > org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector(uninstalled)
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > properties.xml:
> > > > > >> > > > >> > > > > <libdir path="../connector-lib-proprietary"/>
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > > cat ../connectors.xml
> > > > > >> > > > >> > > > > <repositoryconnector name="Windows shares"
> > > > > >> > > > >> > > > > class="org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector"/>
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > > ls ../connector-lib-proprietary
> > > > > >> > > > >> > > > > jcifs.jar
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > I think I checked/restarted everything.
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > K
> > > > > >> > > > >> > > > >
> > > > > >> > > > >> > > > > On Mon, Dec 15, 2014 at 08:00:12AM -0500, Karl Wright wrote:
> > > > > >> > > > >> > > > > > You have to run ./initialize.sh on the MCF 1.8 codebase for the upgrade to
> > > > > >> > > > >> > > > > > take place.
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > Karl
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > >
> > > > > >> > > > >> > > > > > On Mon, Dec 15, 2014 at 7:43 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > With release-1.8-branch the problem is the same.
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > K
> > > > > >> > > > >> > > > > > >
> > > > > >> > > > >> > > > > > > On Mon, Dec 15, 2014 at 06:47:12AM -0500, Karl Wright wrote:
> > > > > >> > > > >> > > > > > > > Hi Kamil,
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > You cannot upgrade to trunk from 1.x.
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > Try upgrading to branches/release-1.8-branch.
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > Karl
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > >
> > > > > >> > > > >> > > > > > > > On Mon, Dec 15, 2014 at 3:39 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > > > Hi,
> > > > > >> > > > >> > > > > > > > > after upgrading to trunk I get 'Database exception: SQLException doing
> > > > > >> > > > >> > > > > > > > > query (42703): ERROR: column "needpriority" does not exist'.
> > > > > >> > > > >> > > > > > > > > How can I upgrade db schema? I tried ./initialize.sh without success.
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > > > K
> > > > > >> > > > >> > > > > > > > >
> > > > > >> > > > >> > > > > > > > > On Fri, Dec 12, 2014 at 10:40:39AM -0500, Karl Wright wrote:
> > > > > >> > > > >> > > > > > > > > > Ok, committed a fix. CONNECTORS-1121.
> > > > > >> > > > >> > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > > Karl
> > > > > >> > > > >> > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > > On Fri, Dec 12, 2014 at 10:32 AM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > >> > > > >> > > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > > > Ah, thanks, this is due to changes I made yesterday.
> > > > > >> > > > >> > > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > > > Hold on.
> > > > > >> > > > >> > > > > > > > > > > Karl
> > > > > >> > > > >> > > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > > >
> > > > > >> > > > >> > > > > > > > > > > On Fri, Dec 12, 2014 at 10:12 AM, Kamil Żyta <kamil.zyta@pwr.edu.pl> wrote:
> > > > > >> > > > >> > > > > > > > > > >>
> > > > > >> > > > >> > > > > > > > > > >> On Fri, Dec 12, 2014 at 09:55:41AM -0500, Karl Wright wrote:
> > > > > >> > > > >> > > > > > > > > > >> > I've created CONNECTORS-1120 for this fix.  I should have something to try
> > > > > >> > > > >> > > > > > > > > > >> > shortly.
> > > > > >> > > > >> > > > > > > > > > >> >
> > > > > >> > > > >> > > > > > > > > > >>
> > > > > >> > > > >> > > > > > > > > > >> I can't build mcf from source:
> > > > > >> > > > >> > > > > > > > > > >> BUILD FAILED
> > > > > >> > > > >> > > > > > > > > > >> /opt/mcf-trunk/build.xml:1438: Can't get
> > > > > >> > > > >> > > > > > > > > > >> https://www.apache.org/dist/manifoldcf/apache-manifoldcf-elasticsearch-plugin-2.0-bin.zip
> > > > > >> > > > >> > > > > > > > > > >> to
> > > > > >> > > > >> > > > > > > > > > >> /opt/mcf-trunk/build/download/apache-manifoldcf-elasticsearch-plugin-bin.zip
> > > > > >> > > > >> > > > > > > > > > >>
> > > > > >> > > > >> > > > > > > > > > >> K
> > > > > >> > > > >> > > > > > > > > > >>
> > >


