manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
Date Fri, 18 Mar 2016 13:56:22 GMT
Hi Ian,

If you can connect to your HSQLDB instance, you can simply drop all
rows from the table "repohistory".  That should make a difference.  Of
course, it is possible that the database instance is now corrupt and
nothing can be done to fix it.
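
As a sketch, the cleanup might look like the following, run against the
embedded database with ManifoldCF shut down (the table name "repohistory"
is from the MCF schema; the exact connection method depends on your setup):

```sql
-- Clear the simple history table, then reclaim the space on disk.
-- CHECKPOINT DEFRAG and SHUTDOWN COMPACT are standard HSQLDB statements;
-- they only shrink the .data file once the mass delete has committed.
DELETE FROM repohistory;
CHECKPOINT DEFRAG;   -- defragment and reclaim free space in the .data file
SHUTDOWN COMPACT;    -- rewrite the database files at their minimum size
```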

Only once you get back to a point where queries work against your HSQLDB
instance will the configuration changes that control simple history table
bloat take effect.
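
For reference, the cleanup interval is set in properties.xml; a sketch of
the entry (parameter name per the how-to-build-and-deploy page; verify it
against your MCF version's documentation) would be:

```xml
<!-- Keep simple history rows for ~3.5 days; the value is in milliseconds
     (302400000 ms = 3.5 days). Rows older than this are dropped. -->
<property name="org.apache.manifoldcf.crawler.historycleanupinterval"
          value="302400000"/>
```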

If you need to recreate everything, I suggest you do it on PostgreSQL,
since it's easier to manage than HSQLDB and is designed for far larger
database instances.
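
A sketch of the properties.xml changes for a PostgreSQL switch might look
like this (property names per the how-to-build-and-deploy page; the
credentials are placeholders, and you would re-run database initialization
afterwards):

```xml
<!-- Point MCF at PostgreSQL instead of the embedded HSQLDB. -->
<property name="org.apache.manifoldcf.databaseimplementationclass"
          value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
<!-- Superuser credentials used only to create the MCF database/schema. -->
<property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
<property name="org.apache.manifoldcf.dbsuperuserpassword" value="CHANGE_ME"/>
```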

Thanks,
Karl


On Fri, Mar 18, 2016 at 9:44 AM, Ian Zapczynski <
Ian.Zapczynski@veritablelp.com> wrote:

> Karl,
>
> Wow... 100 MB vs. my 32+ GB is certainly perplexing!
>
> I dropped HistoryCleanupInterval in properties.xml to 302400000 ms and
> have restarted and waited, but I don't see a difference in .data file
> size. I tried to connect to HyperSQL directly and run a CHECKPOINT DEFRAG
> and a SHUTDOWN COMPACT, but I must not be running these correctly, as the
> commands came back immediately with no effect whatsoever.
>
> Unless you think otherwise, I feel I'm now faced with only a
> few options:
>
> 1)  Delete the database and re-run the job to reindex all files. The
> problem will likely eventually return.
> 2)  Upgrade ManifoldCF to a recent release and see if the database
> magically shrinks. Is there any logical hope in doing this?
> 3)  Begin using PostgreSQL instead. This won't tell me what I'm
> apparently doing wrong, but it will give me more flexibility with database
> maintenance.
>
> What do you think?
>
> -Ian
>
> >>> Karl Wright <daddywri@gmail.com> 3/16/2016 2:10 PM >>>
> Hi Ian,
>
> This all looks very straightforward. Typical sizes of an HSQLDB database
> under this scenario would probably run well under 100 MB. What might be
> happening, though, is that you might be accumulating a huge history table.
> This would bloat your database until it falls over (which for HSQLDB is at
> 32 GB).
>
> History records are used only for generation of reports. Normally MCF out
> of the box is configured to drop history rows older than a month. But if
> you are doing lots of crawling and want to stick with HSQLDB you might want
> to do it faster than that. There's a properties.xml parameter you can set
> to control the time interval these records are kept; see the
> how-to-build-and-deploy page.
>
> Thanks,
> Karl
>
>
> On Wed, Mar 16, 2016 at 1:05 PM, Ian Zapczynski <
> Ian.Zapczynski@veritablelp.com> wrote:
>
>> Thanks, Karl.
>> I am using a single Windows shares repository connection to a folder on
>> our file server which currently contains a total of 143,997 files and
>> 54,424 folders (59.2 GB of data in total), of which ManifoldCF seems to
>> identify just over 108,000 as indexable. The job specifies the following:
>> 1. Include indexable file(s) matching *
>> 2. Include directory(s) matching *
>>
>> No custom connectors. I kept this simple because I'm a simple guy. :-) As
>> such, it's entirely possible that I did something stupid when I set it up,
>> but I'm not seeing anything else obvious that seems worth pointing out.
>> -Ian
>>
>> >>> Karl Wright <daddywri@gmail.com> 3/16/2016 12:03 PM >>>
>> Hi Ian,
>>
>> The database size seems way too big for this crawl size. I've not seen
>> this problem before but I suspect that whatever is causing the bloat is
>> also causing HSQLDB to fail.
>>
>> Can you give me further details about what repository connections you are
>> using? It is possible that there's a heretofore unknown pathological case
>> you are running into during the crawl. Are there any custom connectors
>> involved?
>>
>> If we rule out a bug of some kind, then the next thing to do would be to
>> go to a real database, e.g. PostgreSQL.
>>
>> Karl
>>
>>
>> On Wed, Mar 16, 2016 at 11:04 AM, Ian Zapczynski <
>> Ian.Zapczynski@veritablelp.com> wrote:
>>
>>> Hello,
>>> We've had ManifoldCF 2.0.1 working well with SOLR for months on Windows
>>> 2012 using the single process model. We recently noticed that new
>>> documents are not getting ingested, even after restarting the job, the
>>> server, etc. What I see in the logs is first a bunch of 500 errors coming
>>> out of SOLR as a result of ManifoldCF trying to index .tif files that are
>>> found in the directory structure being indexed. After that (not sure if
>>> related or not), I see a bunch of these errors:
>>> FATAL 2016-03-15 16:01:48,801 (Thread-1387745) -
>>> C:\apache-manifoldcf-2.0.1\example\.\./dbname.data getFromFile failed
>>> 33337202
>>> org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
>>> at org.hsqldb.error.Error.error(Unknown Source)
>>> at org.hsqldb.persist.DataFileCache.getFromFile(Unknown Source)
>>> at org.hsqldb.persist.DataFileCache.get(Unknown Source)
>>> at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
>>> at org.hsqldb.index.NodeAVLDisk.findNode(Unknown Source)
>>> at org.hsqldb.index.NodeAVLDisk.getRight(Unknown Source)
>>> at org.hsqldb.index.IndexAVL.next(Unknown Source)
>>> at org.hsqldb.index.IndexAVL.next(Unknown Source)
>>> at org.hsqldb.index.IndexAVL$IndexRowIterator.getNextRow(Unknown Source)
>>> at org.hsqldb.RangeVariable$RangeIteratorMain.findNext(Unknown Source)
>>> at org.hsqldb.RangeVariable$RangeIteratorMain.next(Unknown Source)
>>> at org.hsqldb.QuerySpecification.buildResult(Unknown Source)
>>> at org.hsqldb.QuerySpecification.getSingleResult(Unknown Source)
>>> at org.hsqldb.QuerySpecification.getResult(Unknown Source)
>>> at org.hsqldb.StatementQuery.getResult(Unknown Source)
>>> at org.hsqldb.StatementDMQL.execute(Unknown Source)
>>> at org.hsqldb.Session.executeCompiledStatement(Unknown Source)
>>> at org.hsqldb.Session.execute(Unknown Source)
>>> at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
>>> at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
>>> at
>>> org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
>>> at
>>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
>>> Caused by: java.lang.NegativeArraySizeException
>>> at org.hsqldb.lib.StringConverter.readUTF(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBinary.readString(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBinary.readChar(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBase.readData(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBinary.readData(Unknown Source)
>>> at org.hsqldb.rowio.RowInputBinaryDecode.readData(Unknown Source)
>>> at org.hsqldb.RowAVLDisk.<init>(Unknown Source)
>>> at org.hsqldb.persist.RowStoreAVLDisk.get(Unknown Source)
>>> ... 21 more
>>> ERROR 2016-03-15 16:01:48,911 (Stuffer thread) - Stuffer thread aborting
>>> and restarting due to database connection reset: Database exception:
>>> SQLException doing query (S1000): java.lang.NegativeArraySizeException
>>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
>>> exception: SQLException doing query (S1000):
>>> java.lang.NegativeArraySizeException
>>> at
>>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:702)
>>> at
>>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
>>> at
>>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:771)
>>> at
>>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1444)
>>> at
>>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
>>> at
>>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
>>> at
>>> org.apache.manifoldcf.core.database.DBInterfaceHSQLDB.performQuery(DBInterfaceHSQLDB.java:916)
>>> at
>>> org.apache.manifoldcf.core.database.BaseTable.performQuery(BaseTable.java:221)
>>> at
>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataChunk(IncrementalIngester.java:1783)
>>> at
>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1748)
>>> at
>>> org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getPipelineDocumentIngestDataMultiple(IncrementalIngester.java:1703)
>>> at
>>> org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:254)
>>> Caused by: java.sql.SQLException: java.lang.NegativeArraySizeException
>>> at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
>>> at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
>>> at org.hsqldb.jdbc.JDBCPreparedStatement.fetchResult(Unknown Source)
>>> at org.hsqldb.jdbc.JDBCPreparedStatement.executeQuery(Unknown Source)
>>> at
>>> org.apache.manifoldcf.core.database.Database.execute(Database.java:889)
>>> at
>>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:683)
>>> Caused by: org.hsqldb.HsqlException: java.lang.NegativeArraySizeException
>>> After these errors occur, the job just seems to hang and not process any
>>> further documents or log anything more in the manifoldcf.log. So I see the
>>> error is coming out of the HyperSQL database, but I don't know why. There
>>> is sufficient disk space. Now the database file is 33 GB (larger than I'd
>>> expect for our ~110,000 documents), but I haven't seen any evidence that
>>> we're hitting a limit on file size. I'm afraid I'm not sure where to go
>>> from here to further nail down the problem.
>>> As always, any and all help is much appreciated.
>>> Thanks,
>>> -Ian
>>>
>>
>>
>
