ignite-dev mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: Partition eviction failed, this can cause grid hang. (Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted))
Date Tue, 26 Dec 2017 18:15:37 GMT
Cross-posting to the dev list.

Ignite persistence maintainers, please chime in.

—
Denis

> On Dec 26, 2017, at 2:17 AM, Arseny Kovalchuk <arseny.kovalchuk@synesis.ru> wrote:
> 
> Hi guys.
> 
> Another issue when using Ignite 2.3 with native persistence enabled. See details below.
> 
> We deploy Ignite along with our services in Kubernetes (v 1.8) on premises. The Ignite cluster
> is a StatefulSet of 5 Pods (5 instances) of Ignite version 2.3. Each Pod mounts a PersistentVolume
> backed by CEPH RBD.
> 
> We put about 230 events/second into Ignite; 70% of events are ~200KB in size and 30%
> are 5000KB. Smaller events have indexed fields and we query them via SQL.
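
A rough back-of-the-envelope check on the load described above (a sketch added for illustration, not part of the original report): taking the stated mix at face value, the average event is 0.7 × 200 KB + 0.3 × 5000 KB = 1640 KB, so 230 events/second is roughly 377,200 KB/s of payload, or about 75,440 KB/s per node if it were spread evenly over the 5 Pods. The class and method names are hypothetical, and the even split is an assumption. Backups, WAL and index writes are ignored because the thread does not state those settings.

```java
/** Hypothetical helper for estimating payload ingest rate from an event-size mix. */
final class IngestEstimate {
    /**
     * Returns the payload rate in KB/s.
     * sharePercent[i] is the percentage of events whose size is sizeKb[i].
     */
    static long kbPerSecond(long eventsPerSecond, int[] sharePercent, long[] sizeKb) {
        long weighted = 0; // sum of percent * KB over all size classes
        for (int i = 0; i < sharePercent.length; i++)
            weighted += sharePercent[i] * sizeKb[i];
        return eventsPerSecond * weighted / 100;
    }

    public static void main(String[] args) {
        // Figures from the report: 230 events/s, 70% at ~200 KB, 30% at 5000 KB.
        long total = kbPerSecond(230, new int[] {70, 30}, new long[] {200, 5000});
        int nodes = 5; // assumption: load spread evenly, no backups counted
        System.out.println("total KB/s: " + total + ", per node KB/s: " + total / nodes);
    }
}
```

If the cache keeps backup copies, the per-node figure would scale with the number of copies, so this is best read as a lower bound on what each PersistentVolume has to absorb.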
> 
> The cluster is activated from a client node, which also streams events into Ignite from
> Kafka. We use a custom streamer implementation that uses the cache.putAll() API.
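
For readers unfamiliar with this pattern, a putAll-based streamer can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation: `PutAllBatcher` is a hypothetical class, the batch size is arbitrary, and the sink is a plain `Consumer` so the example runs without Ignite on the classpath. In a real client the sink would presumably be `cache::putAll` on an `IgniteCache<K, V>`. The buffer is a `TreeMap` because passing keys in a consistent order is a common precaution against lock-ordering deadlocks between concurrent batch writes.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Hypothetical helper: buffers entries and hands them to a sink in key-sorted batches. */
final class PutAllBatcher<K extends Comparable<K>, V> {
    private final int batchSize;
    private final Consumer<Map<K, V>> sink; // assumption: e.g. cache::putAll in an Ignite client
    private Map<K, V> buffer = new TreeMap<>();

    PutAllBatcher(int batchSize, Consumer<Map<K, V>> sink) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one entry and flushes automatically once the batch is full. */
    void add(K key, V value) {
        buffer.put(key, value);
        if (buffer.size() >= batchSize)
            flush();
    }

    /** Sends the buffered entries to the sink; does nothing if the buffer is empty. */
    void flush() {
        if (buffer.isEmpty())
            return;
        Map<K, V> batch = buffer;
        buffer = new TreeMap<>(); // start a fresh buffer before handing the batch off
        sink.accept(batch);
    }
}
```

A caller would invoke `add(key, event)` for each record consumed from Kafka and call `flush()` before committing offsets, so that no buffered events are lost on shutdown.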
> 
> We started the cluster from scratch without any persistent data. After a while we got corrupted
> data with the following error message.
> 
> [2017-12-26 07:44:14,251] ERROR [sys-#127%ignite-instance-2%] org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader: - Partition eviction failed, this can cause grid hang.
> class org.apache.ignite.IgniteException: Runtime failure on search row: Row@5b1479d6[ key: 171:1513946618964:3008806055072854, val: ru.synesis.kipod.event.KipodEvent [idHash=510912646, hash=-387621419, face_last_name=null, face_list_id=null, channel=171, source=, face_similarity=null, license_plate_number=null, descriptors=null, cacheName=kipod_events, cacheKey=171:1513946618964:3008806055072854, stream=171, alarm=false, processed_at=0, face_id=null, id=3008806055072854, persistent=false, face_first_name=null, license_plate_first_name=null, face_full_name=null, level=0, module=Kpx.Synesis.Outdoor, end_time=1513946624379, params=null, commented_at=0, tags=[vehicle, 0, human, 0, truck, 0, start_time=1513946618964, processed=false, kafka_offset=111259, license_plate_last_name=null, armed=false, license_plate_country=null, topic=MovingObject, comment=, expiration=1514033024000, original_id=null, license_plate_lists=null], ver: GridCacheVersion [topVer=125430590, order=1513955001926, nodeOrder=3] ][ 3008806055072854, MovingObject, Kpx.Synesis.Outdoor, 0, , 1513946618964, 1513946624379, 171, 171, FALSE, FALSE, , FALSE, FALSE, 0, 0, 111259, 1514033024000, (vehicle, 0, human, 0, truck, 0), null, null, null, null, null, null, null, null, null, null, null, null ]
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1787)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1578)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:216)
> 	at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:496)
> 	at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:423)
> 	at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:580)
> 	at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2334)
> 	at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:461)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1453)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1416)
> 	at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1271)
> 	at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
> 	at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:951)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:809)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
> 	at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
> 	at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6631)
> 	at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
> 	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
> 	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148)
> 	at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:62)
> 	at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:126)
> 	at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4372)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:200)
> 	at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4359)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4279)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:81)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:261)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4697)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4682)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158)
> 	at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:319)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1823)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
> 	at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1752)
> 	... 23 more
> 
> After a restart we still get this error. See ignite-instance-2.log.
> 
> The cache-config.xml is used for server instances.
> The ignite-common-cache-conf.xml is used for client instances, which activate the cluster
> and stream data from Kafka into Ignite.
> 
> Is it possible to tune (or implement) native persistence so that it just reports
> an error or corrupted data, skips it, and continues to work without the corrupted
> part? That would let the cluster keep operating regardless of storage errors.
> 
> 
> Arseny Kovalchuk
> 
> Senior Software Engineer at Synesis
> skype: arseny.kovalchuk
> mobile: +375 (29) 666-16-16
> LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
> <ignite-instance-0.log><ignite-instance-1.log><ignite-instance-2.log><ignite-instance-3.log><ignite-instance-4.log><cache-config.xml><ignite-discovery-kubernetes.xml><ignite-common.xml><ignite-common-storage.xml><ignite-common-entity.xml>

