ignite-dev mailing list archives

From Pavel Kovalenko <jokse...@gmail.com>
Subject Re: Ignite as distributed file storage
Date Thu, 05 Jul 2018 19:43:48 GMT
Vladimir,

The key difference between BLOB storage and IGFS is that BLOB storage will
have a persistence-based architecture with the possibility to cache blocks
off-heap (using mmap, which is simpler because we delegate it to the OS
level), while IGFS has an in-memory architecture with the possibility to
persist blocks.
BLOB storage will be able to work with a small amount of RAM without a
significant performance drop (using zero-copy from socket to disk) and,
in the opposite case, it can keep all available blocks off-heap if possible
(using mmap again).
IGFS performs a lot of operations on blocks on-heap, which leads to
unnecessary data copies, long GC pauses, and performance drops. The whole
IGFS architecture is tightly bound to in-memory features, so it's too hard
to rewrite IGFS in a persistence-based manner. But the cool IGFS features,
such as intelligent affinity routing and chunk colocation, will be reused in
BLOB storage.
Does it make sense?
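The two OS-level mechanisms described above (writing from a socket straight to disk, and caching blocks off-heap with mmap) correspond to standard Java NIO calls. The following is a minimal sketch under stated assumptions, not the proposed BLOB storage: the class `BlobBlockIo` and its methods are hypothetical, the source channel is assumed to be blocking, and whether `transferFrom` achieves true zero-copy depends on the JDK, the OS, and the channel types involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Ignite: persist one block from a channel
// and re-read it through a memory mapping.
final class BlobBlockIo {
    private BlobBlockIo() {}

    // Writes exactly `size` bytes from `src` into `file`. transferFrom leaves
    // the copy strategy to the JDK (for a socket source it typically uses a
    // temporary direct buffer rather than a heap array). Assumes a blocking
    // source, so a transfer of 0 bytes means the source is exhausted.
    static void writeBlock(ReadableByteChannel src, Path file, long size) {
        try (FileChannel out = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            while (pos < size) {
                long n = out.transferFrom(src, pos, size - pos);
                if (n == 0) {
                    throw new IOException("source ended after " + pos + " of " + size + " bytes");
                }
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps a stored block read-only. The OS page cache decides how much of it
    // stays in RAM; the mapping stays valid after the channel is closed.
    static MappedByteBuffer mapBlock(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A receiving node could pass a `SocketChannel` as `src`. Because the mapped buffer is backed by the page cache, a node with little RAM simply keeps fewer pages resident, while a node with plenty of RAM can keep whole blocks cached, which is the delegation to the OS described above.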



2018-07-05 19:01 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:

> Pavel,
> The design you described is almost precisely what IGFS does. It has a cache
> for metadata, and it splits binary data into chunks with intelligent
> affinity routing. In addition, we have a map-reduce feature on top of it
> and integration with the underlying file system with optional caching. Data
> can be accessed in blocks or streams. IGFS is not in active development,
> but it is not outdated either.
> Can you briefly explain why you think we need to drop IGFS and
> re-implement almost the same thing from scratch?
>
> Dima, Sergey,
> Yes, we need the BLOB support you described. Unfortunately, it is not that
> easy to implement from the SQL perspective. To support it we would need
> either MVCC (with its own drawbacks) or read-locks for SELECT.
>
> Vladimir.
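The intelligent affinity routing mentioned here, and the chunk colocation Pavel plans to reuse, can be illustrated by mapping groups of consecutive chunks of the same file to the same partition. This is a simplified sketch, not IGFS code: `ChunkAffinity`, the group size parameter, and the plain hash-modulo-partitions mapping are assumptions for illustration.

```java
import java.util.Objects;

// Hypothetical sketch, not IGFS code: maps a file chunk to a partition so
// that groups of consecutive chunks of one file land on the same partition.
final class ChunkAffinity {
    private ChunkAffinity() {}

    static int partition(String fileId, long chunkIdx, int groupSize, int partitions) {
        if (groupSize <= 0 || partitions <= 0) {
            throw new IllegalArgumentException("groupSize and partitions must be positive");
        }
        // Chunks 0..groupSize-1 form group 0, the next groupSize chunks group 1, etc.
        long group = chunkIdx / groupSize;
        // The affinity key is (file, group) rather than (file, chunk).
        int h = Objects.hash(fileId, group);
        return Math.floorMod(h, partitions);
    }
}
```

With `groupSize = 4`, chunks 0-3 of a file share one partition and chunks 4-7 share another (possibly the same) one, so a sequential reader contacts one owner per group instead of one per chunk. Larger groups improve locality at the cost of coarser load balancing.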
>
> On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov <skozlov@gridgain.com>
> wrote:
>
> > Dmitriy
> >
> > You're right that storage of large objects should be optimized.
> >
> > Let's assume a large object means a regular object with large fields,
> > and such fields won't be used for comparison. Then we don't need to load
> > the BLOB fields into off-heap page memory, e.g. for SQL queries whose
> > SELECT doesn't include them explicitly. This can reduce page eviction,
> > improve performance, and lower the chance of an OOM.
> >
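Sergey's idea of not loading BLOB fields unless a query selects them can be sketched as a row that stores only a handle to its BLOB, with the value fetched during projection only when requested. `BlobRow`, `LazyBlobProjection`, and the map standing in for BLOB storage are hypothetical; in Ignite this decision would have to be made inside the SQL engine and page memory, which the sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row layout: small columns inline, the BLOB only as a handle
// into separate BLOB storage.
final class BlobRow {
    final long id;
    final String name;
    final long blobId;

    BlobRow(long id, String name, long blobId) {
        this.id = id;
        this.name = name;
        this.blobId = blobId;
    }
}

// Hypothetical projection step: the BLOB is fetched only if the query
// selects it; blobLoads counts fetches so the effect is observable.
final class LazyBlobProjection {
    private final Map<Long, byte[]> blobStore; // stand-in for BLOB storage
    int blobLoads = 0;

    LazyBlobProjection(Map<Long, byte[]> blobStore) {
        this.blobStore = blobStore;
    }

    Map<String, Object> project(BlobRow row, boolean selectBlob) {
        Map<String, Object> out = new HashMap<>();
        out.put("id", row.id);
        out.put("name", row.name);
        if (selectBlob) {
            blobLoads++;
            out.put("data", blobStore.get(row.blobId));
        }
        return out;
    }
}
```

A query like `SELECT id, name` would call `project(row, false)` and never touch the BLOB, so its pages never need to enter page memory.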
> >
> >
> > On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > wrote:
> >
> > > To be honest, I am not sure if we need to kick off another file system
> > > storage discussion in Ignite. It sounds like a huge effort and likely
> > > will not be productive.
> > >
> > > However, I think the ability to store large objects makes sense. For
> > > example, how do I store a 10GB blob in an Ignite cache? Most likely we
> > > have to have a separate memory or disk space allocated for blobs only.
> > > We also need to be able to efficiently transfer a 10GB Blob object over
> > > the network and store it off-heap right away, without bringing it into
> > > main heap memory (otherwise we would run out of memory).
> > >
> > > I suggest that we create an IEP about this use case alone and leave the
> > > file system for future discussions.
> > >
> > > D.
> > >
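Dmitriy's requirement, receiving a very large BLOB without materializing it on the Java heap, can be illustrated by reading the stream into fixed-size direct (off-heap) buffers. `OffHeapBlobIngest` is a hypothetical helper, a blocking source channel is assumed, and a real 10GB transfer would need `-XX:MaxDirectMemorySize` raised accordingly or, more likely, each chunk written to disk instead of being kept in memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an Ignite API: reads a stream into fixed-size
// direct (off-heap) buffers, so the heap only holds buffer references.
final class OffHeapBlobIngest {
    private OffHeapBlobIngest() {}

    // Assumes a blocking channel: read() returns -1 only at end of stream.
    static List<ByteBuffer> ingest(ReadableByteChannel src, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ByteBuffer> chunks = new ArrayList<>();
        try {
            boolean eof = false;
            while (!eof) {
                ByteBuffer chunk = ByteBuffer.allocateDirect(chunkSize);
                // Fill this chunk completely unless the stream ends first.
                while (chunk.hasRemaining()) {
                    if (src.read(chunk) < 0) {
                        eof = true;
                        break;
                    }
                }
                chunk.flip();
                if (chunk.hasRemaining()) {
                    chunks.add(chunk.asReadOnlyBuffer());
                }
            }
            return chunks;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the list of buffer references lives on the heap. In a store, each chunk would naturally become a separate entry keyed by (BLOB id, chunk index), so no single entry ever approaches the full 10GB.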
> > > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov <vozerov@gridgain.com>
> > > wrote:
> > >
> > > > Pavel,
> > > >
> > > > Thank you. I'll wait for the feature comparison and concrete use
> > > > cases, because for me this feature still sounds too abstract to judge
> > > > whether the product would benefit from it.
> > > >
> > > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko <jokserfn@gmail.com>
> > > wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > > I think we have a little miscommunication here. Of course, I meant
> > > > > supporting large entries / chunks of binary data. Internally it will
> > > > > be BLOB storage, which can be accessed through various interfaces.
> > > > > "File" is just an abstraction for the end user's convenience, a
> > > > > wrapper layer that provides a user-friendly API for storing BLOBs
> > > > > directly. We shouldn't provide full file protocol support with file
> > > > > system capabilities. It can be added later, but right now it's
> > > > > absolutely unnecessary and introduces extra complexity.
> > > > >
> > > > > We can implement our BLOB storage step by step. The first step is
> > > > > the core functionality and support for saving large parts of binary
> > > > > objects to it. The "File" layer, Web layer, etc. can be added later.
> > > > >
> > > > > The initial IGFS design isn't well suited to adding a persistence
> > > > > layer. I think we shouldn't make any changes to it; as for me, this
> > > > > project is almost outdated. We will drop IGFS after implementing a
> > > > > File System layer over our BLOB storage.
> > > > >
> > > > > Vladimir,
> > > > >
> > > > > I will prepare a comparison with other existing distributed file
> > > > > storages and file systems in a few days.
> > > > >
> > > > > Regarding the data grid usage, I never said that we need
> > > > > transactions, synchronous backups, etc. We need just a few core
> > > > > things: Atomic cache with persistence, Discovery, Baseline, Affinity,
> > > > > and Communication. The other things we can implement ourselves, so
> > > > > this feature can be developed independently of other non-core
> > > > > features.
> > > > > For me, the Ignite way is to provide our users with a fast and
> > > > > convenient way to solve their problems with good performance and
> > > > > durability. We have a problem with storing large data, and we should
> > > > > solve it.
> > > > > For the other points, see my message to Dmitriy above.
> > > > >
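The layering Pavel describes (a core BLOB store first, with a thin user-friendly "File" layer added later) could look roughly like the following. All names here (`BlobStore`, `InMemoryBlobStore`, `FileLayer`) are hypothetical, and the in-memory maps only stand in for the persistent storage and the metadata cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical core layer: opaque BLOBs addressed by numeric id.
interface BlobStore {
    long put(byte[] data);
    Optional<byte[]> get(long blobId);
}

// In-memory stand-in used only to make the sketch runnable.
final class InMemoryBlobStore implements BlobStore {
    private final Map<Long, byte[]> blobs = new HashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    public long put(byte[] data) {
        long id = nextId.incrementAndGet();
        blobs.put(id, data.clone()); // defensive copy
        return id;
    }

    public Optional<byte[]> get(long blobId) {
        return Optional.ofNullable(blobs.get(blobId)).map(byte[]::clone);
    }
}

// Thin "File" layer: only maps user-facing names to BLOB ids.
// Overwriting a name leaves the old BLOB unreclaimed in this sketch.
final class FileLayer {
    private final BlobStore store;
    private final Map<String, Long> names = new HashMap<>(); // stand-in for the metadata cache

    FileLayer(BlobStore store) {
        this.store = store;
    }

    void write(String name, byte[] data) {
        names.put(name, store.put(data));
    }

    Optional<byte[]> read(String name) {
        Long id = names.get(name);
        return id == null ? Optional.empty() : store.get(id);
    }
}
```

Keeping the file layer as a pure name-to-id mapping means the core store never needs to know about paths or file protocols, which fits the plan to defer full file system support.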
> > > > > On Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Pavel,
> > > > > >
> > > > > > I actually misunderstood the use case. To be honest, I thought
> > > > > > that you were talking about support for large values in Ignite
> > > > > > caches, e.g. objects of several megabytes stored in a cache.
> > > > > >
> > > > > > If we are tackling a distributed file system, then in my view we
> > > > > > should be talking about IGFS and adding persistence support to IGFS
> > > > > > (which is based on the HDFS API). It is not clear to me that you are
> > > > > > talking about IGFS. Can you confirm?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > >
> > > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko <jokserfn@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Dmitriy,
> > > > > > >
> > > > > > > Yes, I have an approximate design in mind. The main idea is that
> > > > > > > we already have a distributed cache for file metadata (our Atomic
> > > > > > > cache), and the data flow and distribution will be controlled by
> > > > > > > our AffinityFunction and Baseline. We already have discovery and
> > > > > > > communication to keep such local file storages in sync. File data
> > > > > > > will be split into large blocks (64-128 MB), which looks very
> > > > > > > similar to our WAL. Each block can contain one or more file
> > > > > > > chunks. The tablespace (segment IDs, offsets, etc.) will be stored
> > > > > > > in our regular page memory. These are the key ideas for
> > > > > > > implementing the first version of such storage. We already have
> > > > > > > similar components in our persistence, so that experience can be
> > > > > > > reused to develop such storage.
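The block layout in the design above (large 64-128 MB segments, each holding one or more file chunks, with segment IDs and offsets kept in a separate tablespace) can be sketched as an append-only allocator. This is an illustrative assumption rather than the planned implementation: `SegmentAllocator` and `ChunkLocation` are hypothetical names, and the list stands in for the tablespace in page memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tablespace entry: where one file chunk lives.
final class ChunkLocation {
    final long segmentId;
    final long offset;
    final int length;

    ChunkLocation(long segmentId, long offset, int length) {
        this.segmentId = segmentId;
        this.offset = offset;
        this.length = length;
    }
}

// Hypothetical append-only allocator over fixed-size segments: chunks are
// packed into the current segment, and a new segment is started when the
// next chunk does not fit (similar in spirit to WAL segment rollover).
final class SegmentAllocator {
    private final long segmentSize;
    private long currentSegment = 0;
    private long writePos = 0;
    final List<ChunkLocation> tablespace = new ArrayList<>(); // stand-in for the page-memory index

    SegmentAllocator(long segmentSize) {
        this.segmentSize = segmentSize;
    }

    ChunkLocation allocate(int length) {
        if (length <= 0 || length > segmentSize) {
            throw new IllegalArgumentException("chunk must fit in one segment: " + length);
        }
        if (writePos + length > segmentSize) { // roll over to a fresh segment
            currentSegment++;
            writePos = 0;
        }
        ChunkLocation loc = new ChunkLocation(currentSegment, writePos, length);
        writePos += length;
        tablespace.add(loc);
        return loc;
    }
}
```

For 128 MB segments this would be `new SegmentAllocator(128L << 20)`. Reclaiming space from deleted files (compaction) is not modeled here.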
> > > > > > >
> > > > > > > Denis,
> > > > > > >
> > > > > > > Nothing significant should change at our memory level. It will be
> > > > > > > a separate, pluggable component on top of the cache. Most of the
> > > > > > > functions that give a performance boost can be delegated to the OS
> > > > > > > level (memory-mapped files, DMA, direct writes from socket to disk
> > > > > > > and vice versa). Ignite and File Storage can be developed
> > > > > > > independently of each other.
> > > > > > >
> > > > > > > Alexey Stelmak, who has great experience developing such systems,
> > > > > > > can provide more low-level information about how it should look.
> > > > > > >
> > > > > > > On Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Pavel, it definitely makes sense. Do you have a design in mind?
> > > > > > > >
> > > > > > > > D.
> > > > > > > >
> > > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <jokserfn@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Igniters,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion about designing a new
> > > > > > > > > feature, because I think it's time to start making steps
> > > > > > > > > towards it.
> > > > > > > > > I noticed that some of our users have tried to store large
> > > > > > > > > homogeneous entries (> 1, 10, 100 MB/GB/TB) in our caches,
> > > > > > > > > but without much success.
> > > > > > > > >
> > > > > > > > > The IGFS project makes this possible, but as for me it has
> > > > > > > > > one big disadvantage: it's in-memory only, so users face a
> > > > > > > > > strict limit on the size of their data and the risk of data
> > > > > > > > > loss.
> > > > > > > > >
> > > > > > > > > Our durable memory can persist data that doesn't fit in RAM
> > > > > > > > > to disk, but its page structure is not designed to store
> > > > > > > > > large pieces of data.
> > > > > > > > >
> > > > > > > > > There are a lot of distributed file system projects like
> > > > > > > > > HDFS, GlusterFS, etc. But all of them concentrate on
> > > > > > > > > implementing a high-grade file protocol rather than a
> > > > > > > > > user-friendly API, which leads to a high entry threshold for
> > > > > > > > > building something on top of them.
> > > > > > > > > We shouldn't go this way. Our main goal should be to give
> > > > > > > > > users an easy and fast way to use file storage and
> > > > > > > > > processing here and now.
> > > > > > > > >
> > > > > > > > > If we take HDFS as the closest project in terms of
> > > > > > > > > functionality, we have one big advantage over it: we can use
> > > > > > > > > our caches as file metadata storage and scale it virtually
> > > > > > > > > without limit, while HDFS is bounded by Namenode capacity and
> > > > > > > > > has big problems keeping a large number of files in the
> > > > > > > > > system.
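The scalability argument against a single Namenode rests on spreading file metadata across cluster nodes: each path hashes to a partition, and partitions are assigned to nodes. Ignite's default affinity function is rendezvous-based; the following is only a simplified rendezvous (highest-random-weight) hashing sketch, not Ignite's `RendezvousAffinityFunction`, and `MetadataAffinity` is a hypothetical name.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: path -> partition -> owning node.
final class MetadataAffinity {
    private MetadataAffinity() {}

    // Like a cache key hashing into a fixed number of partitions.
    static int partition(String path, int partitions) {
        return Math.floorMod(path.hashCode(), partitions);
    }

    // Rendezvous hashing: every node scores the partition independently and
    // the highest score wins (ties go to the node listed first).
    static String owner(int partition, List<String> nodes) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String node : nodes) {
            long score = mix(Objects.hash(node, partition));
            if (best == null || score > bestScore) {
                best = node;
                bestScore = score;
            }
        }
        return best;
    }

    // SplitMix64 finalizer, used here only to spread the hash bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }
}
```

Because a node's score for a partition does not depend on the other nodes, adding a node only moves partitions onto the new node; metadata is never shuffled between existing nodes, and no single node has to hold all of it.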
> > > > > > > > >
> > > > > > > > > We gained very good experience with persistence when we
> > > > > > > > > developed our durable memory, and we can combine it with our
> > > > > > > > > experience with services, the binary protocol, and I/O, and
> > > > > > > > > start to design a new IEP.
> > > > > > > > >
> > > > > > > > > Use cases and features of the project:
> > > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text, etc.
> > > > > > > > > without overhead and without the possibility of data loss.
> > > > > > > > > 2) Easy, pluggable, fast and distributed file processing,
> > > > > > > > > transformation and analysis (e.g. an ImageMagick processor
> > > > > > > > > for image transformation, LuceneIndex for texts, whatever;
> > > > > > > > > it's bounded only by your imagination).
> > > > > > > > > 3) Scalability out of the box.
> > > > > > > > > 4) A user-friendly API and minimal steps to start using this
> > > > > > > > > storage in production.
> > > > > > > > >
> > > > > > > > > I'll repeat: this project is not supposed to be a high-grade
> > > > > > > > > distributed file system with full file protocol support.
> > > > > > > > > This project should primarily focus on target users who would
> > > > > > > > > like to use it without complex preparation.
> > > > > > > > >
> > > > > > > > > For example, a user can deploy Ignite with such storage and a
> > > > > > > > > web server with a REST API as an Ignite service, and get a
> > > > > > > > > scalable, performant image server out of the box that can be
> > > > > > > > > accessed from any programming language.
> > > > > > > > >
> > > > > > > > > As a long-term goal, we should focus on storing and
> > > > > > > > > processing very large amounts of data, like movies and
> > > > > > > > > streaming, which is a big trend today.
> > > > > > > > >
> > > > > > > > > I would like to give special thanks to our community members
> > > > > > > > > Alexey Stelmak and Dmitriy Govorukhin, who significantly
> > > > > > > > > helped me put together all the pieces of this puzzle.
> > > > > > > > >
> > > > > > > > > So, I want to hear your opinions about this proposal.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>
