ignite-dev mailing list archives

From Pavel Kovalenko <jokse...@gmail.com>
Subject Re: Ignite as distributed file storage
Date Thu, 05 Jul 2018 21:47:34 GMT
Vladimir,

I didn't say that it stores data on-heap; I said that it performs a lot
of operations with byte[] arrays on-heap, as far as I can see, which will
lead to frequent GCs and unnecessary data copying.
"But the whole idea around mmap sounds like premature optimisation to me" -
this is not premature optimisation, this is one of the key performance
features. E.g. Apache Kafka wouldn't be so fast and extremely performant
without zero-copy.
If we can do better, why not just do it? Especially since it costs us
nothing (it happens at the OS level).
As I said in my first message, our end target is handling video and
streaming; copying every chunk of it to on-heap userspace, then to
offheap, and then to disk is unacceptable.
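
To illustrate the zero-copy point, here is a minimal sketch using plain
java.nio (not Ignite code; the file name, host, and port are made up): the
kernel moves the bytes from the file to the socket directly, so they never
pass through the JVM heap.

    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySend {
        public static void main(String[] args) throws Exception {
            try (FileChannel file = FileChannel.open(
                     Paths.get("video-chunk.bin"), StandardOpenOption.READ);
                 SocketChannel sock = SocketChannel.open(
                     new InetSocketAddress("localhost", 9090))) {
                long pos = 0, size = file.size();
                // transferTo delegates to sendfile(2) where the OS supports
                // it, so no byte[] copy is made in userspace.
                while (pos < size)
                    pos += file.transferTo(pos, size - pos, sock);
            }
        }
    }
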
You ask me to implement almost everything using just the IGFS interface -
why do we need to do that? Proxy mode looks like a crutch: to support
replication and the possibility of keeping some data in memory, I would
have to write a lot of code again.
Let's finally leave IGFS alone and wait for the IEP.


2018-07-06 0:01 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:

> Pavel,
>
> IGFS doesn't force you to keep blocks in heap. What you suggest can be
> achieved with IGFS as follows:
> 1) Disable caching, so the data cache is not used ("PROXY" mode)
> 2) Implement the IgniteFileSystem interface, which operates on abstract
> streams - see the sketch below
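>
> Roughly like this (a configuration sketch; MyBlobSecondaryFileSystem is a
> placeholder for your own streams-based implementation):
>
>     import org.apache.ignite.configuration.FileSystemConfiguration;
>     import org.apache.ignite.configuration.IgniteConfiguration;
>     import org.apache.ignite.igfs.IgfsMode;
>
>     FileSystemConfiguration fsCfg = new FileSystemConfiguration();
>     fsCfg.setName("igfs");
>     // PROXY mode: the data cache is bypassed, all reads and writes go
>     // straight to the secondary file system.
>     fsCfg.setDefaultMode(IgfsMode.PROXY);
>     fsCfg.setSecondaryFileSystem(new MyBlobSecondaryFileSystem());
>
>     IgniteConfiguration cfg = new IgniteConfiguration();
>     cfg.setFileSystemConfiguration(fsCfg);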
>
> But the whole idea around mmap sounds like premature optimisation to me. I
> conducted a number of experiments with IGFS on large Hadoop workloads.
> Even with the old AI 1.x architecture, where everything was stored
> on-heap, I never had an issue with GC. The key point is that IGFS operates
> on large (64KB) data blocks, so even with 100GB full of these blocks you
> will have a relatively small number of objects and normal GC pauses.
> Additional memory copying is not an issue either in most workloads in
> distributed systems, because most of the time is spent on IO and internal
> synchronization anyway.
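>
> (To put a number on it: 100GB / 64KB = 1,638,400 blocks, i.e. only about
> 1.6 million heap objects, which is well within what a modern GC handles
> with normal pauses.)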
>
> Do you have a specific scenario where you observed long GC pauses or
> serious performance degradation with IGFS?
>
> Even if we agree that mmap usage is a critical piece, all we need is to
> implement a single IGFS interface.
>
> On Thu, Jul 5, 2018 at 10:44 PM Pavel Kovalenko <jokserfn@gmail.com> wrote:
>
> > Vladimir,
> >
> > The key difference between BLOB storage and IGFS is that BLOB storage
> > will have a persistence-based architecture with the possibility to
> > cache blocks in offheap (using mmap, which is simpler because we
> > delegate it to the OS level), while IGFS has an in-memory architecture
> > with the possibility to persist blocks.
> > BLOB storage will be able to work with a small amount of RAM without a
> > significant performance drop (using zero-copy from socket to disk), and
> > in the opposite case it can keep all available blocks in offheap if
> > possible (using mmap again).
> > IGFS performs a lot of operations with blocks on-heap, which leads to
> > unnecessary data copies, long GC pauses, and performance drops. The
> > whole IGFS architecture is tightly bound to in-memory features, so it's
> > too hard to rewrite IGFS in a persistence-based manner. But cool IGFS
> > features such as intelligent affinity routing and chunk colocation will
> > be reused in BLOB storage. A sketch of the mmap idea is below.
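> >
> > A minimal sketch with plain java.nio (the block file name is made up;
> > this is not the proposed storage code): the mapping is backed by the OS
> > page cache, so hot blocks stay in memory and are evicted by the OS,
> > with no copy into the Java heap.
> >
> >     import java.nio.MappedByteBuffer;
> >     import java.nio.channels.FileChannel;
> >     import java.nio.file.Paths;
> >     import java.nio.file.StandardOpenOption;
> >
> >     public class MmapBlockRead {
> >         public static void main(String[] args) throws Exception {
> >             try (FileChannel ch = FileChannel.open(
> >                     Paths.get("block-0001.bin"), StandardOpenOption.READ)) {
> >                 // Pages are loaded lazily on first access and shared
> >                 // with the OS page cache.
> >                 MappedByteBuffer block =
> >                     ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
> >                 byte first = block.get(0); // no heap copy of the block
> >             }
> >         }
> >     }
> >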
> > Does it make sense?
> >
> >
> >
> > 2018-07-05 19:01 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> >
> > > Pavel,
> > > The design you described is almost precisely what IGFS does. It has a
> > > cache for metadata and splits binary data into chunks with intelligent
> > > affinity routing. In addition, we have a map-reduce feature on top of
> > > it and integration with the underlying file system with optional
> > > caching. Data can be accessed as blocks or streams. IGFS is not in
> > > active development, but it is not outdated either.
> > > Can you briefly explain why you think we need to drop IGFS and
> > > re-implement almost the same thing from scratch?
> > >
> > > Dima, Sergey,
> > > Yes, we need the BLOB support you described. Unfortunately, it is not
> > > that easy to implement from the SQL perspective. To support it we
> > > would need either MVCC (with its own drawbacks) or read locks for
> > > SELECT.
> > >
> > > Vladimir.
> > >
> > > On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov <skozlov@gridgain.com> wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > You're right that storing large objects should be optimized.
> > > >
> > > > Let's assume a large object means a regular object having large
> > > > fields, and that such fields won't be used for comparison; then we
> > > > don't have to restore the BLOB fields in offheap page memory, e.g.
> > > > for SQL queries whose SELECT doesn't include them explicitly. That
> > > > can reduce page eviction, speed up performance, and lower the chance
> > > > of OOM.
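> > > >
> > > > A sketch of the access pattern this would optimize (the Media table
> > > > and its BLOB column are hypothetical; cache is an IgniteCache
> > > > instance, and the query API is the standard SqlFieldsQuery):
> > > >
> > > >     import java.util.List;
> > > >     import org.apache.ignite.cache.query.FieldsQueryCursor;
> > > >     import org.apache.ignite.cache.query.SqlFieldsQuery;
> > > >
> > > >     // Projects only the small columns; with the proposed
> > > >     // optimization the BLOB field would never be materialized
> > > >     // in page memory for this query.
> > > >     SqlFieldsQuery qry = new SqlFieldsQuery(
> > > >         "SELECT id, name FROM Media WHERE name = ?").setArgs("a.mp4");
> > > >
> > > >     try (FieldsQueryCursor<List<?>> cur = cache.query(qry)) {
> > > >         for (List<?> row : cur)
> > > >             System.out.println(row.get(0) + " " + row.get(1));
> > > >     }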
> > > >
> > > >
> > > >
> > > > On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> > > >
> > > > > To be honest, I am not sure if we need to kick off another file
> > > > > system storage discussion in Ignite. It sounds like a huge effort
> > > > > and likely will not be productive.
> > > > >
> > > > > However, I think an ability to store large objects makes sense.
> > > > > For example, how do I store a 10GB blob in an Ignite cache? Most
> > > > > likely we have to have separate memory or disk space allocated for
> > > > > blobs only. We also need to be able to efficiently transfer a 10GB
> > > > > blob object over the network and store it off-heap right away,
> > > > > without bringing it into main heap memory (otherwise we would run
> > > > > out of memory). A chunking sketch is below.
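> > > > >
> > > > > One way to sketch the chunked approach that works today (the cache
> > > > > name, chunk size, blobId, and ChunkKey class are hypothetical, and
> > > > > ignite is a started Ignite instance; note this still buffers each
> > > > > chunk on-heap, which is exactly what the new storage should avoid):
> > > > >
> > > > >     import java.io.InputStream;
> > > > >     import java.nio.file.Files;
> > > > >     import java.nio.file.Paths;
> > > > >     import java.util.Arrays;
> > > > >     import org.apache.ignite.IgniteDataStreamer;
> > > > >
> > > > >     int chunkSize = 64 * 1024 * 1024; // 64MB pieces of the blob
> > > > >     try (IgniteDataStreamer<ChunkKey, byte[]> streamer =
> > > > >              ignite.dataStreamer("blobChunks");
> > > > >          InputStream in = Files.newInputStream(Paths.get("big.blob"))) {
> > > > >         byte[] buf = new byte[chunkSize];
> > > > >         for (long idx = 0;; idx++) {
> > > > >             int n = in.read(buf);
> > > > >             if (n < 0)
> > > > >                 break;
> > > > >             // Key = (blobId, chunk index); value = one chunk.
> > > > >             streamer.addData(new ChunkKey(blobId, idx), Arrays.copyOf(buf, n));
> > > > >         }
> > > > >     }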
> > > > >
> > > > > I suggest that we create an IEP about this use case alone and
> > > > > leave the file system for future discussions.
> > > > >
> > > > > D.
> > > > >
> > > > > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > >
> > > > > > Pavel,
> > > > > >
> > > > > > Thank you. I'll wait for the feature comparison and concrete use
> > > > > > cases, because to me this feature still sounds too abstract to
> > > > > > judge whether the product would benefit from it.
> > > > > >
> > > > > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > > >
> > > > > > > Dmitriy,
> > > > > > >
> > > > > > > I think we have a little miscommunication here. Of course, I
> > > > > > > meant supporting large entries / chunks of binary data.
> > > > > > > Internally it will be a BLOB storage, which can be accessed
> > > > > > > through various interfaces.
> > > > > > > "File" is just an abstraction for the end user's convenience, a
> > > > > > > wrapper layer providing a user-friendly API to store BLOBs
> > > > > > > directly. We shouldn't support a full file protocol with file
> > > > > > > system capabilities. It can be added later, but for now it's
> > > > > > > absolutely unnecessary and introduces extra complexity.
> > > > > > >
> > > > > > > We can implement our BLOB storage step by step. The first
> > > > > > > thing is the core functionality and support for saving large
> > > > > > > parts of binary objects into it. The "File" layer, Web layer,
> > > > > > > etc. can be added later.
> > > > > > >
> > > > > > > The initial IGFS design doesn't have good capabilities for a
> > > > > > > persistence layer. I think we shouldn't make any changes to it;
> > > > > > > this project, as for me, is almost outdated. We will drop IGFS
> > > > > > > after implementing a File System layer over our BLOB storage.
> > > > > > >
> > > > > > > Vladimir,
> > > > > > >
> > > > > > > I will prepare a comparison with other existing distributed
> > > > > > > file storages and file systems in a few days.
> > > > > > >
> > > > > > > Regarding data grid usage: I never said that we need
> > > > > > > transactions, sync backups, etc. We need just a few core
> > > > > > > things: an Atomic cache with persistence, Discovery, Baseline,
> > > > > > > Affinity, and Communication.
> > > > > > > Other things we can implement ourselves, so this feature can
> > > > > > > develop independently of other non-core features.
> > > > > > > For me, the Ignite way is providing our users a fast and
> > > > > > > convenient way to solve their problems with good performance
> > > > > > > and durability. We have a problem with storing large data; we
> > > > > > > should solve it.
> > > > > > > About other things, see my message to Dmitriy above.
> > > > > > >
> > > > > > > Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > >
> > > > > > > > Pavel,
> > > > > > > >
> > > > > > > > I have actually misunderstood the use case. To be honest, I
> > > > > > > > thought that you were talking about the support of large
> > > > > > > > values in Ignite caches, e.g. objects that are several
> > > > > > > > megabytes each in a cache.
> > > > > > > >
> > > > > > > > If we are tackling the distributed file system, then in my
> > > > > > > > view we should be talking about IGFS and adding persistence
> > > > > > > > support to IGFS (which is based on the HDFS API). It is not
> > > > > > > > clear to me that you are talking about IGFS. Can you confirm?
> > > > > > > >
> > > > > > > > D.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Dmitriy,
> > > > > > > > >
> > > > > > > > > Yes, I have an approximate design in mind. The main idea
> > > > > > > > > is that we already have a distributed cache for file
> > > > > > > > > metadata (our Atomic cache), and the data flow and
> > > > > > > > > distribution will be controlled by our AffinityFunction and
> > > > > > > > > Baseline. We already have Discovery and Communication to
> > > > > > > > > keep such local file storages in sync. The file data will
> > > > > > > > > be separated into large blocks (64-128MB), which looks very
> > > > > > > > > similar to our WAL. Each block can contain one or more file
> > > > > > > > > chunks. The tablespace (segment ids, offsets, etc.) will be
> > > > > > > > > stored in our regular page memory. These are the key ideas
> > > > > > > > > for implementing the first version of such a storage. We
> > > > > > > > > already have similar components in our persistence, so this
> > > > > > > > > experience can be reused to develop such a storage. (A
> > > > > > > > > chunk key sketch follows.)
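> > > > > > > > >
> > > > > > > > > A sketch of how a chunk key could ride on the existing
> > > > > > > > > affinity machinery (the ChunkKey class is hypothetical;
> > > > > > > > > the annotation is Ignite's standard one):
> > > > > > > > >
> > > > > > > > >     import org.apache.ignite.cache.affinity.AffinityKeyMapped;
> > > > > > > > >     import org.apache.ignite.lang.IgniteUuid;
> > > > > > > > >
> > > > > > > > >     public class ChunkKey {
> > > > > > > > >         // All chunks of a file map to the file's partition,
> > > > > > > > >         // colocating them with the file's metadata entry.
> > > > > > > > >         @AffinityKeyMapped
> > > > > > > > >         private final IgniteUuid fileId;
> > > > > > > > >
> > > > > > > > >         private final long chunkIdx;
> > > > > > > > >
> > > > > > > > >         public ChunkKey(IgniteUuid fileId, long chunkIdx) {
> > > > > > > > >             this.fileId = fileId;
> > > > > > > > >             this.chunkIdx = chunkIdx;
> > > > > > > > >         }
> > > > > > > > >         // equals() and hashCode() over both fields omitted
> > > > > > > > >         // for brevity; cache keys must define them.
> > > > > > > > >     }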
> > > > > > > > >
> > > > > > > > > Denis,
> > > > > > > > >
> > > > > > > > > Nothing significant should be changed at our memory level.
> > > > > > > > > It will be a separate, pluggable component over the cache.
> > > > > > > > > Most of the functions that give a performance boost can be
> > > > > > > > > delegated to the OS level (memory-mapped files, DMA, direct
> > > > > > > > > write from socket to disk and vice versa). Ignite and the
> > > > > > > > > File Storage can develop independently of each other.
> > > > > > > > >
> > > > > > > > > Alexey Stelmak, who has great experience with developing
> > > > > > > > > such systems, can provide more low-level information about
> > > > > > > > > how it should look.
> > > > > > > > >
> > > > > > > > > Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > > > > > >
> > > > > > > > > > Pavel, it definitely makes sense. Do you have a design
> > > > > > > > > > in mind?
> > > > > > > > > >
> > > > > > > > > > D.
> > > > > > > > > >
> > > > > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Igniters,
> > > > > > > > > > >
> > > > > > > > > > > I would like to start a discussion about designing a
> > > > > > > > > > > new feature, because I think it's time to start making
> > > > > > > > > > > steps towards it.
> > > > > > > > > > > I noticed that some of our users have tried to store
> > > > > > > > > > > large homogeneous entries (> 1, 10, 100 Mb/Gb/Tb) in
> > > > > > > > > > > our caches, but without big success.
> > > > > > > > > > >
> > > > > > > > > > > The IGFS project has the possibility to do it, but for
> > > > > > > > > > > me it has one big disadvantage - it's in-memory only,
> > > > > > > > > > > so users have a strict size limit on their data and a
> > > > > > > > > > > data-loss problem.
> > > > > > > > > > >
> > > > > > > > > > > Our durable memory has the possibility to persist data
> > > > > > > > > > > that doesn't fit in RAM to disk, but its page structure
> > > > > > > > > > > is not designed to store large pieces of data.
> > > > > > > > > > >
> > > > > > > > > > > There are a lot of distributed file system projects
> > > > > > > > > > > like HDFS, GlusterFS, etc. But all of them concentrate
> > > > > > > > > > > on implementing a high-grade file protocol rather than
> > > > > > > > > > > a user-friendly API, which leads to a high entry
> > > > > > > > > > > threshold for building something on top of them.
> > > > > > > > > > > We shouldn't go that way. Our main goal should be to
> > > > > > > > > > > provide users an easy and fast way to use file storage
> > > > > > > > > > > and processing here and now.
> > > > > > > > > > >
> > > > > > > > > > > If we take HDFS as the closest project by
> > > > > > > > > > > functionality, we have one big advantage over it. We
> > > > > > > > > > > can use our caches as file metadata storage and scale
> > > > > > > > > > > it almost without limit, while HDFS is bounded by
> > > > > > > > > > > Namenode capacity and has big problems keeping a large
> > > > > > > > > > > number of files in the system.
> > > > > > > > > > >
> > > > > > > > > > > We gained very good experience with persistence when
> > > > > > > > > > > we developed our durable memory, and we can couple it
> > > > > > > > > > > with our experience with services, the binary protocol,
> > > > > > > > > > > and I/O, and start to design a new IEP.
> > > > > > > > > > >
> > > > > > > > > > > Use cases and features of the project:
> > > > > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text,
> > > > > > > > > > > etc. without overhead and without the possibility of
> > > > > > > > > > > data loss.
> > > > > > > > > > > 2) Easy, pluggable, fast, and distributed file
> > > > > > > > > > > processing, transformation, and analysis (e.g. an
> > > > > > > > > > > ImageMagick processor for image transformation, a
> > > > > > > > > > > Lucene index for texts - whatever; it's bounded only by
> > > > > > > > > > > your imagination).
> > > > > > > > > > > 3) Scalability out of the box.
> > > > > > > > > > > 4) A user-friendly API and minimal steps to start using
> > > > > > > > > > > this storage in production.
> > > > > > > > > > >
> > > > > > > > > > > I repeat again: this project is not supposed to be a
> > > > > > > > > > > high-grade distributed file system with full file
> > > > > > > > > > > protocol support.
> > > > > > > > > > > This project should primarily focus on target users who
> > > > > > > > > > > would like to use it without complex preparation.
> > > > > > > > > > >
> > > > > > > > > > > As an example, a user can deploy Ignite with such a
> > > > > > > > > > > storage and a web server with a REST API as an Ignite
> > > > > > > > > > > service, and get a scalable, performant image server
> > > > > > > > > > > out of the box which can be accessed from any
> > > > > > > > > > > programming language.
> > > > > > > > > > >
> > > > > > > > > > > As a far target goal, we should focus on storing and
> > > > > > > > > > > processing very large amounts of data like movies and
> > > > > > > > > > > streaming, which is the big trend today.
> > > > > > > > > > >
> > > > > > > > > > > I would like to say special thanks to our community
> > > > > > > > > > > members Alexey Stelmak and Dmitriy Govorukhin, who
> > > > > > > > > > > significantly helped me to put together all the pieces
> > > > > > > > > > > of that puzzle.
> > > > > > > > > > >
> > > > > > > > > > > So, I want to hear your opinions about this proposal.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sergey Kozlov
> > > > GridGain Systems
> > > > www.gridgain.com
> > > >
> > >
> >
>
