ignite-dev mailing list archives

From Pavel Kovalenko <jokse...@gmail.com>
Subject Re: Ignite as distributed file storage
Date Thu, 02 Aug 2018 08:08:26 GMT
Dmitriy,

I still don't understand why you think this will be a file system.
In all my previous messages I emphasized that this storage shouldn't be
considered a file system. It's just a large data storage whose entities
can be easily accessed by key/link (internally, or externally through
web/binary protocol interfaces).
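
A minimal single-node sketch of key/link access, in Python. Everything in
it is an assumption made for illustration: `BlobStore`, `BlobLink`, and the
single append-only segment file are hypothetical names and choices, not an
existing Ignite API and not the proposed design.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobLink:
    """Hypothetical 'link': the position of a blob inside the segment file."""
    offset: int
    length: int


class BlobStore:
    """Toy store: one append-only segment file plus a key -> link index."""

    def __init__(self, segment_path):
        self._path = segment_path
        self._index = {}  # stand-in for metadata kept in a cache
        open(self._path, "ab").close()  # create the segment if it is missing

    def put(self, key, data):
        """Append the bytes to the segment and remember where they landed."""
        with open(self._path, "ab") as seg:
            offset = seg.seek(0, os.SEEK_END)
            seg.write(data)
        link = BlobLink(offset, len(data))
        self._index[key] = link
        return link

    def get(self, key):
        """Access by key: look up the link, then read through it."""
        return self.read(self._index[key])

    def read(self, link):
        """Access by link: read the byte range directly, no index lookup."""
        with open(self._path, "rb") as seg:
            seg.seek(link.offset)
            return seg.read(link.length)


def demo():
    """Store two blobs, then fetch one by key and one by link."""
    with tempfile.TemporaryDirectory() as tmp:
        store = BlobStore(os.path.join(tmp, "segment-0.bin"))
        store.put("logo.png", b"\x89PNG...")
        link = store.put("clip.mp4", b"video-bytes")
        return store.get("logo.png"), store.read(link)
```

There are no directories, permissions, or rename semantics here: a caller
only needs a key or a link, which is the difference from a file system that
the paragraph above describes.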

> Instead, if we must focus on large blobs, I would solve the problem of
> supporting large blobs in regular Ignite caches, as I suggested before.

This is impossible. By design, our page memory can't handle large blobs
efficiently.
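
A back-of-the-envelope sketch of the scale involved, not a measurement. It
assumes a 4 KB page size and ignores per-page headers; the 10GB blob is
Dmitriy's example and the 64 MB block is the lower bound of the 64-128Mb
block size from the original proposal. The helper name `units_needed` is
made up for this example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def units_needed(blob_bytes, unit_bytes):
    """Number of fixed-size units needed to hold a blob (ceiling division)."""
    return -(-blob_bytes // unit_bytes)


blob = 10 * GIB                          # the 10GB blob from the thread
pages = units_needed(blob, 4 * KIB)      # assumed 4 KB page size
blocks = units_needed(blob, 64 * MIB)    # 64 MB blocks from the proposal

print(pages, blocks)  # 2621440 160
```

Under these assumptions a single blob is spread over millions of pages, each
of which has to be allocated, linked and tracked, against 160 large blocks.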


2018-08-02 6:11 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:

> Dmitriy, Pavel,
>
> Everything that gets accepted into the project has to make sense. I agree
> with Vladimir - we do not need more than one file system in Ignite. Given
> the amount of usage and the number of questions we get about IGFS, I would
> question whether Ignite needs a file system at all.
>
> As community members we should drive the community towards improving the
> project instead of advocating that no change will be rejected, no matter
> what it is. In this case, I am not convinced that this is a real problem
> for users or that Ignite should even try to solve it.
>
> Instead, if we must focus on large blobs, I would solve the problem of
> supporting large blobs in regular Ignite caches, as I suggested before.
>
> D.
>
> On Wed, Aug 1, 2018 at 2:50 AM, Dmitriy Pavlov <dpavlov.spb@gmail.com>
> wrote:
>
> > Hi Vladimir,
> >
> > I think rejection by the community is possible only if the PMC vetoes
> > the change. I didn't find any reason not to make this change or any
> > reason it could be vetoed.
> >
> > I would appreciate it if you became a mentor of this change and
> > assisted Pavel or another community member in making this happen.
> >
> > To my mind, the Apache Way is not about rejecting things, it is about
> > sharing knowledge. If you are able to share your experience to grow the
> > community, it would be a good donation.
> >
> > If you have any disagreements about this change, can we set up a voice
> > call where you explain how to make this proposal as good as possible?
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Fri, Jul 6, 2018 at 10:35, Vladimir Ozerov <vozerov@gridgain.com>:
> >
> > > Pavel,
> > >
> > > I do not think it is a good idea to delay discussions and decisions,
> > > because it puts your efforts at risk of not being accepted by the
> > > community in the end. Our ultimate goal is not to have as many features
> > > as possible, but to have a consistent product which is easy to
> > > understand and use. Having both IGFS and another "not-IGFS", which is in
> > > fact the same IGFS under a different name, is not a good idea, because
> > > it would cause more harm than value.
> > >
> > > Approaches which seem reasonable to me:
> > > 1) Integrate your ideas into IGFS, which is really flexible in how to
> > > process data and where to store it. PROXY mode is not a "crutch" as you
> > > said, but a normal mode which was used in real deployments.
> > > 2) Replace IGFS with your solution, but with a clear explanation of how
> > > it is better than IGFS and why we need to replace thousands of lines of
> > > battle-tested code with something new that does virtually the same thing.
> > > 3) Just drop IGFS from the product, and do not implement any replacement
> > > at all - personally, I am all for this decision.
> > >
> > > If you want I can guide you through IGFS architecture so that we better
> > > understand what should be done to integrate your ideas into it.
> > >
> > > Last, but not least - we need objective facts showing why the proposed
> > > solution is better in terms of performance - concrete use cases and
> > > performance numbers (or at least estimates).
> > >
> > > On Fri, Jul 6, 2018 at 1:45 AM Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > >
> > > > Vladimir,
> > > >
> > > > I just want to add to my words that we can implement BLOB storage and
> > > > then, if the community really wants it, adapt this storage for use as
> > > > the underlying file system in IGFS. But IGFS shouldn't be the entry
> > > > point for BLOB storage. I think this conclusion can satisfy both of us.
> > > >
> > > > 2018-07-06 0:47 GMT+03:00 Pavel Kovalenko <jokserfn@gmail.com>:
> > > >
> > > > > Vladimir,
> > > > >
> > > > > I didn't say that it stores data on-heap. I said that it performs a lot
> > > > > of operations with byte[] arrays on-heap, as I see in , which will lead
> > > > > to frequent GCs and unnecessary data copying.
> > > > > "But the whole idea around mmap sounds like premature optimisation to me"
> > > > > - this is not premature optimisation, this is one of the key performance
> > > > > features. E.g. Apache Kafka wouldn't be so fast and extremely performant
> > > > > without zero-copy.
> > > > > If we can do better, why not just do it? Especially if it costs nothing
> > > > > for us (this is done at the OS level).
> > > > > As I said in my first message, our end target is handling video and
> > > > > streaming; copying every chunk of it to on-heap userspace, then to
> > > > > off-heap, and then to disk is unacceptable.
> > > > > You ask me to implement almost everything using just the IGFS interface.
> > > > > Why do we need to do that? Proxy mode looks like a crutch: to support
> > > > > replication and the possibility to have some data in-memory I would need
> > > > > to write a lot of stuff again.
> > > > > Let's finally leave IGFS alone and wait for the IEP.
> > > > >
> > > > >
> > > > > 2018-07-06 0:01 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
> > > > >
> > > > >> Pavel,
> > > > >>
> > > > > >> IGFS doesn't force you to keep blocks in heap. What you suggest can be
> > > > > >> achieved with IGFS as follows:
> > > > > >> 1) Disable caching, so the data cache is not used ("PROXY" mode)
> > > > > >> 2) Implement the IgniteFileSystem interface, which operates on
> > > > > >> abstract streams
> > > > > >>
> > > > > >> But the whole idea around mmap sounds like premature optimisation to
> > > > > >> me. I conducted a number of experiments with IGFS on large Hadoop
> > > > > >> workloads. Even with the old AI 1.x architecture, where everything was
> > > > > >> stored on-heap, I never had an issue with GC. The key point is that
> > > > > >> IGFS operates on large (64Kb) data blocks, so even with 100Gb full of
> > > > > >> these blocks you will have a relatively small number of objects and
> > > > > >> normal GC pauses. Additional memory copying is not an issue either in
> > > > > >> most workloads in distributed systems, because most of the time is
> > > > > >> spent on IO and internal synchronization anyway.
> > > > > >>
> > > > > >> Do you have a specific scenario where you observed long GC pauses or
> > > > > >> serious performance degradation with IGFS?
> > > > > >>
> > > > > >> Even if we agree that mmap usage is a critical piece, all we need is
> > > > > >> to implement a single IGFS interface.
> > > > >>
> > > > > >> On Thu, Jul 5, 2018 at 10:44 PM Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > >>
> > > > >> > Vladimir,
> > > > >> >
> > > > > >> > The key difference between BLOB storage and IGFS is that BLOB storage
> > > > > >> > will have a persistence-based architecture with the possibility to
> > > > > >> > cache blocks off-heap (using mmap, which is simpler because we
> > > > > >> > delegate it to the OS level), while IGFS has an in-memory-based
> > > > > >> > architecture with the possibility to persist blocks.
> > > > > >> > BLOB storage will be able to work with a small amount of RAM without
> > > > > >> > a significant performance drop (using zero-copy from socket to disk),
> > > > > >> > and in the opposite case it can keep all available blocks off-heap if
> > > > > >> > possible (using mmap again).
> > > > > >> > IGFS performs a lot of operations with blocks on-heap, which leads to
> > > > > >> > unnecessary data copies, long GC pauses and performance drops. The
> > > > > >> > whole IGFS architecture is tightly bound to in-memory features, so
> > > > > >> > it's too hard to rewrite IGFS in a persistence-based manner. But cool
> > > > > >> > IGFS features such as intelligent affinity routing and chunk
> > > > > >> > colocation will be reused in BLOB storage.
> > > > > >> > Does it make sense?
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > 2018-07-05 19:01 GMT+03:00 Vladimir Ozerov <
> vozerov@gridgain.com
> > >:
> > > > >> >
> > > > >> > > Pavel,
> > > > > >> > > The design you described is almost precisely what IGFS does. It has a
> > > > > >> > > cache for metadata and splits binary data into chunks with
> > > > > >> > > intelligent affinity routing. In addition we have a map-reduce
> > > > > >> > > feature on top of it and integration with the underlying file system
> > > > > >> > > with optional caching. Data can be accessed in blocks or streams.
> > > > > >> > > IGFS is not in active development, but it is not outdated either.
> > > > > >> > > Can you briefly explain why you think we need to drop IGFS and
> > > > > >> > > re-implement almost the same thing from scratch?
> > > > > >> > >
> > > > > >> > > Dima, Sergey,
> > > > > >> > > Yes, we need the BLOB support you described. Unfortunately it is not
> > > > > >> > > that easy to implement from the SQL perspective. To support it we
> > > > > >> > > would need either MVCC (with its own drawbacks) or read-locks for
> > > > > >> > > SELECT.
> > > > >> > >
> > > > >> > > Vladimir.
> > > > >> > >
> > > > > >> > > On Tue, Jul 3, 2018 at 10:40 AM Sergey Kozlov <skozlov@gridgain.com> wrote:
> > > > >> > >
> > > > >> > > > Dmitriy
> > > > >> > > >
> > > > > >> > > > You're right that storing large objects should be optimized.
> > > > > >> > > >
> > > > > >> > > > Let's assume a large object means a regular object having large
> > > > > >> > > > fields, and such fields won't be used for comparison. Then we can
> > > > > >> > > > skip restoring the BLOB fields into off-heap page memory, e.g. for
> > > > > >> > > > SQL queries, if the SELECT doesn't include them explicitly. That can
> > > > > >> > > > reduce page eviction, speed up performance and lower the chance of
> > > > > >> > > > an OOM.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > > >> > > > On Tue, Jul 3, 2018 at 1:06 AM, Dmitriy Setrakyan <dsetrakyan@apache.org> wrote:
> > > > >> > > >
> > > > > >> > > > > To be honest, I am not sure if we need to kick off another file
> > > > > >> > > > > system storage discussion in Ignite. It sounds like a huge effort
> > > > > >> > > > > and likely will not be productive.
> > > > > >> > > > >
> > > > > >> > > > > However, I think an ability to store large objects will make sense.
> > > > > >> > > > > For example, how do I store a 10GB blob in Ignite cache? Most likely
> > > > > >> > > > > we have to have a separate memory or disk space, allocated for blobs
> > > > > >> > > > > only. We also need to be able to efficiently transfer a 10GB Blob
> > > > > >> > > > > object over the network and store it off-heap right away, without
> > > > > >> > > > > bringing it into main heap memory (otherwise we would run out of
> > > > > >> > > > > memory).
> > > > > >> > > > >
> > > > > >> > > > > I suggest that we create an IEP about this use case alone and leave
> > > > > >> > > > > the file system for the future discussions.
> > > > >> > > > >
> > > > >> > > > > D.
> > > > >> > > > >
> > > > > >> > > > > On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov <vozerov@gridgain.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Pavel,
> > > > >> > > > > >
> > > > > >> > > > > > Thank you. I'll wait for feature comparison and concrete use cases,
> > > > > >> > > > > > because for me this feature still sounds too abstract to judge
> > > > > >> > > > > > whether product would benefit from it.
> > > > >> > > > > >
> > > > > >> > > > > > On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Dmitriy,
> > > > >> > > > > > >
> > > > > >> > > > > > > I think we have a little miscommunication here. Of course, I meant
> > > > > >> > > > > > > supporting large entries / chunks of binary data. Internally it
> > > > > >> > > > > > > will be BLOB storage, which can be accessed through various
> > > > > >> > > > > > > interfaces. "File" is just an abstraction for an end user for
> > > > > >> > > > > > > convenience, a wrapper layer to have a user-friendly API to
> > > > > >> > > > > > > directly store BLOBs. We shouldn't implement full file protocol
> > > > > >> > > > > > > support with file system capabilities. It can be added later, but
> > > > > >> > > > > > > now it's absolutely unnecessary and introduces extra complexity.
> > > > > >> > > > > > >
> > > > > >> > > > > > > We can implement our BLOB storage step by step. The first thing is
> > > > > >> > > > > > > core functionality and support for saving large pieces of binary
> > > > > >> > > > > > > objects to it. "File" layer, Web layer, etc. can be added later.
> > > > > >> > > > > > >
> > > > > >> > > > > > > The initial IGFS design doesn't have good capabilities for a
> > > > > >> > > > > > > persistence layer. I think we shouldn't make any changes to it;
> > > > > >> > > > > > > this project is, as for me, almost outdated. We will drop IGFS
> > > > > >> > > > > > > after implementing a File System layer over our BLOB storage.
> > > > >> > > > > > >
> > > > >> > > > > > > Vladimir,
> > > > >> > > > > > >
> > > > > >> > > > > > > I will prepare a comparison with other existing distributed file
> > > > > >> > > > > > > storages and file systems in a few days.
> > > > > >> > > > > > >
> > > > > >> > > > > > > About data grid usage: I never said that we need transactions,
> > > > > >> > > > > > > sync backup, etc. We need just a few core things - Atomic cache
> > > > > >> > > > > > > with persistence, Discovery, Baseline, Affinity, and
> > > > > >> > > > > > > Communication. Other things we can implement by ourselves. So this
> > > > > >> > > > > > > feature can develop independently of other non-core features.
> > > > > >> > > > > > > For me the Ignite way is providing our users a fast and convenient
> > > > > >> > > > > > > way to solve their problems with good performance and durability.
> > > > > >> > > > > > > We have a problem with storing large data; we should solve it.
> > > > > >> > > > > > > About other things see my message to Dmitriy above.
> > > > >> > > > > > >
> > > > > >> > > > > > > Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > >> > > > > > >
> > > > >> > > > > > > > Pavel,
> > > > >> > > > > > > >
> > > > > >> > > > > > > > I have actually misunderstood the use case. To be honest, I
> > > > > >> > > > > > > > thought that you were talking about the support of large values
> > > > > >> > > > > > > > in Ignite caches, e.g. objects that are several megabytes in
> > > > > >> > > > > > > > cache.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > If we are tackling the distributed file system, then in my view,
> > > > > >> > > > > > > > we should be talking about IGFS and adding persistence support
> > > > > >> > > > > > > > to IGFS (which is based on HDFS API). It is not clear to me that
> > > > > >> > > > > > > > you are talking about IGFS. Can you confirm?
> > > > >> > > > > > > >
> > > > >> > > > > > > > D.
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > > >> > > > > > > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Dmitriy,
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Yes, I have an approximate design in mind. The main idea is
> > > > > >> > > > > > > > > that we already have a distributed cache for file metadata
> > > > > >> > > > > > > > > (our Atomic cache), and the data flow and distribution will be
> > > > > >> > > > > > > > > controlled by our AffinityFunction and Baseline. We already
> > > > > >> > > > > > > > > have discovery and communication to keep such local file
> > > > > >> > > > > > > > > storages in sync. The file data will be split into large
> > > > > >> > > > > > > > > blocks (64-128Mb) (which looks very similar to our WAL). Each
> > > > > >> > > > > > > > > block can contain one or more file chunks. The tablespace
> > > > > >> > > > > > > > > (segment ids, offsets, etc.) will be stored in our regular
> > > > > >> > > > > > > > > page memory. These are the key ideas for implementing the
> > > > > >> > > > > > > > > first version of such storage. We already have similar
> > > > > >> > > > > > > > > components in our persistence, so this experience can be
> > > > > >> > > > > > > > > reused to develop such storage.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Denis,
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Nothing significant should be changed at our memory level. It
> > > > > >> > > > > > > > > will be a separate, pluggable component over the cache. Most
> > > > > >> > > > > > > > > of the functions which give a performance boost can be
> > > > > >> > > > > > > > > delegated to the OS level (memory-mapped files, DMA, direct
> > > > > >> > > > > > > > > write from socket to disk and vice versa). Ignite and File
> > > > > >> > > > > > > > > Storage can develop independently of each other.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Alexey Stelmak, who has great experience with developing such
> > > > > >> > > > > > > > > systems, can provide more low-level information about how it
> > > > > >> > > > > > > > > should look.
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <dsetrakyan@apache.org>:
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Pavel, it definitely makes sense. Do you have a design in mind?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > D.
> > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <jokserfn@gmail.com> wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Igniters,
> > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I would like to start a discussion about designing a new
> > > > > >> > > > > > > > > > > feature because I think it's time to start making steps
> > > > > >> > > > > > > > > > > towards it.
> > > > > >> > > > > > > > > > > I noticed that some of our users have tried to store large
> > > > > >> > > > > > > > > > > homogeneous entries (> 1, 10, 100 Mb/Gb/Tb) in our caches,
> > > > > >> > > > > > > > > > > but without much success.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > IGFS project has the possibility to do it, but as for me it
> > > > > >> > > > > > > > > > > has one big disadvantage - it's in-memory only, so users
> > > > > >> > > > > > > > > > > have a strict size limit on their data and a data loss
> > > > > >> > > > > > > > > > > problem.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Our durable memory can persist data that doesn't fit in RAM
> > > > > >> > > > > > > > > > > to disk, but its page structure is not designed to store
> > > > > >> > > > > > > > > > > large pieces of data.
> > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > There are a lot of distributed file system projects like
> > > > > >> > > > > > > > > > > HDFS, GlusterFS, etc. But all of them concentrate on
> > > > > >> > > > > > > > > > > implementing a high-grade file protocol rather than a
> > > > > >> > > > > > > > > > > user-friendly API, which leads to a high entry threshold
> > > > > >> > > > > > > > > > > for starting to implement something over it.
> > > > > >> > > > > > > > > > > We shouldn't go this way. Our main goal should be providing
> > > > > >> > > > > > > > > > > the user an easy and fast way to use file storage and
> > > > > >> > > > > > > > > > > processing here and now.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > If we take HDFS as the closest project by functionality, we
> > > > > >> > > > > > > > > > > have one big advantage over it. We can use our caches as
> > > > > >> > > > > > > > > > > file metadata storage and have the infinite possibility to
> > > > > >> > > > > > > > > > > scale it, while HDFS is bounded by Namenode capacity and
> > > > > >> > > > > > > > > > > has big problems with keeping a large number of files in
> > > > > >> > > > > > > > > > > the system.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > We gained very good experience with persistence when we
> > > > > >> > > > > > > > > > > developed our durable memory, and we can combine it with
> > > > > >> > > > > > > > > > > our experience with services, binary protocol and I/O, and
> > > > > >> > > > > > > > > > > start to design a new IEP.
> > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Use cases and features of the project:
> > > > > >> > > > > > > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text, etc.
> > > > > >> > > > > > > > > > > without overhead and data loss possibility.
> > > > > >> > > > > > > > > > > 2) Easy, pluggable, fast and distributed file processing,
> > > > > >> > > > > > > > > > > transformation and analysis. (E.g. ImageMagick processor
> > > > > >> > > > > > > > > > > for image transformation, LuceneIndex for texts, whatever,
> > > > > >> > > > > > > > > > > it's bounded only by your imagination).
> > > > > >> > > > > > > > > > > 3) Scalability out of the box.
> > > > > >> > > > > > > > > > > 4) User-friendly API and minimal steps to start using this
> > > > > >> > > > > > > > > > > storage in production.
> > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I repeat, this project is not supposed to be a high-grade
> > > > > >> > > > > > > > > > > distributed file system with full file protocol support.
> > > > > >> > > > > > > > > > > This project should primarily focus on target users who
> > > > > >> > > > > > > > > > > would like to use it without complex preparation.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > As an example, a user can deploy Ignite with such storage
> > > > > >> > > > > > > > > > > and a web server with a REST API as an Ignite service and
> > > > > >> > > > > > > > > > > get a scalable, performant image server out of the box
> > > > > >> > > > > > > > > > > which can be accessed using any programming language.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > As a far target goal, we should focus on storing and
> > > > > >> > > > > > > > > > > processing a very large amount of data like movies and
> > > > > >> > > > > > > > > > > streaming, which is the big trend today.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > I would like to say special thanks to our community members
> > > > > >> > > > > > > > > > > Alexey Stelmak and Dmitriy Govorukhin, who significantly
> > > > > >> > > > > > > > > > > helped me put together all the pieces of that puzzle.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > So, I want to hear your opinions about this proposal.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > > Sergey Kozlov
> > > > >> > > > GridGain Systems
> > > > >> > > > www.gridgain.com
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
