hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Standalone == Dev Only?
Date Fri, 06 Mar 2015 22:19:39 GMT
... And if you have at most "small data" at this stage, you might be able
to cut the heap sizes of the HDFS daemons in half.
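
As a rough sketch of the stripped-down single-node setup described in the quoted message below, the following Python script generates an hdfs-site.xml fragment and the matching hadoop-env.sh heap lines. The property names are the ones found in Hadoop 2.x hdfs-default.xml, and the HADOOP_*_OPTS variables follow the Hadoop 2.x hadoop-env.sh convention. Mapping "sync-behind-write" to dfs.datanode.sync.behind.writes is an assumption about what was meant, the socket path is a placeholder, and the function names are made up for illustration. Short-circuit reads also need the native libhadoop library to be available.

```python
import xml.etree.ElementTree as ET


def hdfs_site_xml(socket_path="/var/lib/hadoop-hdfs/dn_socket"):
    """Build an hdfs-site.xml string for a single-node, low-overhead HDFS.

    socket_path is a placeholder; choose a path the DataNode user can create.
    """
    props = {
        # Single node, so keep only one copy of each block.
        "dfs.replication": "1",
        # Let the client read block files directly from local disk.
        "dfs.client.read.shortcircuit": "true",
        "dfs.domain.socket.path": socket_path,
        # Assumed to be the "sync-behind-write" setting from the message.
        "dfs.datanode.sync.behind.writes": "true",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def hadoop_env_lines(heap_mb=1024):
    """Return hadoop-env.sh lines capping each HDFS daemon's heap."""
    daemons = ("NAMENODE", "SECONDARYNAMENODE", "DATANODE")
    template = 'export HADOOP_{0}_OPTS="$HADOOP_{0}_OPTS -Xmx{1}m"'
    return [template.format(daemon, heap_mb) for daemon in daemons]


if __name__ == "__main__":
    print(hdfs_site_xml())
    # 1024 MB per daemon as in the quoted message; pass 512 for "small data".
    print("\n".join(hadoop_env_lines(1024)))
```

Calling hadoop_env_lines(512) gives the halved heap sizes suggested above for small data. The dirsync mount option is a filesystem setting (for example in /etc/fstab) and is not covered by this sketch.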

On Fri, Mar 6, 2015 at 2:18 PM, Andrew Purtell <apurtell@apache.org> wrote:

> > I think the final issue with hadoop-common (re: unimplemented sync for
> > local filesystems) is the one showstopper for us.
>
> Although the unnecessary overhead would be significant, you could run a
> stripped down HDFS stack on the VM. Give the NameNode, SecondaryNameNode,
> and DataNode 1GB of heap only (so this sacrifices 3GB of RAM), configure
> replication to default to 1, enable short-circuit reads for direct file
> access during reads, and enable sync-behind-write on the DataNode. If using
> ext3 or ext4, mount with dirsync. (Or use XFS.) This is about as good as you'll
> be able to do until that Hadoop JIRA is addressed. It could get you over
> the hump. I do this sort of thing for testing the full HDFS+HBase stack
> when I'm unable to get my hands on a cluster.
> On Fri, Mar 6, 2015 at 1:50 PM, Rose, Joseph <
> Joseph.Rose@childrens.harvard.edu> wrote:
>> So, I think Nick, St.Ack and Wilm have all made some excellent points, but
>> this last email more or less hit it on the head. Like I said, I'm working
>> with patient data and while the volume is small now, it's not going to
>> stay that way. And the cell-level security is a *huge* win -- I'm sure you
>> folks have some idea how happy that feature makes me. I'd also rather be
>> writing coprocessors than triggers or -- heaven forbid -- PL/SQL.
>> But there's another, more fundamental thing: we're exploring other DB
>> architectures because classical RDBMS systems haven't always worked out so
>> well. In fact, we're having a bit of a hard time with the current project
>> because we've been constrained (thus far) to a relational system and it
>> doesn't seem to be a clean fit. A key/val store, on the other hand, will
>> have enough flexibility to get the job done, I think. It's all being
>> prototyped now, so we'll see.
>> I think the final issue with hadoop-common (re: unimplemented sync for
>> local filesystems) is the one showstopper for us. We have to have assured
>> durability. I'm willing to devote some cycles to get it done, so maybe I'm
>> the one that says this problem is worthwhile.
>> Thanks for chiming in. I'd love to hear more.
>> -j
>> On 3/6/15, 3:02 PM, "Wilm Schumacher" <wilm.schumacher@gmail.com> wrote:
>> >Hi,
>> >
>> >Am 06.03.2015 um 19:18 schrieb Stack:
>> >> Why not use an RDBMS then?
>> >
>> >When I first read the hbase documentation I also stumbled over the
>> >"only use for large datasets" or "standalone only in dev mode" etc. From
>> >my point of view there are some arguments against RDBMSs and for e.g.
>> >hbase, even when we are talking about a single-node application.
>> >
>> >* scalability is a future investment. Even if the dataset is small now,
>> >that doesn't mean it will stay small in the future. Scalability in size
>> >and computing power is always a good idea.
>> >
>> >* query language: for a user hbase is more of a database library than a
>> >"DBMS". For me this is a big plus, as it forces the user to do it the
>> >right way. Just think of SQL-injection. Or CQL-injection for that
>> >matter. Query languages are like scripting languages. They make easy
>> >stuff easier and hard stuff harder.
>> >
>> >* fancy features: hbase has fancy features RDBMSs don't have. E.g.
>> >coprocessors. I know that e.g. mysql has "triggers", but they are not
>> >nearly as powerful as coprocessors. And don't forget that you have to
>> >write most of the triggers in this *curse word* SQL language if you don't
>> >want to use evil hacks.
>> >
>> >* schema-less: another HUGE plus is the possibility to use it without a
>> >fixed schema. In SQL you would need several tables and do a lot of
>> >joins. And the output is way harder to get and to parse.
>> >
>> >* ecosystem: when you use hbase you automatically get the whole hadoop,
>> >or better apache foundation, ecosystem right away. Not only hdfs, but
>> >mapred, lucene, spark, kafka etc. etc..
>> >
>> >There are only two real arguments against hbase in that scenario:
>> >
>> >* joins etc.: well, in sql that's a question of minutes. In hbase that
>> >takes a little more effort. BUT: then it's done the right way ;).
>> >
>> >* RDBMSs are more widely known: well ... that's not the fault of hbase ;).
>> >
>> >Thus, I think that the hbase community should be more self-reliant for
>> >that matter, even and especially for applications in the SQL realm ;).
>> >Which is a good opportunity to say congratulations for the hbase 1.0
>> >milestone. And thank you for that.
>> >

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
