nutch-dev mailing list archives

From Angel Faus <angel.f...@gmail.com>
Subject NameNode scalability
Date Mon, 07 Mar 2005 19:11:14 GMT
Hi,

I have been doing some tests to find out whether NDFS can be used at
our company to reliably store many files (both small and big) across a
cluster of cheap servers.

The short summary is that right now NDFS doesn't look viable for our needs. 

I am sending the results of the test to the list, in case it is of any
interest.

We created about 400,000 files, each consisting of one block or a
small number of blocks, and placed them in a cluster of 8 DataNodes
(and 1 NameNode). Since FSDirectory stores children as a simple
Vector, we took care not to create any directory with more than 100
files.
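The 100-files-per-directory limit follows from that Vector representation: every child lookup is a linear scan, so creating the n-th file in a directory costs O(n) comparisons. A minimal sketch of the effect (the class and method names here are illustrative, not NDFS's actual code):

```java
import java.util.Vector;

// Illustrative model of a directory that keeps its children in a
// plain Vector: every lookup scans the whole list.
class VectorDir {
    private final Vector<String> children = new Vector<>();

    // O(n) scan before every insert, so filling a directory with
    // n files performs roughly n^2/2 comparisons in total.
    boolean addChild(String name) {
        for (String child : children) { // linear scan
            if (child.equals(name)) return false;
        }
        children.add(name);
        return true;
    }

    int size() { return children.size(); }
}

public class VectorDirDemo {
    public static void main(String[] args) {
        VectorDir dir = new VectorDir();
        for (int i = 0; i < 100; i++) {
            dir.addChild("file-" + i);
        }
        // A duplicate name is rejected only after a full scan.
        System.out.println(dir.addChild("file-0")); // false
        System.out.println(dir.size());             // 100
    }
}
```

With 100 entries the scans are harmless; with thousands of entries per directory they would start to dominate file creation time.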

Performance degraded as we added more files, and we eventually ran
into some system limits on the NameNode (too many open files,
out-of-memory errors) that were solved the usual way (raising the
ulimit, giving the JVM a larger heap).
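For the heap side, a quick sanity check is to confirm what ceiling the NameNode JVM actually got after the restart (the -Xmx value in the comment is just an example, not what we ran with):

```java
// Prints the heap limits the JVM was launched with -- useful for
// verifying that a larger -Xmx flag (e.g. java -Xmx512m ...) took
// effect on the NameNode process.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MB):   " + rt.maxMemory() / (1024 * 1024));
        System.out.println("total heap (MB): " + rt.totalMemory() / (1024 * 1024));
        System.out.println("free heap (MB):  " + rt.freeMemory() / (1024 * 1024));
    }
}
```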

Afterwards, performance degradation continued until most connections
could not be established ("Problem making IPC call").

This happened with a fairly small image/fsimage file (just 65 MB).

Anyway, this gave us an installation on which to try out NDFS.

These are the results of measuring just the startup of the
NutchFileSystem. Measurements were taken with iostat. All 9 machines
are single-CPU Xeon 2.4 GHz boxes with 512 MB RAM (Linux kernel 2.4.20
with reiserfs). This is nutch-0.7-dev straight from CVS.

--------------------
1) Launch of NameNode:

Reading the filenames->blocks table takes about 2 minutes with the
CPU at 100% and very low IO activity.

During that period the Namenode is not available for --report queries.

Afterwards CPU activity goes to 0 and --report queries work fine. 

This step is definitely CPU bound.
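That profile matches a purely in-memory rebuild of the filename->blocks table: the time goes into parsing, string building, and object allocation rather than disk reads. A rough, hypothetical sketch of that kind of load (the TreeMap layout and path scheme are my assumptions, not NDFS internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simulates building a filename -> block-id table entirely in
// memory: the kind of work that pegs the CPU at 100% while iostat
// shows almost no disk activity.
public class ImageLoadSketch {
    static TreeMap<String, List<Long>> load(int files, int blocksPerFile) {
        TreeMap<String, List<Long>> table = new TreeMap<>();
        long blockId = 0;
        for (int i = 0; i < files; i++) {
            List<Long> blocks = new ArrayList<>(blocksPerFile);
            for (int b = 0; b < blocksPerFile; b++) {
                blocks.add(blockId++);
            }
            // String concatenation and tree insertion dominate;
            // 100 files per directory mirrors our test layout.
            table.put("/dir" + (i / 100) + "/file" + i, blocks);
        }
        return table;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        TreeMap<String, List<Long>> table = load(400_000, 1);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(table.size() + " entries in " + elapsed + " ms");
    }
}
```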

--------------------
2) Launch of a single DataNode:

After launching a single DataNode (80,000 blocks), the CPU of the
NameNode is at 13%, and the CPU of the DataNode is at 100% (76% sys,
24% user).

General operations on NDFS (--report, --ls) keep working.

--------------------
3) Launching additional DataNodes:

Launching a second and a third DataNode maintains the pattern (an
additional 13% CPU on the NameNode for each DataNode, and 100%
[approx. 76% sys, 24% user] CPU load on each DataNode).

But launching the whole set (9 DataNodes) changes the situation.

On the NameNode

 * Sustained IO of over 10,000 blk_read/s and 4,000 blk_wrtn/s (no
significant IO activity before).

 * iowait reaches 90%; idle time sinks to 0%.

 * Simple --report or -ls / queries to NDFS fail or require many
retries before succeeding ("Problem making IPC call...").

--------------------

So, simply starting up the whole set of DataNodes is enough to
effectively make NDFS unavailable. No actual activity from the
filesystem user is needed.

I understand the 100% CPU usage while loading the filename->blocks
table, but what can be causing such a high amount of IO on the
NameNode?

I can try to narrow this down further if there is any interest:
adding more logging code to nutchfs, gathering more stats, trying
different use cases, etc.

On the other hand, maybe this installation is just not the kind of
problem NDFS intends to address.

Best,


angel
