hbase-user mailing list archives

From Charles Glommen <rada...@gmail.com>
Subject Re: Issue reading consistently from an hbase test client app
Date Fri, 16 Apr 2010 18:00:37 GMT
Yeah, strange stuff. I don't see anything strange in the master's logs; it
will take a while to go over the region server logs, but I will certainly do
that. We are running version 0.20.3.

On Fri, Apr 16, 2010 at 10:42 AM, Amandeep Khurana <amansk@gmail.com> wrote:

> What version of HBase are you on? Did you see anything out of place in the
> master or regionserver logs? This should not be happening...!
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Fri, Apr 16, 2010 at 10:27 AM, Charles Glommen <radarcg@gmail.com>
> wrote:
>
> > For a slightly unrelated reason, I needed to write a quick app to test some
> > code running on our hadoop/hbase cluster. However, I seem to be having
> > issues getting consistent reads.
> >
> > Here's the scenario:
> >
> > This application scans some directories in HDFS and reads lines of text
> > from each file. A user ID is extracted from each line, then HBase is checked
> > to see that the ID exists. In *all* cases the ID should exist in HBase.
> > However, only about the first 100 (of about 1000) return valid results;
> > after that, the rest return null from Result.getValue(). You can see from
> > the code that it takes a userID as a parameter. This is to illustrate that
> > the data is in fact in HBase: setting *any* of the userIDs that produced
> > null results as the parameter will result in a valid HBase read (a condensed
> > sketch of the read path appears just before the full class at the end of
> > this message). Here is an abbreviated output that illustrates the oddity:
> >
> > First execution of application:
> > ...(many 'good' output lines, like the following 2)
> > bytes for user 139|754436243196115533|c: 1920
> > bytes for user 139|754436243113796511|c: 1059
> > bytes for user 141|754999187733044577|c: 0
> > 1/171 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T00-0700/fiqgvrl.events
> > bytes for user *141|754717712663942409|c*: 0
> > 2/172 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T00-0700/fwesvqn.events
> > bytes for user 141|755280633926232247|c: 0
> > 3/173 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T01-0700/wydfvn.events
> > bytes for user 141|754436237930862231|c: 0
> > 4/174 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T01-0700/zpjyod.events
> > byte
> >
> > ...and this continues for the remaining files.
> >
> > Second execution with *any* of the seemingly missing userIDs yields the
> > following sample:
> >
> > Count bytes for commandline user 141|754717712663942409|c: 855
> > ...(many 'good' output lines, like the following 1)
> > bytes for user 141|qfbvndelauretis|a: 2907001
> > bytes for user 141|754436240987076893|c: 0
> > 1/208 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T14-0700/hehvln.events
> > bytes for user 141|754436241315533944|c: 0
> > bytes for user 141|754436241215573999|c: 0
> > 2/210 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T15-0700/fvkeert.events
> > ...
> >
> > Notice that the 'zeros' don't occur until file 208 this time. This is not
> > random either: rerunning the above two will produce the exact same results,
> > all day long. It's as if selecting the initial user allows its region to be
> > read more consistently for the remainder of the run. Three last points: no
> > exceptions are ever thrown, all region servers are up throughout the
> > execution, and no other reads or writes are occurring on the cluster during
> > the execution.
> >
> > Any thoughts or advice? This is really causing me pain at the moment.
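> >
> > In case it helps to see the read path in isolation, this is essentially all
> > the check does per key. It's a condensed sketch rather than the real class;
> > the table, family, and column names below stand in for our Constants values:
> >
> > import java.io.IOException;
> >
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.Get;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > public class SingleKeyCheck {
> >   // Looks up one row key (e.g. "141|754717712663942409|c") and prints the value length.
> >   public static void main(String[] args) throws IOException {
> >     HBaseConfiguration config = new HBaseConfiguration();
> >     HTable table = new HTable(config, "userEvents");   // stand-in for Constants.HBASE_USER_EVENTS_TABLE
> >     Result row = table.get(new Get(Bytes.toBytes(args[0])));
> >     byte[] data = row.getValue(Bytes.toBytes("raw"),   // stand-ins for our family/column constants
> >                                Bytes.toBytes("events"));
> >     System.out.println("bytes for " + args[0] + ": " + (data == null ? 0 : data.length));
> >   }
> > }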
> >
> > Oh, and here's the quick and dirty class that produces this:
> >
> > package com.touchcommerce.data.jobs.misc.partitioning_debug;
> >
> > import java.io.BufferedReader;
> > import java.io.IOException;
> > import java.io.InputStreamReader;
> >
> > import org.apache.hadoop.fs.FileStatus;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.Get;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > import com.touchcommerce.data.Constants;
> > import com.touchcommerce.data.services.resources.HDFSService;
> > import com.touchcommerce.data.services.utils.EventUtils;
> >
> > public class TestIt {
> >
> >   private final static HBaseConfiguration config =
> >       new HBaseConfiguration(HDFSService.newConfigInstance());
> >   private static String userToCheckFirst;
> >   private static HTable userEventsTable;
> >
> >   public static void main(String[] args) throws IOException {
> >     FileSystem hdfs = FileSystem.get(config);
> >     userEventsTable = new HTable(config, Constants.HBASE_USER_EVENTS_TABLE);
> >     int maxLinesPerFileToRead = Integer.parseInt(args[0]);
> >     FileStatus[] containedSiteEntries =
> >         hdfs.listStatus(new Path(Constants.HDFS_EVENTS_ROOT_DIR));
> >     int good = 0;
> >     int bad = 0;
> >     /**
> >      * Passing in a key here that returned no data during the loop below will
> >      * almost certainly result in event data, meaning that hbase *does* have
> >      * data for this key after all. So what's wrong with the loop below??????
> >      */
> >     userToCheckFirst = args.length > 1 ? args[1] : null;
> >     if (userToCheckFirst != null) {
> >       byte[] data = fetchData(Bytes.toBytes(userToCheckFirst));
> >       System.out.println("Count bytes for commandline user " + userToCheckFirst
> >           + ": " + (data == null ? 0 : data.length));
> >     }
> >     for (FileStatus siteStatus : containedSiteEntries) {
> >       if (siteStatus.isDir()) {
> >         FileStatus[] containedHourEntries = hdfs.listStatus(siteStatus.getPath());
> >         for (FileStatus hourStatus : containedHourEntries) {
> >           String hourStatusPath = hourStatus.getPath().toString();
> >           if (hourStatus.isDir()
> >               && hourStatusPath.indexOf(Constants.HDFS_INVALID_EVENTS_DIR) < 0
> >               && (hourStatusPath.indexOf("2010-04-12") > 0)) {
> >             FileStatus[] containedHourFiles = hdfs.listStatus(hourStatus.getPath());
> >             for (FileStatus hourFile : containedHourFiles) {
> >               if (hourFile.getLen() > 0) {
> >                 Path hourFilePath = hourFile.getPath();
> >                 boolean containedUser = false;
> >                 BufferedReader in = new BufferedReader(
> >                     new InputStreamReader(hdfs.open(hourFilePath)));
> >                 boolean fileIsGood = false;
> >                 String line = in.readLine();
> >                 boolean processMoreLines = line != null;
> >                 int linesRead = line == null ? 0 : 1;
> >                 while (processMoreLines) {
> >                   byte[] data = null;
> >                   String siteID = EventUtils.extractField(line, "siteID");
> >                   String userID = EventUtils.extractCustomerID(line);
> >                   String type = "c";
> >                   if (userID == null) {
> >                     userID = EventUtils.extractAgentID(line);
> >                     type = "a";
> >                   }
> >                   if (userID != null) {
> >                     containedUser = true;
> >                     int attempts = 0;
> >                     while (data == null || data.length == 0) {
> >                       data = fetchData(Bytes.toBytes(siteID + "|" + userID + "|" + type));
> >                       if (data == null || data.length == 0) {
> >                         // THIS SHOULD NOT HAPPEN, BUT ONCE IT DOES, THE REST SEEM
> >                         // TO FOLLOW WITH THE SAME
> >                       }
> >                       else {
> >                         System.out.println("bytes for user " + siteID + "|" + userID
> >                             + "|" + type + ": " + (data == null ? 0 : data.length));
> >                         break;
> >                       }
> >                       if (++attempts == 3) {
> >                         System.out.println("bytes for user " + siteID + "|" + userID
> >                             + "|" + type + ": " + (data == null ? 0 : data.length));
> >                         break;
> >                       }
> >                     }
> >                   }
> >                   if (data != null && data.length > 0) {
> >                     fileIsGood = true;
> >                     processMoreLines = false;
> >                   }
> >                   if (linesRead >= maxLinesPerFileToRead)
> >                     processMoreLines = false;
> >                   else {
> >                     line = in.readLine();
> >                     processMoreLines = line != null;
> >                     linesRead++;
> >                   }
> >                 }
> >                 in.close();
> >                 if (fileIsGood || !containedUser)
> >                   good++;
> >                 else {
> >                   bad++;
> >                   System.out.println(bad + "/" + (good + bad)
> >                       + " FILE MAY HAVE LINE MISSING FROM HBASE!: " + hourFilePath);
> >                 }
> >               }
> >             }
> >           }
> >         }
> >       }
> >     }
> >   }
> >
> >   private static byte[] fetchData(byte[] userBytes) throws IOException {
> >     Get userEventsQuery = new Get(userBytes);
> >     Result row = userEventsTable.get(userEventsQuery);
> >     if (row == null)
> >       return null;
> >     return row.getValue(Constants.HBASE_USER_EVENTS_TABLE_RAW_FAMILY,
> >         Constants.HBASE_USER_EVENTS_TABLE_EVENTS_COLUMN);
> >   }
> > }
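> >
> > One thing I'm not sure even matters: as far as I can tell, the 0.20 client
> > hands back an empty Result rather than null when a row is not found, so the
> > row == null test in fetchData probably never fires. Something like this might
> > be a tighter way to write it (same table and constants as above), though I
> > have no idea whether it has anything to do with the inconsistency:
> >
> >   private static byte[] fetchData(byte[] userBytes) throws IOException {
> >     Result row = userEventsTable.get(new Get(userBytes));
> >     if (row == null || row.isEmpty())   // isEmpty() covers the row-not-found case
> >       return null;
> >     return row.getValue(Constants.HBASE_USER_EVENTS_TABLE_RAW_FAMILY,
> >         Constants.HBASE_USER_EVENTS_TABLE_EVENTS_COLUMN);
> >   }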
> >
>
