My workspace

Showing posts from May, 2013

Lesson learned: Hadoop + Cassandra integration

After a break of a few weeks, we have at last finished tuning and configuring our Cassandra + Hadoop stack in production. It was exciting, and I decided to share our experience with everyone.

1) Cassandra versions above 1.2 have some problems and do not integrate with Hadoop very well. The problem is with MapReduce: when we run any MapReduce job, it is always assigned only one mapper, regardless of the amount of data. See here for more detail.

2) If you are going to use Pig for your data analysis, think twice, because Pig always loads all the data from the Cassandra Storage and can filter only after that. If you have billions of rows and only a few million of them need to be aggregated, Pig still picks up all the billions of rows. Here you can find a comparison of Hadoop frameworks for executing MapReduce jobs.

3) If you are using Pig, filter rows as early as possible. Filter out fields that are null or empty.

4) When using Pig, try to model your CF (column family) slightly differently. Use the Bucket pattern, sto