Monday

JVM multi-tenancy with ElastiCat

One of the speakers at the last JavaOne Moscow mentioned JVM multi-tenancy and announced its availability in Java 9. At the end of last year IBM also announced that their IBM J9 JVM can host multiple applications. Out of curiosity I became interested in the topic, and from time to time I read a few articles and watched presentations. For those not yet familiar with it, JVM multi-tenancy means hosting more than one application on a single JVM. Here you will find a very good article about JVM multi-tenancy. Last Friday I got an e-mail from the DZone mailing list about ElastiCat, which provides JVM multi-tenancy and much more. After spending some time on it, I downloaded the product, deployed a simple web service on it and played with the JVM. Today I found a few spare hours after lunch and decided to share my experience with ElastiCat.
ElastiCat, according to its makers, provides multitenancy, isolation, density and elasticity through Tomcat. You can deploy more than one application on a single Tomcat and share infrastructure resources. ElastiCat provides a virtualization layer that isolates applications in Java Virtual Containers (JVCs). This means that when you deploy your application in ElastiCat, you deploy it into a JVC and you have control over that JVC: you can add heap memory or other extra resources to it. OK, let's try a quick test.
1) Download ElastiCat and extract it into any directory; please check the prerequisites first. Currently ElastiCat doesn't support JDK 1.7.
2) Run the servlet container.
3) Develop a quick web service for testing purposes.
4) Deploy the web service war into the directory $ELASTICAT_HOME/cloudapps. If you want to deploy your application in the host JVM instead, you can use $ELASTICAT_HOME/webapps.
5) ElastiCat also provides a few examples to try; one of them, test-app, is reachable at http://localhost:8080/test-app
6) Now we can use the jirsh shell, a command-line administrative interface to the Waratek CloudVM for Java. The Waratek jirsh shell is based on the libvirt virsh shell.
7) Connect from a terminal with ssh -p 2222 super@localhost and use the word super as the password.
8) Run the command list, which shows output like the following (yours may differ):
JVCID    GROUP    STATUS          NAME                    COMMAND
0        0        Running         dom-0                   platform
1        0        Running         jvc-1                   /examples1.war
2        0        Running         jvc-2                   /examples2.war
3        0        Running         jvc-3                   /test-app.war
4        0        Running         jvc-4                   /test-infras.war
My web service artifact test-infras.war is deployed in jvc-4. We can check its configuration by running the command dominfo:
dominfo 4

JVCID:                          4
JVC-NAME:                       jvc-4
JVC type:                       ServletContainer
JVC command line:               /test-infras.war
Console log file:               /home/xx/ccc/elasticcat/waratek-elasticat-0.9.2/waratek/var/log/javad/elasticat/jvc-4/stdout
JVC status:                     Running
JVC persistence:                true
JVC priority:                   10
JVC elastic group:              0
JVC uptime:                     21 minutes, 3.237 seconds
JVC cpu usage:                  0.0010 GHz-hours (1.570 seconds)
Maximum heap memory:            0 (unlimited)
Allowed elastic memory:         0KiB
Used heap memory:               1.16MiB
Classloader count:              4
Total classes loaded:           2470
Thread maximum limit:           0 (unlimited)
Alive thread count:             1
Alive daemon threads:           1
Peak thread count:              1
Total started threads:          1
Alive thread IDs:               57
Number of host processors:      4
Number of JVC processors:       4
Cpu affinity:                   true, true, true, true,
File descriptor limit:          0 (unlimited)
File descriptor count:          10
File bytes written:             10781 (10.53KiB)
File bytes read:                21281 (20.78KiB)
Socket maximum limit:           0 (unlimited)
Active socket count:            0
Network bytes written:          5696 (5.56KiB)
Network bytes read:             1232 (1.2KiB)
Native library loading is:      Enabled
Virtual root directory:         "/"
9) Let's modify the web service so that it leaks memory; this is probably the easiest way to provoke an OOM error.
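The original service code is not shown in this post, so the following is only a minimal sketch of what such a leak might look like. The class name LeakyService and the chunkBytes parameter are illustrative assumptions; only the method name sayHello is taken from the log output further down. It avoids Java 7 syntax because ElastiCat currently doesn't support JDK 1.7.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the leaking web service method (not the original code). */
public class LeakyService {
    // A static list keeps every allocated chunk reachable, so the GC can never reclaim it.
    private static final List<byte[]> RETAINED = new ArrayList<byte[]>();

    /** Simulates one service call that leaks chunkBytes of heap. */
    public static String sayHello(String name, int chunkBytes) {
        RETAINED.add(new byte[chunkBytes]);
        return "Hello " + name;
    }

    /** Total number of bytes currently held by the leak. */
    public static long retainedBytes() {
        long total = 0;
        for (byte[] chunk : RETAINED) {
            total += chunk.length;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            sayHello("ElastiCat", 1024 * 1024); // leak 1 MiB per call
        }
        System.out.println("Retained bytes: " + retainedBytes());
    }
}
```

Each call pins another chunk in the static list, so calling the method repeatedly from a client eventually exhausts whatever heap limit the container has.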
10) Now we are going to fix the heap memory size for the JVC by running the command setmem:
setmem 4 10000 # set about 10 MB of heap
dominfo 4

JVCID:                          4
JVC-NAME:                       jvc-4
JVC type:                       ServletContainer
JVC command line:               /test-infras.war
JVC status:                     Running
JVC persistence:                true
JVC priority:                   10
JVC elastic group:              0
JVC uptime:                     30 minutes, 3.445 seconds
JVC cpu usage:                  0.0009 GHz-hours (1.410 seconds)
Maximum heap memory:            9.77MiB
Allowed elastic memory:         0KiB
Used heap memory:               1.18MiB
Classloader count:              4
Total classes loaded:           2449
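The value 10000 passed to setmem shows up as 9.77MiB in the dominfo output, which is consistent with the argument being interpreted as KiB. That unit is an assumption inferred from the output above, not something taken from the Waratek documentation. A small sketch of the conversion (the class name HeapUnits is made up for illustration):

```java
import java.util.Locale;

/** Illustrative helper: converts a KiB value to the MiB figure dominfo prints. */
public class HeapUnits {
    // Assumption: the setmem argument is in KiB (1 MiB = 1024 KiB).
    static double kibToMib(long kib) {
        return kib / 1024.0;
    }

    // Formats with two decimals, similar to the dominfo output.
    static String formatMib(long kib) {
        return String.format(Locale.ROOT, "%.2fMiB", kibToMib(kib));
    }

    public static void main(String[] args) {
        System.out.println(formatMib(10000)); // prints "9.77MiB"
    }
}
```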
11) Now we are ready to invoke the web service method to fill up the heap and get an OOM error. Run the web service client a few times.
setmem 4 10000 # add about 10 MB heap 
dominfo 4

JVCID:                          4
JVC-NAME:                       jvc-4
JVC type:                       ServletContainer
JVC command line:               /test-infras.war
JVC status:                     Running
JVC persistence:                true
JVC priority:                   10
JVC elastic group:              0
JVC uptime:                     30 minutes, 3.445 seconds
JVC cpu usage:                  0.0009 GHz-hours (1.410 seconds)
Maximum heap memory:            9.77MiB
Allowed elastic memory:         0KiB
Used heap memory:               9.43MiB
Classloader count:              4
Total classes loaded:           2449
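To see how close the JVC is to its limit, the two heap lines of the dominfo output can be compared. The sketch below is a hypothetical helper, not part of jirsh: it assumes the "key: value" layout shown above and only understands values that end in MiB.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the "key: value" lines printed by dominfo. */
public class DominfoParser {
    // Splits each line at its first colon; lines without a colon are skipped.
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (String line : output.split("\n")) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return fields;
    }

    // Only handles values such as "9.77MiB"; anything else is rejected.
    static double mib(String value) {
        if (value == null || !value.endsWith("MiB")) {
            throw new IllegalArgumentException("Unsupported value: " + value);
        }
        return Double.parseDouble(value.substring(0, value.length() - 3));
    }

    // Fraction of the heap limit in use, between 0 and 1.
    static double heapUsage(String output) {
        Map<String, String> fields = parse(output);
        return mib(fields.get("Used heap memory")) / mib(fields.get("Maximum heap memory"));
    }

    public static void main(String[] args) {
        String sample = "Maximum heap memory:            9.77MiB\n"
                + "Used heap memory:               9.43MiB\n";
        System.out.println("Heap usage: " + heapUsage(sample));
    }
}
```

For the output above this gives a ratio of roughly 0.97, meaning the JVC has almost no heap left.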
12) Check the ElastiCat log file
WARNING: Interceptor for TestService#sayHello has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault
 at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:122)
 at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:263)
 at org.apache.cxf.phase.PhaseInterceptorChain.resume(PhaseInterceptorChain.java:232)
 at org.apache.cxf.interceptor.OneWayProcessorInterceptor$1.run(OneWayProcessorInterceptor.java:130)
 at org.apache.cxf.workqueue.AutomaticWorkQueueImpl$3.run(AutomaticWorkQueueImpl.java:371)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java)
 at java.lang.Thread.begin(Thread.java)
 at java.lang.Thread.invokeRun(Thread.java)
 at java.lang.reflect.VMReflection.invokeMethod(VMReflection.java)
 at java.lang.reflect.Method$NativeAccessor.invoke(Method.java)
 at java.lang.reflect.MethodNativeAccessorImpl.invoke(MethodNativeAccessorImpl.java)
 at java.lang.reflect.Method.invoke(Method.java)
Caused by: java.lang.OutOfMemoryError
Yes, we are now stuck with an OOM error on the server. At this moment we can check the other applications deployed in ElastiCat; in my case the other applications ran well, although I checked only test-app.
Here we can add more resources (heap memory) to get rid of the OOM error and continue executing the web service method.
13) Add 10 MB more heap to the 4th JVC
setmem 4 10000
dominfo 4

JVCID:                          4
JVC-NAME:                       jvc-4
JVC type:                       ServletContainer
JVC command line:               /test-infras.war
JVC status:                     Running
JVC persistence:                true
JVC priority:                   10
JVC elastic group:              0
JVC uptime:                     30 minutes, 3.445 seconds
JVC cpu usage:                  0.0009 GHz-hours (1.410 seconds)
Maximum heap memory:            18.65MiB
Allowed elastic memory:         0KiB
Used heap memory:               9.43MiB
Classloader count:              4
Total classes loaded:           2449
Run the web service client again and check the log file; you should find that the OOM error is gone and the web service continues executing normally.
With the jirsh command-line client you can also stop, start and resume a JVC; you can find the full list of commands here.
In my experience it works like a charm; however, it is hard to say what the actual performance will be in a production environment with a high-load application. All the JVCs share the host JVM's GC and other resources, so at first glance the bottleneck would be the GC. Still, as a first step toward JVM multi-tenancy, ElastiCat is a good choice.

Wednesday

Hadoop MapReduce with Cassandra CQL through Pig

One of the main disadvantages of using Pig is that it always pulls all the data from Cassandra storage and only afterwards filters it by your criteria. It is easy to imagine the workload when you have many millions of rows in your CF. For example, in our production environment we always have more than 300 million rows, of which only 20-25 million are unprocessed. When we execute a Pig script, we get more than 5000 map tasks over all 300 million rows. It is a time-consuming, high-load batch job that we always tried to avoid, but in vain. It would be very nice if we could use a CQL query with a where clause in Pig scripts to select and filter our data. The benefit is clear: less data is consumed, fewer map tasks are needed and the workload is lighter.

This feature is still not available in the latest version of Cassandra (1.2.6); it is planned for the next version, Cassandra 1.2.7. However, a patch for this feature is already available, so with a little effort we can give it a try.
First we have to download the Cassandra source code from branch 1.2. We should also have a configured Hadoop cluster with Pig.
1) Download the Cassandra source code from branch 1.2:
git clone -b cassandra-1.2 http://git-wip-us.apache.org/repos/asf/cassandra.git
This assumes that you are already familiar with git.
Then apply the patch fix_where_clause.patch.

Now compile the source code and set up the cluster. For testing purposes I am using my single-node Hadoop 1.1.2 + Cassandra 1.2.7 + Pig 0.11.1 cluster.
2) To set up a single-node cluster, please see A single node Hadoop + Cassandra + Pig setup.
3) Create a CF as follows:
CREATE TABLE test (
  id text PRIMARY KEY,
  title text,
  age int
);
and insert some dummy data:
insert into test (id, title, age) values('1', 'child', 21);
insert into test (id, title, age) values('2', 'support', 21);
insert into test (id, title, age) values('3', 'manager', 31);
insert into test (id, title, age) values('4', 'QA', 41); 
insert into test (id, title, age) values('5', 'QA', 30); 
insert into test (id, title, age) values('6', 'QA', 30); 
4) Execute the following Pig script:
rows = LOAD 'cql://keyspace1/test?page_size=1&columns=title,age&split_size=4&where_clause=age%3D30' USING CqlStorage();
dump rows;
You should get the following result on the Pig console:
((id,5),(age,30),(title,QA))
((id,6),(age,30),(title,QA))
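Note that the where clause in the LOAD location is URL-encoded: age=30 becomes age%3D30. Here is a minimal sketch of building such a location string in Java. The class CqlLocation and its method names are made up for illustration, and the fixed page_size and split_size values are simply copied from the script above. URLEncoder performs form encoding, so spaces become "+"; whether CqlStorage accepts that for more complex clauses is not verified here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative helper that builds a cql:// location with an encoded where clause. */
public class CqlLocation {
    // URL-encodes the where clause, e.g. "age=30" -> "age%3D30".
    static String encode(String whereClause) {
        try {
            return URLEncoder.encode(whereClause, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Assembles the location with the same parameters as the Pig script above.
    static String build(String keyspace, String table, String columns, String whereClause) {
        return "cql://" + keyspace + "/" + table
                + "?page_size=1"
                + "&columns=" + columns
                + "&split_size=4"
                + "&where_clause=" + encode(whereClause);
    }

    public static void main(String[] args) {
        System.out.println(build("keyspace1", "test", "title,age", "age=30"));
    }
}
```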
Let's check the Hadoop job history page:

Map input records equals 2.
With this new feature we can use a where clause to select just the data we want from Cassandra storage. You can also check the JIRA issue tracker to dig into the details.
All the credit goes to Alex Lui, who implemented this feature.