Wednesday

Real time data processing with Storm, ETL from Oracle to Cassandra

For the last couple of months we have been using Apache Hadoop MapReduce batch processing to analyze a huge amount of data. We have a few legacy products where we can't consider using Cassandra as the big-table database; some of them use an Oracle Database as their primary storage. Per our requirements we have to extract the data from the RDBMS, parse the payload and load it into Cassandra for aggregation. Here I decided to use Storm for real-time data processing. Our use case is as follows:
1) A Storm spout connects to the Oracle Database and collects data from a particular table at a given interval.
2) A Storm bolt parses the data and emits it to the storm-cassandra bolt, which stores the row in Cassandra.

Here are the code fragments of the project. First I created a JDBC connector class. The class keeps its settings in a few static fields, which contradicts Storm ideology, but since I only need one spout as input it's enough for me.
package storm.contrib.jdbc;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.FileInputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;


public class JdbcConnection {
    private static Connection connection;
    private static final String JDBC_DRIVER = "oracle.jdbc.driver.OracleDriver";

    private JdbcConnection() {   }
    private static String jdbcUrl;
    private static String userName;
    private static String password;
    private static String query;
    private static String interval;
    private static final Logger logger = LoggerFactory.getLogger(JdbcConnection.class);

    static{
        Properties prop = new Properties();
        try {
            prop.load(JdbcConnection.class.getResourceAsStream("/connection.properties"));
            jdbcUrl = prop.getProperty("jdbc.url");
            userName = prop.getProperty("jdbc.username");
            password = prop.getProperty("jdbc.password");
            query = prop.getProperty("jdbc.query");
            interval = prop.getProperty("poll.interval");
            
        } catch (IOException e) {
            logger.error(e.getMessage());
        }
        
    }
    
    public static Connection getConnection() {
        if(connection != null ){
            return connection;
        }
        try {
            Class.forName(JDBC_DRIVER);
            connection = DriverManager.getConnection(getJdbcUrl(),getUserName(),getPassword());
            connection.setAutoCommit(false);
        } catch (ClassNotFoundException e) {
            // throw the exception
            logger.error(e.getMessage());
        } catch(SQLException e){
            logger.error(e.getMessage());
        }
        return connection;
    }

    public static String getJdbcUrl() {
        return jdbcUrl;
    }

    public static String getUserName() {
        return userName;
    }

    public static String getPassword() {
        return password;
    }

    public static String getQuery() {
        return query;
    }

    public static String getInterval() {
        return interval;
    }
}
The JdbcConnection class reads the connection.properties file from the classpath and initializes the connection.
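As noted above, keeping the settings in static fields goes against Storm's ideology. A possible alternative (just a sketch, not what this project uses; the keys mirror the connection.properties entries shown at the end of the post) is to pass the settings through the topology configuration, which Storm hands to every component:

// In the topology main(): backtype.storm.Config is a HashMap, so arbitrary
// string keys can be shipped to every worker.
Config config = new Config();
config.put("jdbc.url", "jdbc:oracle:thin:@192.168.1XX.XX:1521:TEST");
config.put("jdbc.username", "orawsm");
config.put("jdbc.password", "orawsm");
config.put("jdbc.query", "SELECT t.* FROM LOG_OBJECTS t");
config.put("poll.interval", "10000");

// In the spout: read the values back from the conf map instead of a static class.
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.collector = collector;
    this.query    = (String) conf.get("jdbc.query");
    this.interval = Long.valueOf((String) conf.get("poll.interval"));
    try {
        Class.forName("oracle.jdbc.driver.OracleDriver");
        this.connection = DriverManager.getConnection(
                (String) conf.get("jdbc.url"),
                (String) conf.get("jdbc.username"),
                (String) conf.get("jdbc.password"));
        this.connection.setAutoCommit(false);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}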
Now we are ready to create our Oracle spout:
package storm.contrib.spout;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

import backtype.storm.utils.Utils;
import org.apache.commons.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import storm.contrib.jdbc.JdbcConnection;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.sql.*;
import java.util.*;

import static backtype.storm.utils.Utils.tuple;

public class OracleSpout implements IRichSpout{
    private SpoutOutputCollector collector;
    private TopologyContext context;
    private transient Connection connection;
    private boolean completed =false;
    private String query;
    private long interval;
    private static final Logger logger = LoggerFactory.getLogger(OracleSpout.class);
    private Fields outputFields;
    public OracleSpout(final Fields outputFields){
        this.outputFields = outputFields;
    }
    public boolean isDistributed() {
        return false;
    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(this.outputFields);
    }

    public Map<String, Object> getComponentConfiguration() {

        return null;
    }
    // open connection to Oracle DB
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        this.collector = spoutOutputCollector;
        this.context = topologyContext;
        // connect to DB
        connection = JdbcConnection.getConnection();
        query = JdbcConnection.getQuery();
        interval = Long.valueOf(JdbcConnection.getInterval());
    }
    public void close() {
        if(connection != null){
            try {
                connection.close();
            } catch (SQLException e) {
                logger.error(e.getMessage());
            }
        }
    }

    public void activate() {
    }

    public void deactivate() {
    }
    // read data and send to bolt
    public void nextTuple() {
        // once the table has been drained, wait for the poll interval before querying again
        if(completed){
            Utils.sleep(interval);
        }
        PreparedStatement stm = null;
        if(connection != null){
            List<Object> tupleVal = new ArrayList<Object>();
            try {
                // a scroll-insensitive, updatable result set is needed so that rs.deleteRow() works
                stm = connection.prepareStatement(query, ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_UPDATABLE);
                ResultSet rs = stm.executeQuery();
                ResultSetMetaData rsmd = rs.getMetaData();
                while(rs.next()){
                    // build one tuple per table row
                    tupleVal.clear();
                    for(int col = 1; col <= rsmd.getColumnCount(); col++){
                        tupleVal.add(getDataByCol(rs, rsmd.getColumnType(col), col));
                    }
                    getCollector().emit(tuple(tupleVal.toArray()));
                    // delete the processed row from the source table
                    rs.deleteRow();
                    completed = true;
                }
            } catch (SQLException e) {
                logger.error(e.getMessage());
            }finally {
                tupleVal.clear();
                if(stm!=null)
                    try {
                        stm.close();
                    } catch (SQLException e) {
                        logger.error(e.getMessage());
                    }
            }
        }
    }

    public void ack(Object o)   {
        // @todo delete the record
        //logger.info("ack:", o);
    }

    public void fail(Object o) {
        logger.info("fail: {}", o);
    }
    private SpoutOutputCollector getCollector() {
        return collector;
    }

    private Object getDataByCol(ResultSet rs, int colType, int colIdx) throws SQLException{
        Object colData;
        switch (colType){
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.CLOB:
                colData = rs.getString(colIdx);
                break;
            case Types.INTEGER:
            case Types.BIGINT:
            //case Types.:
                colData = rs.getLong(colIdx);
                break;
            case Types.DECIMAL:
            case Types.NUMERIC:
                colData = rs.getDouble(colIdx);
                break;
            case Types.DATE:
                colData = rs.getDate(colIdx);
                break;
            case Types.TIMESTAMP:
                colData = rs.getTimestamp(colIdx);
                break;
            case Types.BLOB:
                Blob blob = rs.getBlob(colIdx);
                InputStream is =  blob.getBinaryStream();
                colData = getBytes(is);
                //colData = rs.getBlob(colIdx);
                break;
            default:
                colData = rs.getString(colIdx);
                break;
        }
        return colData;
    }
    private byte[] getBytes(InputStream is){
        // Get the size of the file
        try {
            return IOUtils.toByteArray(is);
        } catch (IOException e) {
            e.printStackTrace(); 
            return new byte[0];
        }
    }
}
The class is mostly self-explanatory: in the open method we initialize the JDBC connection, and in the nextTuple method we query the table and emit a tuple for the parse bolt.
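The ack() method above is still a TODO. One way to wire in the planned delete-after-processing behaviour (only a sketch, assuming the LOG_OBJECTS table from the properties file with a numeric LOGOBJECT_ID primary key in the first column) is to drop the immediate rs.deleteRow() call, anchor every emitted tuple with the row key as the message id, and run the DELETE when Storm acks it:

// In nextTuple(): emit with the row's primary key as the message id so that
// ack()/fail() receive it back (assumes column 1 of the result set is LOGOBJECT_ID).
long logId = rs.getLong(1);
getCollector().emit(tuple(tupleVal.toArray()), logId);

// Delete the source row only after the whole tuple tree has been processed.
public void ack(Object msgId) {
    try {
        PreparedStatement del = connection.prepareStatement(
                "DELETE FROM LOG_OBJECTS WHERE LOGOBJECT_ID = ?");
        try {
            del.setLong(1, (Long) msgId);
            del.executeUpdate();
            connection.commit();   // autoCommit is switched off in JdbcConnection
        } finally {
            del.close();
        }
    } catch (SQLException e) {
        logger.error(e.getMessage());
    }
}

public void fail(Object msgId) {
    // leave the row in place; it will be picked up again on the next poll
    logger.info("fail: {}", msgId);
}

With or without that change, the next component is the parse bolt: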
package storm.contrib.bolt;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ru.atc.smev.pig.utils.ParseSmevMessage3;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import static backtype.storm.utils.Utils.tuple;

public class ParseMsgBolt implements IRichBolt {
    private OutputCollector collector;
    private static final Logger logger = LoggerFactory.getLogger(ParseMsgBolt.class);
    TopologyContext context;

    public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
        this.collector = outputCollector;
        this.context = topologyContext;
    }

    public void execute(Tuple tuple) {
        // the spout emits one value per table column; here we assume the first
        // field is the log id and that the raw XML payload (the BLOB column)
        // arrives as the second field; adjust both indexes to the real column order
        long logId = tuple.getLong(0);
        byte[] logMsg = tuple.getBinary(1);
        Map<String, String> msgTuple = null;
        if(logMsg != null && logMsg.length > 0){
            ParseSmevMessage3 parseMessage = new ParseSmevMessage3();
            List<Object> ls = new ArrayList<Object>();
            try {
                msgTuple = parseMessage.parse(new String(logMsg));
                // add logid first
                ls.add(logId);
                // add the parsed field values
                ls.addAll(msgTuple.values());
                this.collector.emit(ls);
                msgTuple.clear();
            } catch (Exception e) {
                // report failure for this tuple and stop processing it
                collector.fail(tuple);
                logger.error(e.getMessage());
                return;
            }
        }
        // send ack
        this.collector.ack(tuple);
    }

    public void cleanup() {

    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("logid",
                                                "exct",
                                                "asen",
                                                "origc",
                                                "ard",
                                                "arc",
                                                "tcode",
                                                "asenc",
                                                "orign",
                                                "certsn",
                                                "arcn"));
    }
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

package ru.atc.smev.pig.utils;

import com.ximpleware.*;

import java.io.ByteArrayInputStream;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ParseSmevMessage3 {
    private final static String CERT_TYPE = "X509";
    private static final String CERTSN = "certsn";
    private static final String ROOT_TAG = "root_tag";

    private static Map<String, String> SEARCH_TAGS = new HashMap<String, String>(17);
    static{
        SEARCH_TAGS.put("/*[local-name() = 'Envelope']/*[local-name() = 'Header']/*[local-name() = 'Security']/*[local-name() = 'BinarySecurityToken']/text()", CERTSN);
        //SEARCH_TAGS.put("/Envelope/root_tag", ROOT_TAG);
        SEARCH_TAGS.put("/*[local-name() = 'Envelope']/*[local-name() = 'Header']/*[local-name() = 'Header']/*[local-name() = 'MessageId']/text()", "msgid");

        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Sender']/*[local-name() = 'Code']/text()", "asenc");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Sender']/*[local-name() = 'Name']/text()", "asen");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Recipient']/*[local-name() = 'Code']/text()", "arc");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Recipient']/*[local-name() = 'Name']/text()", "arcn");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Originator']/*[local-name() = 'Code']/text()", "origc");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Originator']/*[local-name() = 'Name']/text()", "orign");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'Date']/text()", "ard");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'TypeCode']/text()", "tcode");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'ExchangeType']/text()", "exct");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'ServiceCode']/text()", "serc");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'RequestIdRef']/text()", "reqidr");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'OriginRequestIdRef']/text()", "origridr");
        SEARCH_TAGS.put("//*[local-name() = 'Message']/*[local-name() = 'CaseNumber']/text()", "can");

    }
    private String getSerialNumber(final String cert) throws CertificateException {
        byte[] derFile = org.bouncycastle.util.encoders.Base64.decode(cert.getBytes());

        final CertificateFactory cf = CertificateFactory.getInstance(CERT_TYPE);
        final X509Certificate x509 = (X509Certificate) cf.generateCertificate(new ByteArrayInputStream(derFile));
        // return the serial number as a decimal string
        return x509.getSerialNumber().toString(10);
    }
    public Map<String, String> parse (String xml) throws Exception{
        final Map<String, String> result = new HashMap<String, String>(17);
        final VTDGen vg = new VTDGen();
        final AutoPilot ap = new AutoPilot();
        int i;
        try {
            vg.setDoc(xml.getBytes());
            vg.parse(true);
            VTDNav vn = vg.getNav();
            ap.bind(vn);
            for(String key : SEARCH_TAGS.keySet()){
                ap.selectXPath(key);
                while( (i = ap.evalXPath())!=-1 ){
                    //System.out.println(SEARCH_TAGS.get(key)+":"+ vn.toString(i));
                    result.put(SEARCH_TAGS.get(key),vn.toString(i));
                    if(vn.matchElement("wsse:BinarySecurityToken")){
                        result.put(CERTSN, getSerialNumber(vn.toString(i)));
                    }
                }
                ap.resetXPath();
            }
        } catch (XPathParseException e) {
            e.printStackTrace();
        } catch(XPathEvalException e){
            e.printStackTrace();            
        } catch(NavException e){
            e.printStackTrace();
        }
        return result;
    }
}
I have used the VTD-XML library to parse the payload; the parsed values are emitted to the cassandra bolt, which stores the row in the Cassandra DB.
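To see what the parser produces outside the topology, it can be fed a stripped-down envelope directly. A small, purely illustrative check (the element values below are made up; real SMEV messages are far larger and also carry the Security header with the certificate):

package ru.atc.smev.pig.utils;

import java.util.Map;

public class ParseSmevMessageExample {
    public static void main(String[] args) throws Exception {
        // minimal envelope containing only a few of the elements the XPaths look for
        String xml =
              "<Envelope>"
            + "  <Body>"
            + "    <Message>"
            + "      <Sender><Code>SND01</Code><Name>Sender name</Name></Sender>"
            + "      <Recipient><Code>RCP01</Code><Name>Recipient name</Name></Recipient>"
            + "      <TypeCode>GSRV</TypeCode>"
            + "      <ExchangeType>2</ExchangeType>"
            + "    </Message>"
            + "  </Body>"
            + "</Envelope>";

        Map<String, String> fields = new ParseSmevMessage3().parse(xml);
        // prints something along the lines of:
        // {asenc=SND01, asen=Sender name, arc=RCP01, arcn=Recipient name, tcode=GSRV, exct=2}
        System.out.println(fields);
    }
}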
Here is the topology class:
package storm.contrib.topology;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.contrib.cassandra.bolt.BatchingCassandraBolt;
import backtype.storm.contrib.cassandra.bolt.CassandraBolt;
import backtype.storm.contrib.cassandra.bolt.DefaultBatchingCassandraBolt;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import storm.contrib.bolt.ParseMsgBolt;
import storm.contrib.spout.OracleSpout;

public class CassandraBoltTopology {
    public static void main(String[] args) {

        TopologyBuilder tBuilder = new TopologyBuilder();
        // define oracle table column name 
        tBuilder.setSpout("oracle-reader", new OracleSpout(new Fields("logobject_id",
                                                        "oper_name",
                                                        "sender_code",
                                                        "sender_name",
                                                        "recipient_code",
                                                        "recipient_name",
                                                        "originator_code",
                                                        "originator_name",
                                                        "request_date",
                                                        "type_code",
                                                        "service_code",
                                                        "request_id_ref",
                                                        "origin_request_id_reg",
                                                        "case_number",
                                                        "cert",
                                                        "messageid",
                                                        "srv_sid",
                                                        "test_msg",
                                                        "status",
                                                        "exchange_type")));
        tBuilder.setBolt("Msgparser", new ParseMsgBolt()).shuffleGrouping("oracle-reader");
        tBuilder.setBolt("save to cassandra", new CassandraBolt("stormcf", "logid")).shuffleGrouping("Msgparser");
  
        Config config = new Config();
        config.put(CassandraBolt.CASSANDRA_HOST, "192.168.XXX.XXX");
        config.put(CassandraBolt.CASSANDRA_PORT, 9160);
        config.put(CassandraBolt.CASSANDRA_KEYSPACE, "stormks");
        config.setDebug(true);
        
  //Topology run
        config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("Getting-Started-Toplogie", config, tBuilder.createTopology());
    }
}
Here we first declare the column names of the table from which we collect the data; in the current Storm version output fields cannot be defined dynamically. Next we wire up the bolts and configure the CassandraBolt. The project runs well on a two-machine cluster without any errors (a sketch for submitting the topology to such a cluster follows the properties file). Anyway, I plan to add a transaction feature to the spout. Hope this will help someone to kick-start with Storm. Here is the properties file:
jdbc.url=jdbc:oracle:thin:@192.168.1XX.XX:1521:TEST
jdbc.username=orawsm
jdbc.password=orawsm
jdbc.query=SELECT t.* FROM LOG_OBJECTS t

poll.interval=10000
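The topology above runs in a LocalCluster. To submit it to the two-machine cluster mentioned earlier, the usual route is to package the project as a jar and use StormSubmitter instead; a minimal sketch (the topology name and jar name here are made up):

// Replacement for the LocalCluster block at the end of main(); the builder and
// the Cassandra settings stay exactly as above. main() has to declare
// `throws Exception` because submitTopology throws checked exceptions.
// Requires: import backtype.storm.StormSubmitter;
config.setNumWorkers(2);   // one worker per machine of the two-node cluster
StormSubmitter.submitTopology("oracle-to-cassandra", config, tBuilder.createTopology());

The packaged jar is then submitted with the storm client, for example: storm jar oracle-cassandra-topology.jar storm.contrib.topology.CassandraBoltTopology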

Tuesday

UnsatisfiedLinkError: JNA link failure on RHEL 5.7 with Cassandra 1.1.5

Today our team tried to install JNA 3.5.0 in our UAT environment; here is the link to give it a try. At Cassandra startup we noticed the following INFO message in the Cassandra log:
INFO [main] 2012-11-27 13:20:17,747 CLibrary.java (line 66) JNA link failure, one or more native method will be unavailable.
The interesting thing is that most of the JNA features still work. I decided to investigate the problem, restarted Cassandra in debug mode (edit log4j-server.properties and set the rootLogger level to DEBUG) and found the detailed error:
DEBUG [main] 2012-11-27 13:20:17,748 CLibrary.java (line 67) JNA link failure details: /tmp/jna-oracle/jna1599621626582486116.tmp: /lib64/libc.so.6: version `GLIBC_2.11' not found (required by /tmp/jna-oracle/jna1599621626582486116.tmp)
Now it was easy to fix the problem. There are a few solutions:
1) Use the updated Linux binaries for x86/amd64 built against glibc 2.1.3 and 2.2.5
2) Use JNA version 3.3.0 or 2.7.0
You can download JNA 3.3.0 from the following location: http://download.java.net/maven/2/net/java/dev/jna/jna/
After a successful installation you should find the following INFO message in the log:
JNA mlockall successful
If you have another process running along with Cassandra, you have to set the ulimit for max locked memory to unlimited.

Add the following lines in the /etc/security/limits.conf file for the user/group that runs Cassandra:

$USER soft memlock unlimited
$USER hard memlock unlimited

and reboot the system.

Friday

Patch pig_cassandra for setting ttl to cassandra data

Apache Pig provides a platform for analyzing very large data sets, and with Pig you can easily analyze your data stored in Cassandra. Pig compiles its instructions into sequences of Map-Reduce programs which run on a Hadoop cluster. The Cassandra source ships a simple pig_cassandra script to run Pig against Cassandra data, and also provides the CassandraStorage class to load and store data from the Cassandra DB; however, this class has no built-in support for storing data with a TTL (time to live). In many cases you have to update a few columns or rows with a TTL so that they are automatically deleted from the DB later. For that, I have patched the CassandraStorage class and added this functionality. Here is the patch:
Index: src/main/java/ru/atc/smev/cassandra/storage/CassandraStorage.java
===================================================================
--- src/main/java/ru/atc/smev/cassandra/storage/CassandraStorage.java (revision 3711)
+++ src/main/java/ru/atc/smev/cassandra/storage/CassandraStorage.java (revision )
@@ -90,7 +90,7 @@
     private final static String PARTITION_FILTER_SIGNATURE = "cassandra.partition.filter";
 
     private final static ByteBuffer BOUND = ByteBufferUtil.EMPTY_BYTE_BUFFER;
-    private static final Log logger = LogFactory.getLog(CassandraStorage.class);
+    private static final Log logger = LogFactory.getLog(MyCassandraStorage.class);
 
     private ByteBuffer slice_start = BOUND;
     private ByteBuffer slice_end = BOUND;
@@ -113,6 +113,7 @@
     private Map<ByteBuffer, IColumn> lastRow;
     private boolean hasNext = true;
 
+    private int ttl;
 
     public CassandraStorage()
     {
@@ -131,8 +132,13 @@
     public int getLimit()
     {
         return limit;
+
     }
 
+    public int getTtl() {
+        return ttl;
+    }
+
     public Tuple getNextWide() throws IOException
     {
         CfDef cfDef = getCfDef(loadSignature);
@@ -337,14 +343,14 @@
     private CfDef getCfDef(String signature)
     {
         UDFContext context = UDFContext.getUDFContext();
-        Properties property = context.getUDFProperties(CassandraStorage.class);
+        Properties property = context.getUDFProperties(MyCassandraStorage.class);
         return cfdefFromString(property.getProperty(signature));
     }
 
     private List<IndexExpression> getIndexExpressions()
     {
         UDFContext context = UDFContext.getUDFContext();
-        Properties property = context.getUDFProperties(CassandraStorage.class);
+        Properties property = context.getUDFProperties(MyCassandraStorage.class);
         if (property.getProperty(PARTITION_FILTER_SIGNATURE) != null)
             return indexExpressionsFromString(property.getProperty(PARTITION_FILTER_SIGNATURE));
         else
@@ -462,6 +468,8 @@
                     slice_reverse = Boolean.parseBoolean(urlQuery.get("reversed"));
                 if (urlQuery.containsKey("limit"))
                     limit = Integer.parseInt(urlQuery.get("limit"));
+                if(urlQuery.containsKey("ttl"))
+                    ttl = Integer.parseInt(urlQuery.get("ttl"));
             }
             String[] parts = urlParts[0].split("/+");
             keyspace = parts[1];
@@ -469,7 +477,7 @@
         }
         catch (Exception e)
         {
-            throw new IOException("Expected 'cassandra://<keyspace>/<columnfamily>[?slice_start=<start>&slice_end=<end>[&reversed=true][&limit=1]]': " + e.getMessage());
+            throw new IOException("Expected 'cassandra://<keyspace>/<columnfamily>[?slice_start=<start>&slice_end=<end>[&reversed=true][&limit=1][&ttl=86400]]': " + e.getMessage());
         }
     }
 
@@ -694,7 +702,7 @@
     public void setPartitionFilter(Expression partitionFilter)
     {
         UDFContext context = UDFContext.getUDFContext();
-        Properties property = context.getUDFProperties(CassandraStorage.class);
+        Properties property = context.getUDFProperties(MyCassandraStorage.class);
         property.setProperty(PARTITION_FILTER_SIGNATURE, indexExpressionsToString(filterToIndexExpressions(partitionFilter)));
     }
 
@@ -901,6 +909,11 @@
             column.setName(objToBB(t.get(0)));
             column.setValue(objToBB(t.get(1)));
             column.setTimestamp(FBUtilities.timestampMicros());
+            if(getTtl() != 0){
+                column.setTtl(getTtl());
+                column.setTtlIsSet(true);
+            }
+
             mutation.column_or_supercolumn = new ColumnOrSuperColumn();
             mutation.column_or_supercolumn.column = column;
         }
@@ -924,6 +937,11 @@
                     column.setName(objToBB(subcol.get(0)));
                     column.setValue(objToBB(subcol.get(1)));
                     column.setTimestamp(FBUtilities.timestampMicros());
+                    if(getTtl() != 0){
+                        column.setTtl(getTtl());
+                        column.setTtlIsSet(true);
+                    }
+
                     columns.add(column);
                 }
                 if (columns.isEmpty())
@@ -980,7 +998,7 @@
     private void initSchema(String signature)
     {
         UDFContext context = UDFContext.getUDFContext();
-        Properties property = context.getUDFProperties(CassandraStorage.class);
+        Properties property = context.getUDFProperties(MyCassandraStorage.class);
 
         // Only get the schema if we haven't already gotten it
         if (!property.containsKey(signature))
You can build Cassandra with the above patch and use it in your Pig script as follows:
STORE updated INTO 'cassandra://audit/auditlog?ttl=86400' USING CassandraStorage();
Or you can use the new class, named MyCassandraStorage, as a Pig UDF. First you have to compile the class and package it in a jar; then you can define it in your Pig script and use it as follows:
DEFINE ParseSmevMessage com.abc.pig.utils.MyCassandraStorage();
raw = LOAD 'cassandra://audit/auditlog' USING MyCassandraStorage();
filtered = FILTER raw BY  processed.$1=='N';
updated = FOREACH filtered GENERATE  key,TOTUPLE('processed','Y');
STORE updated INTO 'cassandra://audit/auditlog?ttl=86400' USING MyCassandraStorage();

Sunday

Another Cassandra data manipulation api - PlayOrm

Recently I found an interesting project on GitHub named PlayOrm, whose features impressed me very much, so I decided to just play with it. Let's first check out their feature list:

  1. Just added support for Entity has a Cursor instead of List which is lazy read to prevent out of memory on VERY wide rows
  2. PlayOrm Queries use way less resources from cassandra cluster than CQL queries
  3. Scalable JQL (SJQL) supported, which is a modified JQL that scales (SQL doesn't scale well)
  4. Partitioning so you can query a one trillion row table in just ms with SJQL(Scalable Java Query Language)
  5. Typical query support of <=, <, >, >= and = and no limitations here
  6. Typical query support of AND and OR as well as parenthesis
  7. Inner Join support (Must keep your very very large tables partitioned so you get very fast access times here)
  8. Left Outer Join support
  9. Return Database cursor on query
  10. OneToMany, ManyToMany, OneToOne, and ManyToOne but the ToMany's are nosql fashion not like RDBMS
  11. support of a findAll(Class c, List keys) as is typical in nosql to parallel the reads
  12. Inheritance class hierarchy in one table is supported like hibernate
  13. flush() support - We protect you from failures!!!
  14. first level read cache
  15. Automatically creates ColumnFamilies at runtime
  16. Includes it's own in-memory database for TDD in your unit tests!!!!!
  17. Saves you MORE data storage compared to other solutions
  18. logging interface below the first level cache so you can see the raw
    operations on cassandra and optimize just like when you use hibernate's
    logging
  19. A raw interface using only BigDecimal, BigInteger, and String types
    which is currently used to upload user defined datasets through a web
    interface(and we wire that into generating meta data so they can ad-hoc
    query on the nosql system)
  20. An ad-hoc query interface that can query on any table that was from
    an Entity object. To us on other tables, you can also code up and save
    DboTableMeta objects and the ad-hoc query interface gets you query
    support into those tables
  21. IF you have some noSQL data and some Relational data, store your
    relational data in noSQL now and just maintain one database in
    production!!!
  22. support for joda-time LocalDateTime, LocalDate, LocalTime which
    works way better than java's Date object and is less buggy than java's
    Date and Calendar objects
  23. Command Line tool.  
Impressive, yah )) Feature 4, partitioning, can replace Cassandra's composite primary key feature. What I have done: just cloned the project from GitHub, imported it into IntelliJ IDEA and started coding.

First I gave the Inner Join feature a try.

1) Start my local Cassandra database.
2) Create a keyspace named MyKeyspace through CQL as follows:

CREATE KEYSPACE MyKeyspace WITH strategy_class='SimpleStrategy'
 AND strategy_options:replication_factor=1;
3) Create two simple java Pojo with PlayOrm annotations:
Entity Log has a one-to-one relation with entity Event:
import com.alvazan.orm.api.base.anno.*;

import java.util.Date;

@NoSqlEntity
@NoSqlQuery(name="findlog", query="select *  FROM Log as l INNER JOIN l.event as ee where l.user=:user")
public class Log {
    @NoSqlId
    private int id;
    //private String
    private String msg;
    @NoSqlIndexed
    private String user;
    @NoSqlTransient
    private Date   time;
    @NoSqlIndexed
    @NoSqlOneToOne
    private Event event;
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getMsg() {
        return msg;
    }
    public void setMsg(String msg) {
        this.msg = msg;
    }
    public String getUser() {
        return user;
    }
    public void setUser(String user) {
        this.user = user;
    }
    public Date getTime() {
        return time;
    }
    public void setTime(Date time) {
        this.time = time;
    }
    public Event getEvent() {
        return event;
    }
    public void setEvent(Event event) {
        this.event = event;
    }
}
Entity Event 

import com.alvazan.orm.api.base.anno.NoSqlEntity;
import com.alvazan.orm.api.base.anno.NoSqlId;
import com.alvazan.orm.api.base.anno.NoSqlIndexed;
@NoSqlEntity
public class Event {
    @NoSqlId
    private int id;
    @NoSqlIndexed
    private String code;
    private String name;
    //private Log log;
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getCode() {
        return code;
    }
    public void setCode(String code) {
        this.code = code;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}
For a quick start it's better to use PlayOrm's FactorySingleton, which you can find in the test package:
package com.alvazan.test;

import java.util.HashMap;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.alvazan.orm.api.base.Bootstrap;
import com.alvazan.orm.api.base.DbTypeEnum;
import com.alvazan.orm.api.base.NoSqlEntityManagerFactory;

public class FactorySingleton {
    private static final Logger log = LoggerFactory.getLogger(FactorySingleton.class);
    private static NoSqlEntityManagerFactory factory;

    public static Config getConfigForAllTests() {
        /**************************************************
         * FLIP THIS BIT TO CHANGE FROM CASSANDRA TO ANOTHER ONE
         **************************************************/
        String clusterName = "Test Cluster";
        //DbTypeEnum serverType = DbTypeEnum.IN_MEMORY;
        DbTypeEnum serverType = DbTypeEnum.CASSANDRA;
        String seeds = "localhost:9160";

        return new Config(serverType, clusterName, seeds);
    }

    public synchronized static NoSqlEntityManagerFactory createFactoryOnce() {
        if(factory == null) {
            Config config = getConfigForAllTests();
            //We used this below commented out seeds to test our suite on a cluster of 6 nodes to see if any issues pop up with more
            //nodes using the default astyanax consistency levels which I believe for writes and reads are both QOURUM
            //which is perfect for us as we know we will get the latest results
            //String seeds = "a1.bigde.nrel.gov:9160,a2.bigde.nrel.gov:9160,a3.bigde.nrel.gov:9160";
            Map<String, Object> props = new HashMap<String, Object>();
            factory = createFactory(config, props);
        }
        return factory;
    }

    public static NoSqlEntityManagerFactory createFactory(Config config, Map<String, Object> props) {
        log.info("CREATING FACTORY FOR TESTS");
        props.put(Bootstrap.AUTO_CREATE_KEY, "create");
        switch (config.getServerType()) {
        case IN_MEMORY:
            //nothing to do
            break;
        case CASSANDRA:
            Bootstrap.createAndAddBestCassandraConfiguration(props, config.getClusterName(), "MyKeyspace", config.getSeeds());
            break;
        default:
            throw new UnsupportedOperationException("not supported yet, server type="+config.getServerType());
        }

        NoSqlEntityManagerFactory factory = Bootstrap.create(config.getServerType(), props, null, null);
        return factory;
    }
}
Now it's time to put some data into Cassandra and write a managed query:

package com.alvazan.test;

import com.alvazan.orm.api.base.NoSqlEntityManager;
import com.alvazan.orm.api.base.NoSqlEntityManagerFactory;
import com.alvazan.orm.api.base.Query;
import com.alvazan.test.db.Email;
import com.alvazan.test.db.User;
import com.alvazan.test.mytest.Event;
import com.alvazan.test.mytest.Log;

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
public class BasicTest {
    public static void main(String[] args) {
        // create connection factory
        NoSqlEntityManagerFactory factory = FactorySingleton.createFactoryOnce();
        NoSqlEntityManager mgr = factory.createEntityManager();
        Event event = new Event();
        event.setCode("SID0001");
        event.setId(1);
        event.setName("Validation failed");
        Log log = new Log();
        log.setId(1);
        log.setTime(new Date(System.currentTimeMillis()));
        log.setMsg("test");
        log.setUser("weblogic");
        log.setEvent(event);
  
        mgr.put(log);
        mgr.flush();
        // query
        Query query = mgr.createNamedQuery(Log.class, "findlog");
        query.setParameter("user","weblogic");
        List l = query.getResultList(0,100);
        System.out.println("Result Size: "+ l.size());
    }
}
For a partitioning query you have to define the managed query similarly:

PARTITIONS e(:partitionId) select * FROM TABLE as e WHERE e.user = :user
Most of the examples with Cassandra can be found in the com.alvazan.test package. At first glance the framework is very impressive, with a lot of unique features. For me it would be useful to reindex or create a new index from existing data through map/reduce; this feature is on their upcoming features list, and I will be very happy to see it in the next version.

A single node Hadoop + Cassandra + Pig setup

UP1: Our book High Performance in-memory computing with Apache Ignite has been released. The book briefly describes how to improve performance in an existing legacy Hadoop cluster with Ignite and how to achieve ACID transactions over Cassandra through Apache Ignite.

In our current project we decided to store all operational logs in a NoSQL DB, with a total volume of about 97 TB per year. Cassandra was our main candidate for the NoSQL DB. But we also have to analyze and monitor our data, which is where Hadoop and Pig come in to help. Within 2 days our team was able to develop a simple pilot project to demonstrate the combined power of Hadoop, Cassandra and Pig.
For the pilot project we used the DataStax Enterprise edition; this out-of-the-box product helped us quickly install the Hadoop and Cassandra stack and develop the pilot. Here, however, we decided to set up Hadoop, Cassandra and Pig by ourselves. It was my first attempt to install Cassandra over Hadoop with Pig. Although these products have been around for a few years, I haven't found any step-by-step tutorial for setting up a single node cluster with Hadoop + Cassandra + Pig.
First of all we are going to install Hadoop and Cassandra; after that we will run a pig_cassandra map-only job over a Cassandra column family which will save the result to the Hadoop HDFS file system.
Setup Hadoop:
1) Download Hadoop from the following link - http://www.sai.msu.su/apache/hadoop/core/stable/ - then unarchive it:
tar -xvf hadoop-0.20.2.tar.gz
rm hadoop-0.20.2.tar.gz
cd hadoop-0.20.2
2) Edit /conf/core-site.xml. I have used localhost in the value of fs.default.name:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
3) Edit /conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
4) Edit /conf/hdfs-site.xml. Since this test cluster has a single node, the replication factor should be set to 1:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
5) Set your JAVA_HOME variable in /conf/hadoop-env.sh. If you already have the JAVA_HOME variable in your .bash_profile - it's redundant.
6) Format the name node (one per install).
$ bin/hadoop namenode -format
It should print out the following message:
12/07/15 15:54:20 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Shamim-2.local/192.168.0.103
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
12/07/15 15:54:21 INFO namenode.FSNamesystem: fsOwner=samim,staff,com.apple.sharepoint.group.1,everyone,_appstore,localaccounts,_appserverusr,admin,_appserveradm,_lpadmin,_lpoperator,_developer,com.apple.access_screensharing
12/07/15 15:54:21 INFO namenode.FSNamesystem: supergroup=supergroup
12/07/15 15:54:21 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/07/15 15:54:21 INFO common.Storage: Image file of size 95 saved in 0 seconds.
12/07/15 15:54:21 INFO common.Storage: Storage directory /tmp/hadoop-samim/dfs/name has been successfully formatted.
12/07/15 15:54:21 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Shamim-2.local/192.168.0.103
************************************************************/
6.1) Set up passphraseless ssh.
Check that you can log in to localhost without a passphrase:
ssh localhost
If you cannot, first enable your ssh server (System Preferences -> Sharing -> check the box for Remote Login; you can also allow access for all users),
then execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
7) Start all hadoop components
$ bin/hadoop-daemon.sh start namenode
$ bin/hadoop-daemon.sh start jobtracker
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
$ bin/hadoop-daemon.sh start secondarynamenode
starting namenode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-namenode-Shamim-2.local.out
starting jobtracker, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-jobtracker-Shamim-2.local.out
starting datanode, logging to /Users/samim/Development/NoSQL/hadoop/core/hadoop-0.20.2/bin/../logs/hadoop-samim-datanode-Shamim-2.local.out
You can check all the log files to make sure that everything went well.
8) Verify the NameNode and DataNode communication through the web interface http://localhost:50070/dfshealth.jsp. Check the page and confirm that you have one live node.
9) Verify that the JobTracker and TaskTrackers are communicating by looking at the JobTracker web interface and confirming one node listed in the Nodes column: http://localhost:50030/jobtracker.jsp
10) Use the hadoop command-line tool to test the file system:
$ hadoop dfs -ls /
$ hadoop dfs -mkdir /test_dir
$ echo "A few words to test" > /tmp/myfile
$ hadoop dfs -copyFromLocal /tmp/myfile /test_dir
$ hadoop dfs -cat /test_dir/myfile
A few words to test
Setup Cassandra:
1) Download the source code for Cassandra version 1.1.2 from the following link: http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.1.2/apache-cassandra-1.1.2-src.tar.gz. I assume you know how to build Cassandra from the source code; otherwise you will find plenty of information on building Cassandra from source through Google.
2) Edit CASSANDRA_HOME/conf/cassandra.yaml file to set the listen_address and rpc_address to localhost.
3) Start Cassandra:
$ cassandra/bin ./cassandra
4) Check the cluster through the nodetool utility:
$ /bin ./nodetool -h localhost ring
Note: Ownership information does not include topology, please specify a keyspace. 
Address         DC          Rack        Status State   Load            Owns                Token                                       
127.0.0.1       datacenter1 rack1       Up     Normal  55.17 KB        100.00%         96217188464178957452903952331500076192  
The Cassandra cluster is up; now we are going to configure Pig.
Setup Pig:
1) Download Pig from the Apache site (http://www.sai.msu.su/apache/pig/) and unarchive it:
tar -xvf pig-0.8.0.tar.gz
rm pig-0.8.0.tar.gz
Now we will try to run the pig_cassandra example which ships with the Cassandra source distribution. First of all it's better to read the README file in apache-cassandra-1.1.2-src/examples/pig/README.txt. Set all the env variables described in the readme as follows:
export PIG_HOME=%YOUR_PIG_INSTALLION_FOLDER%
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
Also, if you would like to run using the Hadoop backend, you should set PIG_CONF_DIR to the location of your Hadoop config. In my case:
export PIG_CONF_DIR=hadoop/core/hadoop-0.20.2/conf
At this stage you can run the grunt shell to run map reduce tasks: examples/pig$ bin/pig_cassandra -x local should bring up the grunt shell, but I got the following ClassNotFoundException: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.RunningJob. As a quick fix, I decided to edit the pig_cassandra file as follows:
export HADOOP_CLASSPATH="/Users/xyz/hadoop/core/hadoop-0.20.2/hadoop-0.20.2-core.jar"
CLASSPATH=$CLASSPATH:$PIG_JAR:$HADOOP_CLASSPATH
Once I got the grunt shell, I created a keyspace and one column family in the Cassandra cluster and inserted some values through cassandra-cli:
[default@unknown] create keyspace Keyspace1;
  [default@unknown] use Keyspace1;
  [default@Keyspace1] create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;
  [default@Keyspace1] set Users[jsmith][first] = 'John';
  [default@Keyspace1] set Users[jsmith][last] = 'Smith';
  [default@Keyspace1] set Users[jsmith][age] = long(42);
Then I ran the following Pig queries in the grunt shell:
grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)});
grunt> cols = FOREACH rows GENERATE flatten(columns);
grunt> colnames = FOREACH cols GENERATE $0;
grunt> namegroups = GROUP colnames BY (chararray) $0;
grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
grunt> orderednames = ORDER namecounts BY $0;
grunt> topnames = LIMIT orderednames 50;
grunt> dump topnames;
Pig runs the script; here are the statistics:
2012-07-15 17:29:35,878 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2012-07-15 17:29:35,881 [main] INFO  org.apache.pig.tools.pigstats.PigStats - Script Statistics: 

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2 0.8.3 samim 2012-07-15 17:29:14 2012-07-15 17:29:35 GROUP_BY,ORDER_BY,LIMIT

Success!

Job Stats (time in seconds):
JobId Alias Feature Outputs
job_local_0001 colnames,cols,namecounts,namegroups,rows GROUP_BY,COMBINER 
job_local_0002 orderednames SAMPLER 
job_local_0003 orderednames ORDER_BY,COMBINER file:/tmp/temp-833597378/tmp-220576755,

Input(s):
Successfully read records from: "cassandra://Keyspace1/Users"

Output(s):
Successfully stored records in: "file:/tmp/temp-833597378/tmp-220576755"

Job DAG:
job_local_0001 -> job_local_0002,
job_local_0002 -> job_local_0003,
job_local_0003


2012-07-15 17:29:35,881 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2012-07-15 17:29:35,886 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2012-07-15 17:29:35,887 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2012-07-15 17:29:35,888 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-07-15 17:29:35,904 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2012-07-15 17:29:35,907 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-07-15 17:29:35,907 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1,age)
(1,last)
(1,first)
You should find the output file in the Hadoop file system under tmp. In my case it is:
file:/tmp/temp-833597378/tmp-220576755
If you would like to run example-script.pig, you have to create a keyspace named MyKeyspace and a column family according to the Pig script. I just edited example-script.pig to use the newly created Keyspace1 and column family Users. Then you can run it like this:
examples/pig$ bin/pig_cassandra example-script.pig
If you want to run Pig in local mode, add the flag -x local, for example: pig_cassandra -x local example-script.pig. Without -x local, Pig will run in Hadoop mode. See here for more information. Thanks to Nabanita for pointing this out.

See the statistics in the console. My next step is to set up a Cassandra cluster with 4 nodes over Hadoop and run map reduce jobs across all the cluster nodes.
Resources:
1) Cassandra high performance cook book.
2) Cassandra definitive guide.
3) http://stackoverflow.com/questions/8846788/pig-integrated-with-cassandra-simple-distributed-query-takes-a-few-minutes-to-c

Saturday

Oracle Enterprise gateway load test

Over the last few weeks we have done a few exercises with OEG: developed a lot of filters, asynchronous delivery with JMS and much more. Last week we decided to do some load testing, specifically to see how many web services we can register in OEG. We developed a Jython script to register web services and assign policies to them. However, it's very tough to collect a big number of real web services online, so we decided to register the same web service repeatedly, each in a separate web service group, because two identical web services cannot be registered in one group. The Jython script is as follows:
'''
Register WSDL group in Gateway
'''


from java.util import ArrayList
from deploy import DeployAPI
from esapi import EntityStoreAPI
from com.vordel.client.manager.actions.services import AddWebServiceServletProcessor
from vtrace import Tracer
import common
import datetime

t = Tracer(Tracer.INFO) # Set trace to info level
gw_deployURL = "http://localhost:8090/configuration/deployments/DeploymentService"
dep = DeployAPI(gw_deployURL, common.defUserName, common.defPassword)
es = dep.getStore('')

t0 = datetime.datetime.now()
print t0
wsdlURL = 'http://www.restfulwebservices.net/wcf/CurrencyService.svc?wsdl'
deployLocation = es.get('/[NetService]name=Service/[HTTP]name=Default Services')
defaultWsGroup = es.get('/[WebServiceRepository]name=Web Service Repository/[WebServiceGroup]name=Web Services')
t.info("---------------------------------")
# add new group on default webservice group
aws = AddWebServiceServletProcessor(es.es)
deploymentPks = ArrayList()
deploymentPks.add(deployLocation.getPK())
for i in range(1,100):
 try:
  t.info("Count Service :"+ str(i))
  wsGroup = es.addNewEntity(defaultWsGroup,"WebServiceGroup",{"name":"test"+str(i)})
  aws.addWebService(wsGroup, wsdlURL, deploymentPks, 60)
  # add policy Request and response
  #Add sample policy to user interception point.
  #Get generated entites for the web-service
  entities = ArrayList()
  entities = aws.getCreatedWebServiceEntities()
  en = entities[0]
  t.info("---------------------------------")
  t.info("Name: "+en.getStringValue("name")+"\n")
  #Looking for generated circuit for the web-service by service name
  policy = es.get('/[CircuitContainer]name=Generated Policies/[CircuitContainer]name=Web Services.'+'test'+str(i)+'.'+en.getStringValue("name")+'/[FilterCircuit]/[WSFilter]/[OperationCircuitChain]name=Request From Front End')
  t.info("---------------------------------")
  t.info("Name: "+policy.getStringValue("name")+"\n")
  #Attach sample policy to generated circuit
  #Modify the path to ur policy here
  pk = es.get('/[CircuitContainer]name=Policy Library/Modify the path to ur policy here');
  policy.setReferenceField("afterOperationHooks",pk.getPK())
  # -------- add Response handler ---------
  #Looking for generated circuit for the web-service by service name [Response handler]
  policyResponse = es.get('/[CircuitContainer]name=Generated Policies/[CircuitContainer]name=Web Services.'+'test'+str(i)+'.'+en.getStringValue("name")+'/[FilterCircuit]/[WSFilter]/[OperationCircuitChain]name=Response From Back End')
  t.info("---------------------------------")
  t.info("Name: "+policyResponse.getStringValue("name")+"\n")
  pkResponse = es.get('/[CircuitContainer]name=Policy Library/Modify the path to ur policy here');
  policyResponse.setReferenceField("afterOperationHooks",pkResponse.getPK())
  #Write updated policy to store
  t.info("---- update request policy-----")
  es.updateEntity(policy)
  t.info("---- update response policy-----")
  es.updateEntity(policyResponse) 
 except Exception, err:
  print "something wrong with registration!" 
  print "error:"+ str(err)

#Deploy new configuration
t.info("---- update entity -----")
res = dep.setStore(es)
t.info("---- Web services registered-----")
if res.getStatus() != True:
    t.error("Failed to deploy: " + res.getFailureReason())
    #t.error("Failures: "+ Integer.toString(res.getErrorCount())) 
# close entity Store  
es.close()
dep.logout()
delta_t = datetime.datetime.now() - t0 
print delta_t
t.info("end of commands...")
The above script will register 99 web services in one transaction. Run the script with %OEG_HOME%\samples\scripts\run.bat. For the load test we used an Oracle Linux machine with 2 CPUs and 6 GB RAM. For better performance we increased the OEG heap size to 4 GB in jvm.xml as follows:
    <vmarg name="-Xmx4000m"/>
    <vmarg name="-Xms4000m"/>
    
    <vmarg name="-XX:PermSize=128m"/>
    <vmarg name="-XX:MaxPermSize=128m"/>
    <vmarg name="-XX:+UseConcMarkSweepGC"/>
Here is the log of all steps:
Register 100 services - elapsed time 6 min - memory in use after registration 120 MB
Register 200 services - elapsed time 9 min - memory in use after registration 280 MB
Register 300 services - elapsed time 17 min - memory in use after registration 470 MB
Restart OEG - memory in use 200 MB
Register 400 services - elapsed time 27 min - memory in use after registration 900 MB
Register 450 services - elapsed time 13 min - memory in use after registration 1400 MB
Register 500 services - elapsed time 21 min - memory in use after registration 1700 MB
Register 520 services - elapsed time 14 min - memory in use after registration 1900 MB
Register 540 services - elapsed time 16 min - memory in use after registration 2100 MB
After registering 540 services, OEG started to crash with java.lang.OutOfMemoryError: Java heap space. During registration OEG always replaces the old configuration file with a new one, but it could not release the allocated memory, and after 540 services any deployment to OEG failed with a crash. This indicates a memory leak. During deployment OEG always allocates double the memory it currently uses: if OEG uses 500 MB of memory, at deployment time it will need 1 GB. Here is the stack trace:
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:215)
        at java.lang.StringBuffer.toString(StringBuffer.java:585)
        at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1170)
        at com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getNodeValueString(DeferredDocumentImpl.java:1120)
        at com.sun.org.apache.xerces.internal.dom.DeferredTextImpl.synchronizeData(DeferredTextImpl.java:93)
        at com.sun.org.apache.xerces.internal.dom.CharacterDataImpl.getNodeValue(CharacterDataImpl.java:88)
        at com.vordel.es.impl.EntityCodec.getData(EntityCodec.java:263)
        at com.vordel.es.impl.EntityCodec.parse(EntityCodec.java:185)
        at com.vordel.es.impl.EntityCodec.parse(EntityCodec.java:228)
        at com.vordel.es.impl.EntityCodec.decode(EntityCodec.java:57)
        at com.vordel.es.impl.Importer.parse(Importer.java:152)
        at com.vordel.es.impl.Importer.parse(Importer.java:168)
        at com.vordel.es.impl.Importer.importData(Importer.java:97)
        at com.vordel.es.impl.AbstractEntityStore.importData(AbstractEntityStore
We also used JVisualVM to profile the JVM; here is an illustration from my local machine.
The result of the load test is that OEG contains a memory leak; also, if you want to register a large number of web services, you will need much more memory to keep the OEG heap healthy.

Sunday

Register webservices wsdl in OEG through script

In my previous post, blogger Mark O'Neill pointed me to the new version of OEG, which ships with sample scripts. Last weekend I decided to spend some time digging into these samples. The following is the list of sample scripts which ship with the Gateway:
- analyse: perform various analysis on configuration
- certs: examples working with certificates
- io: exporting and importing configuration
- publish: publishing new types to the configuration
- upgrade: upgrading older versions of configuration
- users: examples working with users
- ws: working with Webservices and WSDLs
It's pretty easy to run these scripts; for example, sh run.sh ws\listWebServices.py will show the list of the registered web services. Similarly, registerWebService.py will register a web service WSDL in OEG. I have simply modified the script to register web service WSDLs from a file, where the file contains a list of WSDL URLs as follows:
http://www.restfulwebservices.net/wcf/WeatherForecastService.svc?wsdl
http://www.restfulwebservices.net/wcf/UNSPSCService.svc?wsdl
http://www.restfulwebservices.net/wcf/CurrencyService.svc?wsdl
http://www.restfulwebservices.net/wcf/StockQuoteService.svc?wsdl
and here is the modified jython script:
'''
Register WSDL in Gateway
'''


from java.util import ArrayList
from deploy import DeployAPI
from esapi import EntityStoreAPI
from com.vordel.client.manager.actions.services import AddWebServiceServletProcessor
from vtrace import Tracer
import common

t = Tracer(Tracer.INFO) # Set trace to info level

dep = DeployAPI(common.gw_deployURL, common.defUserName, common.defPassword)
es = dep.getStore('')

deployLocation = es.get('/[NetService]name=Service/[HTTP]name=Default Services')
wsGroup = es.get('/[WebServiceRepository]name=Web Service Repository/[WebServiceGroup]name=Web Services')

aws = AddWebServiceServletProcessor(es.es)
file = open("wsdls.txt", "r")
for line in file.readlines():
 print line
 deploymentPks = ArrayList()
 deploymentPks.add(deployLocation.getPK())
 aws.addWebService(wsGroup, line.strip(), deploymentPks, 60)
 res = dep.setStore(es)
 if res.getStatus() != True:
     t.error("Failed to deploy: " + res.getFailureReason())
     t.error("Failures: "+ Integer.toString(res.getErrorCount())) 

file.close()

es.close()
dep.logout()
The next step is to register the web services and assign some global policies to them. P.S. Thanks to Mark O'Neill for pointing me to the new OEG version.

Tuesday

Manipulating Oracle Gateway Entity Store with gateway SDK

Oracle Enterprise Gateway (OEG) is a gateway product built on technology from the company Vordel to simplify and secure SOA deployments. OEG replaces Oracle Web Services Manager (OWSM) functionality for SOA development. In real life we usually have a lot of services to register in OWSM or in OEG; even more, it was not possible to migrate registered services from one node to another in OWSM. When we planned to migrate from OWSM to OEG, our main aim was to register web services automatically through an API. I was very happy to find that OEG provides an SDK for working with the registry. Here is my first attempt at working with the OEG SDK. We will use Maven to build our project. The OEG entity store consolidates all the entities and objects used in the repository, for example all the registered services and policies. Through the entity store you can add, update and delete any entity. On OTN you can find a tutorial on developing a custom policy with the OEG SDK. OEG also provides an entity explorer for working with the entity store: %OEG_HOME%\Win32\bin\esexplorer.bat. First we will install all the necessary libraries into the local Maven repository.
set OEG_HOME=c:/OEG/gateway/system

call mvn install:install-file -DgroupId=com.vordel -DartifactId=client -Dversion=1.0 -Dfile=%OEG_HOME%/lib/client.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=com.vordel -DartifactId=common -Dversion=1.0 -Dfile=%OEG_HOME%/lib/common.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=com.vordel -DartifactId=entityStore -Dversion=1.0 -Dfile=%OEG_HOME%/lib/entityStore.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=org.apache.axis -DartifactId=axis -Dversion=1.0 -Dfile=%OEG_HOME%/lib/modules/axis.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=org.apache.commons -DartifactId=commons-discovery -Dversion=0.2 -Dfile=%OEG_HOME%/lib/modules/commons-discovery-0.2.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=org.apache.commons -DartifactId=commons-logging -Dversion=1.0.4 -Dfile=%OEG_HOME%/lib/modules/commons-logging-1.0.4.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=javax.xml.rpc -DartifactId=jaxrpc -Dversion=1.0 -Dfile=%OEG_HOME%/lib/modules/jaxrpc.jar -Dpackaging=jar -DgeneratePom=true
call mvn install:install-file -DgroupId=javax.wsdl -DartifactId=javax.wsdl_api -Dversion=1.6.2 -Dfile=%OEG_HOME%/lib/plugins/javax.wsdl_api_1.6.2.jar -Dpackaging=jar -DgeneratePom=true
rem add also Win32\lib\vjni.lib in your classpath
After running the above script, we will have all the libraries in our local Maven repository. For everything to work correctly, we also have to add the vjni.dll file to our library path: on Windows we can do it through the PATH variable, on Linux we can use LD_LIBRARY_PATH. Next we have to create a Maven project and add all the dependencies to it as follows:
<dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>com.vordel</groupId>
      <artifactId>client</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.vordel</groupId>
      <artifactId>common</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>com.vordel</groupId>
      <artifactId>entityStore</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.axis</groupId>
      <artifactId>axis</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-discovery</artifactId>
      <version>0.2</version>
    </dependency>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.0.4</version>
    </dependency>
    <dependency>
      <groupId>javax.xml.rpc</groupId>
      <artifactId>jaxrpc</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>javax.wsdl</groupId>
      <artifactId>javax.wsdl_api</artifactId>
      <version>1.6.2</version>
    </dependency>
  </dependencies>
Now we are ready to write some code and manipulate the OEG entity store. Here is some pseudo code for working with entities.
import java.util.ArrayList;

import com.vordel.es.EntityStore;
import com.vordel.es.EntityStoreException;

public class EntityStoreFactory {
    private EntityStoreFactory(){}

    // Registers the SOAP provider for the given URL and returns the entity store bound to it
    public static EntityStore getEntityStore(final String URL, final String userName, final String password) throws EntityStoreException {
        com.vordel.es.EntityStoreFactory efs = com.vordel.es.EntityStoreFactory.getInstance();
        efs.registerForURL(URL, com.vordel.es.provider.soap.client.SOAPStore.class,
                new ArrayList<String>(2){{ add(userName); add(password); }});
        return efs.getEntityStoreForURL(URL);
    }
}
For testing purposes we are using the SOAP provider to connect, but it's possible to use another type of provider such as LDAP, DB etc. You can find the configuration file for the entity store factory in the following place: %OEG_HOME%\gateway\system\conf\esproviders.xml. You can even initialize the providers as follows:
efs.initializeProviders(new FileInputStream("%OEG_HOME%\\enterprisegateway\\system\\conf\\esproviders.xml"));
Through getEntityStore we can obtain the entity store and are ready to manipulate entities.
public class AddEntity {
    private static Properties props = new Properties();
    static{
        props.put("javax.xml.rpc.security.auth.username","admin");
        props.put("javax.xml.rpc.security.auth.password","changeme");
    }

    public static void main(String[] args) {
        final String url = "http://localhost:8090/configuration/policies";

        System.out.println("Test Vordel ES api");
        EntityStore es = null;
        try {
            es = EntityStoreFactory.getEntityStore(url,"admin","changeme");
            es.connect(url, props);
            // test connection
            ESPK rootESPK =  es.getRootPK();
            System.out.println("Root:"+ rootESPK.toString());
            /** Get the example webservice entity for simplicity, assume we have one web service registered **/
            // get the web services
            ESPK espkBRs = new ESStringPK("DEFAULT_PRIMARY_OracleGateway_6.1.2:1668"); // added through service manager http://www.webservicex.net/stockquote.asmx?WSDL
            com.vordel.es.Entity entBrs = es.getEntity(espkBRs);
            // add test entity
            ESPK parentPK = new ESStringPK("DEFAULT_PRIMARY_OracleGateway_6.1.2:147");
            com.vordel.es.Entity newEntity = new com.vordel.es.Entity(entBrs.getType()); // webservice type
            // add more fields
            // add parent key
            newEntity.setStringField("name", "testby-api");
            Value primaryWsdlValue = new Value();
            newEntity.setReferenceField("primaryWSDL", new ESStringPK("DEFAULT_PRIMARY_OracleGateway_6.1.2:1670"));

            ESPK newAddEspk = es.addEntity(parentPK, newEntity);
            System.out.println("NewEspk:"+ newAddEspk);
            // add more element on it - ex wsdl
            EntityType wsdlEType = new EntityTypeImpl(es, "");
            com.vordel.es.Entity wsdlEntity = new com.vordel.es.Entity(wsdlEType);
            wsdlEntity.setStringField("uri", "http://localhost:9000/StockQuote?WSDL");
            wsdlEntity.setStringField("wsdl", "http://localhost:9000/StockQuote?WSDL");
            es.addEntity(newAddEspk, wsdlEntity);
            // add Generated circuits
            // get Circuite container for example
            ESPK parentCircuiteCPK = new ESStringPK("DEFAULT_PRIMARY_OracleGateway_6.1.2:1655");
            com.vordel.es.Entity newCCType = new com.vordel.es.Entity(es.getEntity(parentCircuiteCPK).getType()); // Contained type
            ESPK newCCForTest = es.addEntity(parentCircuiteCPK, newCCType);
            // add filter circuit
        } catch (EntityStoreException e) {
            e.printStackTrace();
        } finally {
            if(es!=null){
                try {
                    es.disconnect();
                } catch (EntityStoreException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Now we can use entity explorer to check our newly created web service entity.
Our newly created web service is not fully configured yet; we still have to add a circuit policy to it. The next post will briefly describe how to add a circuit policy. Happy coding))

Friday

Manage application configuration with JMX in JBOSS application server

Most of the time developers like to manage their application configuration in a separate file containing name-value pairs. In my current project one of my team members also implemented such a configuration through Spring: put the file in the JBoss %jboss-as%/server/xyz/conf folder and Spring will pick it up on server startup. I asked him what I should do to change a configuration value. He replied: you have to change the file and restart the server, or stop and start the application ))). Certainly we face these kinds of use cases a lot. I told him about JMX and decided to make some quick changes to the code, so today's post is about JMX. For more information about JMX use cases, check the following links: JMX use cases. First we will create one interface and its implementation, which will be the resource managed by JMX. Here is the fragment code of the classes:
public interface ConfigMBean {
    public void setURL(String url);
    public String getURL();

    public void setUserName(String useName);
    public String getUserName();

    public void setPassword(String password);
    //public String getPassword();

    public void setDownloadTimeout(long timeout);
    public long getDownloadTimeout();
}
The implementation of the interface goes here:
public class ConfigImpl implements ConfigMBean {
    private String url;
    private String userName;
    private String password;
    private long   downloadTimeout;

    @Override
    public void setURL(String url) {
        this.url = url;
    }

    @Override
    public String getURL() {
        return url;
    }

    @Override
    public void setUserName(String useName) {
        this.userName = useName;
    }

    @Override
    public String getUserName() {
        return userName;
    }

    @Override
    public void setPassword(String password) {
        this.password = password;
    }

    @Override
    public void setDownloadTimeout(long timeout) {
        this.downloadTimeout = timeout;
    }

    @Override
    public long getDownloadTimeout() {
        return downloadTimeout;
    }
}
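Before wiring this through Spring, it may help to see roughly what the exporter will do for us behind the scenes. Here is a minimal hand-rolled sketch using only the standard javax.management API; the object name xyz:name=JConfig is the same one we will use in the Spring configuration, and since ConfigImpl does not follow the <ClassName>MBean naming convention it is wrapped in a StandardMBean:
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class ManualJmxRegistration {
    public static void main(String[] args) throws Exception {
        // The platform MBean server is the one JConsole connects to by default
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        ConfigImpl config = new ConfigImpl();
        config.setURL("http://172.18.5.78:8080/jui");

        // Wrap the implementation so the ConfigMBean interface defines the management view
        server.registerMBean(new StandardMBean(config, ConfigMBean.class),
                new ObjectName("xyz:name=JConfig"));

        System.out.println("ConfigMBean registered; attach JConsole to inspect it");
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive so JConsole can connect
    }
}
Spring's MBeanExporter takes care of this registration (and of unregistration on shutdown) for us, which is what the configuration below sets up.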
Now it's time to configure the Spring context to initialize our beans.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jaxws="http://cxf.apache.org/jaxws"
       xmlns:jaxrs="http://cxf.apache.org/jaxrs"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://cxf.apache.org/jaxrs http://cxf.apache.org/schemas/jaxrs.xsd
http://cxf.apache.org/jaxws http://cxf.apache.org/schemas/jaxws.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd">

    <import resource="classpath:META-INF/cxf/cxf.xml"/>
    <import resource="classpath:META-INF/cxf/cxf-servlet.xml"/>
    <import resource="classpath:META-INF/cxf/osgi/cxf-extension-osgi.xml"/>


    <context:component-scan base-package="com.blu"/>


    <bean name="placeholderConfig" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
        <property name="ignoreUnresolvablePlaceholders" value="true"/>
        <property name="locations">
            <list>
                <value>classpath:local-serviceregistry.properties</value>
                <value>classpath*:serviceregistry.properties</value>
            </list>
        </property>
    </bean>

    <bean id="ConfigJMX" class="xyz.config.ConfigImpl">
        <property name="URL" value="${BasePath}"/>
        <property name="userName" value="${Login}"/>
        <property name="password" value="${Password}"/>
        <property name="downloadTimeout" value="${DownloadTimeout}"/>
    </bean>
    <!-- JMX config-->
    <bean id="exporter" class="org.springframework.jmx.export.MBeanExporter" lazy-init="false">
        <property name="beans">
            <map>
                <entry key="xyz:name=JConfig" value-ref="ConfigJMX"/>
            </map>
        </property>
    </bean>

</beans>
The code snippet is self-explanatory; the bean named placeholderConfig is used for reading the name-value pairs from the file system when the application starts. The configuration file is as follows:
BasePath=http://172.18.5.78:8080/jui
Login=jboss
Password=
DownloadTimeout=1000
Through the Spring MBean exporter we expose our POJO ConfigImpl to the MBean server, and that's it. We could also use the @ManagedResource annotation to expose the POJO as an MBean and pick it up with the @Component annotation (a short sketch of this appears further down). Now we can deploy our application on JBoss and our MBean is ready to use. We can use JConsole, which ships with JDK 1.5 and above, to work with the MBean. In order for JConsole to display JBoss MBeans we have to add some properties at server boot time. Add the following params to %JBOSS_HOME%/bin/run.conf.sh:
# Enable the jconsole agent locally
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote"
# Tell JBossAS to use the platform MBean server
JAVA_OPTS="$JAVA_OPTS -Djboss.platform.mbeanserver"
# Make the platform MBean server able to work with JBossAS MBeans
JAVA_OPTS="$JAVA_OPTS -Djavax.management.builder.initial=org.jboss.system.server.jmx.MBeanServerBuilderImpl"
See the following link for more information. Now you can run JConsole, connect to your JBoss application server locally or remotely, and set the URL or user name; see the image below.
You can also use the JBoss JMX Console to edit MBean properties.
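If you prefer the @ManagedResource route mentioned above, the configuration bean could look roughly like this. This is only a sketch, assuming annotation-based MBean export is enabled in the context (for example via <context:mbean-export/>); the attribute set is trimmed and the object name is simply the one used earlier:
import org.springframework.jmx.export.annotation.ManagedAttribute;
import org.springframework.jmx.export.annotation.ManagedResource;
import org.springframework.stereotype.Component;

@Component
@ManagedResource(objectName = "xyz:name=JConfig", description = "Application configuration")
public class AnnotatedConfig {
    private String url;
    private long downloadTimeout;

    @ManagedAttribute(description = "Base URL of the service registry")
    public String getURL() {
        return url;
    }

    @ManagedAttribute
    public void setURL(String url) {
        this.url = url;
    }

    @ManagedAttribute(description = "Download timeout in milliseconds")
    public long getDownloadTimeout() {
        return downloadTimeout;
    }

    @ManagedAttribute
    public void setDownloadTimeout(long downloadTimeout) {
        this.downloadTimeout = downloadTimeout;
    }
}
With this approach the component-scan already present in the context picks up the bean, and no explicit exporter map entry is needed.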
For more information on configuring JMX through Spring, see here. Happy coding)))

Thursday

Configuring Nginx to work with WebLogic 12c

Nginx is a free, open-source, high-performance HTTP server and reverse proxy, which can be used in front of the WebLogic application server to cache static pages; it is also able to load balance between servers. However, the default Nginx proxy_pass configuration does not work properly with WebLogic, because WebLogic resets the HTTP headers that carry the host and port. Here is the configuration for the Nginx proxy pass:
 
proxy_cache_path usr/apps/nignx/nginx-1.1.12/cache/ levels=1:2 keys_zone=data-cache:8m max_size=1000m inactive=600m;
proxy_temp_path usr/apps/nignx/nginx-1.1.12/cache/temp;

upstream osbapp {
    server 192.168.52.101:7001;
    server 192.168.52.101:7002;
}

server {
    listen       8001;
    server_name  192.168.52.103;

    location / {
        proxy_set_header Host $http_host;  # pass the original Host header so WebLogic sees the right host and port
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_store off;
        proxy_redirect off;
        proxy_buffering off;
        # cache
        proxy_cache data-cache;
        proxy_cache_valid 200 302 60m;
        proxy_pass http://osbapp;
    }