Saturday

Load balancing and failover with a scheduler

Every programmer develops at least one scheduler or job in their programming lifetime. Nowadays writing a scheduler to get your job done is very simple, but things get tricky when you start thinking about high availability or load balancing for it. It gets even trickier when you have a few instances of your scheduler but only one may run at a time. A long time ago I used a database table lock to achieve such functionality as a form of leader election. Around 2010, when ZooKeeper came into play, I always preferred ZooKeeper to bring high availability and scalability. To use ZooKeeper you need a ZooKeeper cluster with a minimum of 3 nodes, and you have to maintain that cluster. Our new customer refused to use such an open source product in their environment, so I definitely needed to find an alternative. Quartz was the obvious next choice. Quartz makes developing a scheduler easy and simple, and the Quartz clustering feature brings HA and scalability to the scheduler.
Quartz uses the JDBC-JobStore to store the jobs and to load balance between different nodes. Please see below for a high level architecture of Quartz clustering.
Quartz Clustering Features:
1) Provides fail-over.
2) Provides load balancing.
3) Quartz's built-in clustering features rely upon database persistence via JDBCJobStore (described above).
4) Terracotta extensions to Quartz provide clustering capabilities without the need for a back-end database.
Let's start coding. First we need to prepare our DB:
In my case, it's Oracle.
CREATE USER quartztest PROFILE DEFAULT IDENTIFIED BY quartztest DEFAULT TABLESPACE users TEMPORARY TABLESPACE temp ACCOUNT UNLOCK;
GRANT CONNECT, RESOURCE TO quartztest;
GRANT create session TO quartztest;
I have just created a DB schema named quartztest in my Oracle database.
To create the database objects we have to download the Quartz distribution and run the SQL script for Oracle. You can also download the script from my GitHub repository.
After running the SQL script and preparing our DB, we are ready to start developing our highly available scheduler.
First, implement the Quartz Job interface as below:
package com.blu.scheduler;

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Created by shamim on 07/12/15.
 */
public class AltynJob implements Job {
    private Logger logger = LoggerFactory.getLogger(AltynJob.class);
    @Override
    public void execute(JobExecutionContext jobExecutionContext) throws JobExecutionException {
        logger.info("Do something useful!!", jobExecutionContext);
    }
}
Now we need to create our job and start the scheduler:
package com.blu.scheduler;

import org.quartz.*;

import org.quartz.impl.StdSchedulerFactory;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.quartz.JobBuilder.newJob;
import static org.quartz.SimpleScheduleBuilder.simpleSchedule;
import static org.quartz.TriggerBuilder.newTrigger;

/**
 * Created by shamim on 11/12/15.
 */
public class CreateJob {
    private static final Logger LOGGER = LoggerFactory.getLogger(CreateJob.class);
    private static final String JOB_NAME="jobMedian";
    private static final String GROUP = "jobMedianGroup";
    private static final String TRIGGER_NAME= "trgMedian";
    private static final boolean isRecoverable = false;
    private static final Integer INTERVAL = 40; // in seconds



    private void create() throws SchedulerException {
        final Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        // create Job
        JobDetail jobMedian = newJob(AltynJob.class).withIdentity(JOB_NAME, GROUP)
                                                .requestRecovery() // ask scheduler to re-execute this job if it was in progress when the scheduler went down
                                                .build();
        // trigger
        SimpleScheduleBuilder scheduleBuilder = SimpleScheduleBuilder.simpleSchedule();
        scheduleBuilder.withIntervalInSeconds(INTERVAL).repeatForever();

        Trigger trgMedian = newTrigger().withIdentity(TRIGGER_NAME, GROUP)
                                        .startNow().withSchedule(scheduleBuilder).build();
        LOGGER.info("Start the scheduler!!");
        scheduler.start();
        // Schedule the job
        scheduler.scheduleJob(jobMedian, trgMedian);

    }

    public static void main(String[] args) {
        LOGGER.info("Create and Start the scheduler!!");
        try {
            new CreateJob().create();
        } catch (SchedulerException e) {
            LOGGER.error(e.getMessage());
        }
    }
}
Note that the job should be created only once, by any one of the scheduler instances. Our job named jobMedian will be stored in the database.
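Since the job is persisted in the JDBC job store, scheduling it again from a second node would fail with an "already exists" error. Inside the create() method above, the scheduling call could be guarded as in the minimal sketch below, reusing the JOB_NAME and GROUP constants; this check is my own addition, not part of the original code:
// schedule the job only if it is not already present in the job store
JobKey jobKey = new JobKey(JOB_NAME, GROUP);
if (!scheduler.checkExists(jobKey)) {
    scheduler.scheduleJob(jobMedian, trgMedian);
} else {
    LOGGER.info("Job {} already exists in the job store, skipping creation", jobKey);
}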
Now we need the Quartz configuration to create and run the job:
#============================================================================
# Configure Main Scheduler Properties
#============================================================================

org.quartz.scheduler.instanceName = MyClusteredScheduler
org.quartz.scheduler.instanceId = AUTO

#============================================================================
# Configure ThreadPool
#============================================================================

org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 1
#org.quartz.threadPool.threadPriority = 5

#============================================================================
# Configure JobStore
#============================================================================

org.quartz.jobStore.misfireThreshold = 60000

org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.oracle.OracleDelegate
org.quartz.jobStore.useProperties = false
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.tablePrefix = QRTZ_

org.quartz.jobStore.isClustered = true
# the interval should be small enough for nodes to acquire and release the lock in time. In my case the trigger interval is 40 seconds.
org.quartz.jobStore.clusterCheckinInterval = 10000

#============================================================================
# Configure Datasources
#============================================================================

org.quartz.dataSource.myDS.driver = oracle.jdbc.driver.OracleDriver
org.quartz.dataSource.myDS.URL = jdbc:oracle:thin:@localdomain:1521:DB11G
org.quartz.dataSource.myDS.user = quartztest
org.quartz.dataSource.myDS.password = quartztest
org.quartz.dataSource.myDS.maxConnections = 5
org.quartz.dataSource.myDS.validationQuery=select 0 from dual
One of the main properties is org.quartz.jobStore.isClustered = true, which tells Quartz that the scheduler will run in cluster mode.
If you run the above code for the first time, it will create the job and run the scheduler in cluster mode. Every 40 seconds you will get the message "Do something useful!!" in your console.
If you run the class StartNode, it will start another scheduler instance.
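The StartNode class is not shown above; a minimal sketch of what it could look like, assuming the same quartz.properties is on the classpath (the class body is my reconstruction, not the original project code):
package com.blu.scheduler;

import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.impl.StdSchedulerFactory;

/**
 * Starts one more scheduler instance of the same cluster. The job is already
 * stored in the database, so this node only joins the cluster and waits for
 * triggers to be handed to it.
 */
public class StartNode {
    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start(); // picks up instanceId=AUTO and isClustered=true from quartz.properties
    }
}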
Another very important configuration property is org.quartz.jobStore.clusterCheckinInterval = 10000. If it is too high or too close to your trigger interval, load balancing may not work, because every scheduler instance has to obtain the lock to run its jobs. If the property is too small, it can hurt your database performance: imagine a lot of scheduler instances with 100 jobs trying to acquire and release the lock on your database every 2 seconds. How does failover work? If I shut down one of my instances, after about 10 seconds another instance detects that it went down and takes over its jobs.
If you query the database table QRTZ_FIRED_TRIGGERS, you will find out which instance acquired the lock and fired the trigger.
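A quick sketch of such a check from Java, assuming the standard Quartz 2.x table layout and the same connection settings as in quartz.properties (the column names come from the stock Quartz DDL; this helper is my own addition):
package com.blu.scheduler;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FiredTriggersReport {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@localdomain:1521:DB11G", "quartztest", "quartztest");
             Statement stm = con.createStatement();
             ResultSet rs = stm.executeQuery(
                 "SELECT INSTANCE_NAME, TRIGGER_NAME, STATE, FIRED_TIME FROM QRTZ_FIRED_TRIGGERS")) {
            while (rs.next()) {
                // FIRED_TIME is stored as epoch milliseconds in the standard schema
                System.out.printf("%s fired %s (%s) at %d%n",
                        rs.getString("INSTANCE_NAME"), rs.getString("TRIGGER_NAME"),
                        rs.getString("STATE"), rs.getLong("FIRED_TIME"));
            }
        }
    }
}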
In summary, a Quartz scheduler cluster is easy to set up and run. Happy weekend.

Wednesday

HighLoad++ conference 2015


UP1: Much more about high performance computing can be found in this book.

This year I was invited to the HighLoad++ conference in Moscow as a speaker. My session was on the 2nd of November in hall number 1; you can check the summary of my presentation here. I enjoyed my session very much: a lot of specialists came from different sectors, and I was pleased to answer their questions. I even continued the conversation with participants after my session. Here you can find my full presentation on SlideShare. Certainly, I also listened to a few talks and have to mention some of them.
The sessions from the companies 2 Sigma and Alibaba were very interesting. 2 Sigma described how they use and manage their cluster with Apache Mesos. There were also two days of non-stop sessions about PostgreSQL, with a lot of information for developers and DBAs. I also learned a few new things, such as the "competition" based machine learning used at Avito. The HAWQ team also introduced their new SQL engine for Hadoop, an alternative to SparkSQL. Finally, many thanks to the organisers for such a great conference. Here is my presentation in embedded view.


You can find other important stuff about improving performance in the book.

Sunday

Ingest data from Oracle Database to Elasticsearch

In one of my previous blog posts I mentioned how to use Oracle Database change notification and its use cases. When you need full-text search or want to count facts over your data, you certainly need a modern Lucene based search engine. Elasticsearch is one such search engine that provides the functionality mentioned above. However, as of recent versions of ES, the Elasticsearch river is deprecated as a way of ingesting data into ES. Here you can get the whole history of deprecating rivers in ES. Anyway, for now we have two options to ingest or import data from any source into ES:
1) Implement or modify your DAO services so that they update the data in ES and in the database at the same time.
2) Polling: implement a job which polls the data periodically and updates it in ES.

The first approach is the best option to implement, but if you have legacy DAO services or a 3rd party application that you cannot change, it is not for you. Polling the database frequently over a huge data set can hurt the performance of the database.
In this blog post I am going to describe an alternative way to ingest data from the database into ES. One of the prerequisites is that you have an Oracle Database with a version higher than 9.0. I also added the whole code base to github.com so you can explore the capability.
Here is the data flow from Oracle to ES:
The data flow is very simple:
1) A registered listener gets the changes (RowID, ObjectId [~table ID]) for every commit in the DB.
2) The listener sends the changes (RowId, ObjectId) as XML to any MQ; we are using Apache Apollo.
3) A consumer of the queue collects the messages and queries the database for the table metadata and the result set by RowId.
4) The consumer builds the result set in JSON format and indexes it in ES (see the sketch after this list).
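A minimal sketch of steps 3 and 4, assuming the change message carries just a table name and a ROWID; the connection settings, the index name oracle-ingest and the local ES endpoint are illustrative assumptions of mine, and the real implementation lives in the [es] module described below:
package com.blu.es;

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class RowIdIndexer {

    /** Fetches the changed row by ROWID and pushes it to Elasticsearch over plain HTTP. */
    public void index(String tableName, String rowId) throws Exception {
        StringBuilder json = new StringBuilder("{");
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@localdomain:1521:DB11G", "scott", "tiger");
             PreparedStatement ps = con.prepareStatement(
                 "SELECT * FROM " + tableName + " WHERE ROWID = ?")) {
            ps.setString(1, rowId);
            try (ResultSet rs = ps.executeQuery()) {
                ResultSetMetaData meta = rs.getMetaData();
                if (rs.next()) {
                    for (int i = 1; i <= meta.getColumnCount(); i++) {
                        if (i > 1) json.append(',');
                        json.append('"').append(meta.getColumnName(i)).append("\":\"")
                            .append(rs.getString(i)).append('"');
                    }
                }
            }
        }
        json.append('}');
        // index the document through the REST API; index/type/id are arbitrary choices here
        URL url = new URL("http://localhost:9200/oracle-ingest/" + tableName.toLowerCase() + "/" + rowId);
        HttpURLConnection http = (HttpURLConnection) url.openConnection();
        http.setRequestMethod("PUT");
        http.setDoOutput(true);
        try (OutputStream os = http.getOutputStream()) {
            os.write(json.toString().getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("ES response code: " + http.getResponseCode());
    }
}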

The GitHub repository contains three modules that implement the data flow:
[qrcn] - collects notifications from Oracle and sends them to an existing queue [apollo].
[es] - the consumer; collects the messages from the queue and indexes them in Elasticsearch.
[es-dto] - common DTOs.
You can change your query notification in the QRCN module (file connection.properties):
querystring=select * from temp t where t.a = 'a1';select * from ATM_STATE
I also added ZooKeeper to the QRCN module to make it fault tolerant.
Anyway, at the moment the project has the following prerequisites:
1) Oracle JDBC 11g driver, needed to compile the project.
2) Apache ZooKeeper
3) Apache Apollo
4) Elasticsearch
Anyway, you can always make changes according to your requirements. Happy weekend.

Tuesday

Database DML/DDL event processing with Oracle Database change notification

A few years ago, in one of my blog posts, I described how to use Oracle Database change notification to update a Hazelcast cache in the application server layer. This is still one of the finest use cases for Oracle Database change notification, but you can also use this capability to solve other problems, such as event processing on legacy tables. For instance, suppose you are developing a dispute system for a banking system. For a banking core system, a dispute is just another banking transaction, and when a dispute comes from a client, a bank operator should react to it. Most of the time, disputes are kept in the same storage (tables) as the other transactions. Such tables can hold billions of rows, and when you want to be notified when a few of those rows change, you have a few options at hand:
1) Poll periodically: a scheduler which polls the whole table periodically to get the changed data.
2) Use an Oracle trigger to send a notification (stored procedure or Oracle embedded Java implementation).
3) Use Oracle Database change notification, also known as Oracle Continuous Query Notification.
If you have a few billion rows in an OLTP system, the first option is not an option at all, and using triggers can also cause performance issues in an OLTP system. For asynchronous event processing from Oracle objects, Oracle Database change notification is one of the best possible options. Oracle provides different implementations of change notification: continuous query notification comes with C and PL/SQL APIs, while database change notification has a Java implementation. To enable change notification you only have to grant the privilege as follows and you are ready to go:
grant change notification to USER_NAME;
One thing you have to note is that with a notification you only get the ROWID and the object name or ID, not the entire row; it means you will have to do an extra select query by ROWID to get the row. Below is an illustration of the simple flow of the process:
The explanation of the steps in the above illustration is as follows:
1) In this example, the client registers the listener and the queries for certain user Oracle objects.
2) The database stores the registration information in the data dictionary.
3) A partner application makes changes to the user objects; this may be any DML/DDL operation.
4) The Oracle JOBQ background process is notified of a new change notification message.
5) The JOBQ process notifies the client application listener.
6) The client listener gets the ROWID and the user object ID and sends the information to an MQ server, such as Kafka.
7) A subscriber of the Kafka topic gets the ROWID information.
8) The processor queries the user objects to get the result set and starts processing the updated information.
Let's take a quick look at the pseudo code (you will find all the code on GitHub).

Registration of the notification:
package com.blu.db;

import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.*;
import oracle.jdbc.driver.OracleConnection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class DBNotifactionConsumer {
    private static final Logger LOGGER = LoggerFactory.getLogger(DBNotifactionConsumer.class);

    private NsiOracleConnection oracleConnection;
    private static final Properties properties = new Properties();
    private String queryString;

    public String getQueryString() {
        return queryString;
    }

    public void setQueryString(String queryString) {
        this.queryString = queryString;
    }

    static{
        properties.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");
        properties.setProperty(OracleConnection.DCN_QUERY_CHANGE_NOTIFICATION, "true"); //Activates query change notification instead of object change notification.
    }

    public DBNotifactionConsumer(NsiOracleConnection oracleConnection) {
        this.oracleConnection = oracleConnection;
    }

    public NsiOracleConnection getOracleConnection() {
        return oracleConnection;
    }

    public void registerNotification() throws SQLException{
        DatabaseChangeRegistration databaseChangeRegistration =  getOracleConnection().getConnection().registerDatabaseChangeNotification(properties);
        databaseChangeRegistration.addListener(new NsiListner());
        Statement stm = getOracleConnection().getConnection().createStatement();
        ((OracleStatement) stm).setDatabaseChangeRegistration(databaseChangeRegistration);
        ResultSet rs;
        for(String queryString : getQueryString().split(";")){
            rs = stm.executeQuery(queryString);
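            // iterate over the result set; the rows themselves are not used here, the query is executed only so that it gets registered for change notification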
            while(rs.next()){
            }
            rs.close();
        }
        // get tables from dcr
        String[] tables = databaseChangeRegistration.getTables();
        for(String str : tables){
            LOGGER.info("Registreted Tables:{}", str);
        }
        if(!stm.isClosed()){
            stm.close();
        }
    }
}
In the above code I used query change notification.
The listener, without Kafka, is very simple:
package com.blu.db;

import oracle.jdbc.dcn.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NsiListner implements DatabaseChangeListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(NsiListner.class);

    @Override
    public void onDatabaseChangeNotification(DatabaseChangeEvent databaseChangeEvent) {
        for(QueryChangeDescription qcd : databaseChangeEvent.getQueryChangeDescription()){
            LOGGER.info("Query Id: {}", qcd.getQueryId());
            LOGGER.info("Event Type: {}", qcd.getQueryChangeEventType().name());
            for(TableChangeDescription tcd : qcd.getTableChangeDescription()){
                //ClassDescriptor descriptor = OracleChangeNotificationListener.this.descriptorsByTable.get(new DatabaseTable(tcd.getTableName()));
                LOGGER.info("table Name: {}", tcd.getTableName()); // table name is empty
                LOGGER.info("Object ID: {}", tcd.getObjectNumber()); // use object id]]
                for(RowChangeDescription rcd : tcd.getRowChangeDescription()){
                    LOGGER.info("Row ID:" + rcd.getRowid().stringValue() + " Operation:" + rcd.getRowOperation().name());
                }
            }

        }

    }
}
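In the full flow the listener would not just log the change but push the ROWID to a message broker. A minimal sketch of how that could look with the plain Kafka producer API; the topic name oracle-changes and the broker address are assumptions of mine, not part of the project:
package com.blu.db;

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RowIdPublisher {
    private final Producer<String, String> producer;

    public RowIdPublisher() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    /** Publishes one change event: the key is the object (table) id, the value is the ROWID. */
    public void publish(long objectId, String rowId) {
        producer.send(new ProducerRecord<>("oracle-changes", String.valueOf(objectId), rowId));
    }

    public void close() {
        producer.close();
    }
}
Calling publish(tcd.getObjectNumber(), rcd.getRowid().stringValue()) from the listener above would then hand the change off to the queue.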
If you want your listener application to be fault tolerant, you can run several listener applications in a cluster and use leader election so that only one listener runs at a time. Here is the pseudo code of a simple leader elector; note that I am using Curator to avoid boilerplate code.
package com.blu.curator;

import com.blu.db.DBNotifactionConsumer;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

import java.util.concurrent.atomic.AtomicInteger;

public class Leader extends LeaderSelectorListenerAdapter {
    private static final Logger LOGGER = LoggerFactory.getLogger(Leader.class);
    private String clientName;
    private CuratorFramework client;
    private String path;
    private LeaderSelector leaderSelector;
    // oracle change notification
    private ApplicationContext ctx;// = new ClassPathXmlApplicationContext("spring-context.xml");
    private DBNotifactionConsumer consumer;//= (DBNotifactionConsumer) ctx.getBean("consumer");

    public Leader(String clientName, CuratorFramework client, String path) {
        this.clientName = clientName;
        this.client = client;
        this.path = path;
        leaderSelector = new LeaderSelector(this.client,this.path, this);
        leaderSelector.autoRequeue();
        // initialize oracle change notification
        ctx = new ClassPathXmlApplicationContext("spring-context.xml");
        consumer = (DBNotifactionConsumer)ctx.getBean("consumer");
    }

    @Override
    public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
        // run oracle notification here

        consumer.registerNotification();
        System.out.println(this.clientName + " is now the leader!!");
        Thread.sleep(Integer.MAX_VALUE);
    }
    public void start(){
        leaderSelector.start();
    }
}
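The post does not show how Leader is wired up and started; a minimal sketch under the assumption that ZooKeeper runs locally and that the election path /leader/dbnotification is ours to choose (both the connect string and the path are my assumptions):
package com.blu.curator;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderMain {
    public static void main(String[] args) throws Exception {
        // connect to the ZooKeeper ensemble used for leader election
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // each running instance gets its own name; only the elected leader registers the notification
        Leader leader = new Leader("listener-" + System.currentTimeMillis(), client, "/leader/dbnotification");
        leader.start();

        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive while this node waits for leadership
    }
}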
When one of the listeners becomes unavailable, another one will replace it and continue getting notifications from the Oracle DB. That's enough for today; I guess the above information will help somebody to quick start with Oracle change notification.

Sunday

Resilient, are you ready for your application?

UP1: If you are planning to use reactive programming, we recommend you read the book "High performance in-memory computing with Apache Ignite".

Nowadays the terms reactive programming and the Reactive Manifesto have become a trend. Blog posts, articles and presentations are everywhere. Why are people getting so excited about it? What problems could it solve? The easiest way to answer these questions is to think about the requirements we have when building applications these days.
We have to be:
1) Modular - modules help to maintain the application; a module can go offline and come back online without bringing the whole system down;
2) Scalable - so we can scale vertically or horizontally to handle a large amount of data or transactions;
3) Resilient - the system can get back online and be fault tolerant;
4) Responsive - this means fast and available.
Here is the paradigm of the Reactive Manifesto:
For the most part, we already have a few frameworks such as Akka, Vert.x, Hystrix and ReactiveFx to develop reactive applications. Today I am going to highlight only the resilience property of an application. In your development process you have probably had to use third party libraries or 3rd party services over the network. When such a 3rd party service or library goes down or becomes unavailable, your application suffers from timeout exceptions or even crashes under a pile of waiting requests. For example, in our last project we used a 3rd party weather forecast service and resolved the user's city of residence by IP address in our portal. When these services were not available or did not respond in time, the response time of the portal increased, and a few times the system crashed because a lot of requests were waiting in the thread pool. This can happen with any external resource such as a database connection, an MQ connection etc. If you want to recover from such situations, you need the circuit breaker design pattern. A few years ago Netflix developed a framework for designing such applications, which also covers the bulkhead pattern. Today I am going to describe the Hystrix framework and develop a very small quick start Maven project to show its capabilities.
Here is the illustration, how it works:
The design is very simple: "When you use Hystrix to wrap each underlying dependency. Each dependency is isolated from one other, restricted in the resources it can saturate when latency occurs, and covered in fallback logic that decides what response to make when any type of failure occurs in the dependency".
To show its capabilities we developed a simple REST service with two methods (getweather, getcitybyip), which plays the role of a 3rd party service.
public interface IServices {
    public String getWeather(String cityName);
    public String getCityByIp(String ipAddress);
}
We also have a service facade for the REST services, based on Hystrix, which delegates requests to the 3rd party service. If the 3rd party service is not available, the circuit opens and no requests are delegated to the service.
Here is the source code of the Hystrix service facade:
package com.blu.rest;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolProperties;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

/**
 * Created by shamim on 16/07/15.
 * Service facade
 */
// add json
@Path("/servicefacade")
public class HystrixServiceFacade extends HystrixCommand<String> {
    private static final Logger LOGGER = LoggerFactory.getLogger(HystrixServiceFacade.class);
    private static final String CONTEXT_ROOT="http://localhost:8080/test/rest/serviceimpl/";
    private String ipAddress;


    public HystrixServiceFacade() {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("getCityByIp"))
                        .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter().withCoreSize(100))
                        .andCommandPropertiesDefaults(HystrixCommandProperties.Setter().withCircuitBreakerErrorThresholdPercentage(3)
                        .withCircuitBreakerSleepWindowInMilliseconds(2000)
                        )
        );
    }

    @GET
    @Path("/getcitybyip/{param}")
    // http://localhost:8080/test/rest/servicefacade/getcitybyip/192.168.121.11
    public Response echo (@PathParam("param") String ipAddress){
        LOGGER.info("Invoked with Parameter!!!" + ipAddress);
        this.ipAddress = ipAddress;


        String resp = this.execute();
        return Response.status(200).entity(resp).build();
    }

    @Override
    protected String run() throws Exception {
        // invoke 3rd party service
        final String uri = CONTEXT_ROOT + "getcitybyip/" +this.ipAddress;
        Client client = Client.create();
        WebResource webResource = client.resource(uri);
        ClientResponse response = webResource.accept("application/json").get(ClientResponse.class);

        if (response.getStatus() != 200) {
            LOGGER.info("Error:" + response.getStatus());
        }
        return response.getEntity(String.class);


    }

    @Override
    protected String getFallback() {
        return "Fall back for get City by Ip";
    }
    private GetWeatherCommand getWeatherCommand = new GetWeatherCommand();

    @GET
    @Path("/getweather/{param}")
    public Response getWeather(@PathParam("param")String cityName){
        LOGGER.info("Invoked getWeather with param: "+ cityName);
        getWeatherCommand.cityName = cityName;
        String resp = getWeatherCommand.execute();

        return Response.status(200).entity(resp).build();
    }
    class GetWeatherCommand extends HystrixCommand<String> {
        private String cityName;
        public GetWeatherCommand() {
            super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("GetWeather"))
                            .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter().withCoreSize(100))
                            .andCommandPropertiesDefaults(HystrixCommandProperties.Setter().withCircuitBreakerErrorThresholdPercentage(5)
                            .withCircuitBreakerSleepWindowInMilliseconds(2000)
                            )
            );
        }

        @Override
        protected String run() throws Exception {
            // invoke 3rd party service
            final String uri = CONTEXT_ROOT +"getweather/"+this.cityName;
            Client client = Client.create();
            WebResource webResource = client.resource(uri);
            ClientResponse response = webResource.accept("application/json").get(ClientResponse.class);

            if (response.getStatus() != 200) {
                LOGGER.info("Error {}", response.getStatus());
            }
            return response.getEntity(String.class);
        }
        // static fall back
        @Override
        protected String getFallback() {
            return "Fall Back for getWeather";
        }
    };
}
Note that there are two types of implementation here: one where we inherit from HystrixCommand directly, and another where we use an inner Hystrix command. You will find the full source code here on github.com.
Now let's run the simple REST service with the following command:
mvn jetty:run
If everything goes well, your service should be available at the URL http://localhost:8080/test/rest/serviceimpl/getweather/moscow
Now let's run the Hystrix service facade:
mvn jetty:run -Djetty.http.port=9093
The service facade will be available on port 9093 and you can reach it at http://localhost:9093/test/rest/servicefacade/getweather/moscow
I have also configured the HystrixMetricsStreamServlet for sending metrics to the Hystrix Dashboard.
<servlet>
    <description></description>
    <display-name>HystrixMetricsStreamServlet</display-name>
    <servlet-name>HystrixMetricsStreamServlet</servlet-name>
    <servlet-class>com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsStreamServlet</servlet-class>
</servlet>

<servlet-mapping>
    <servlet-name>HystrixMetricsStreamServlet</servlet-name>
    <url-pattern>/hystrix.stream</url-pattern>
</servlet-mapping>
The stream servlet will be available at http://localhost:9093/test/hystrix.stream
You can also clone the Hystrix dashboard from GitHub and follow the instructions to build and run it. In my case the dashboard is available at http://localhost:7979/hystrix-dashboard.
Now we can start sending massive numbers of requests to check how it works. For this you can use any curl command or my com.blu.reactive.JersyClient class. To demonstrate network failures I have added the following random failure code to the getCityByIp method:
package com.blu.rest;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import java.util.HashMap;
import java.util.Map;

@Path("/serviceimpl")
public class ServicesImpl implements IServices {
    private static final Map WEATHERS= new HashMap(){{
        put("moscow", "29");
        put("petersburg", "21");
        put("berlin", "31");
        put("bercelona", "33");
    }};
    private static final Map CITIES = new HashMap(){{
        put("192.168.121.11","moscow");
        put("192.168.124.21","petersburg");
        put("192.138.111.31","berlin");
        put("192.168.132.61","bercelona");
    }};

    @GET
    @Path("/getweather/{param}")
    @Override
    public String getWeather(@PathParam("param") String cityName) {
        return WEATHERS.get(cityName);
    }
    @GET
    @Path("/getcitybyip/{param}")
    @Override
    public String getCityByIp(@PathParam("param")String ipAddress) {
        // Randomly failure
        final long ctime = System.currentTimeMillis();

        if(ctime % 20 == 0){
            throw new RuntimeException("Randomly failure!!");
        }
        if(ctime % 30 == 0){
            try {
                Thread.sleep(7000);
            } catch (InterruptedException e) {
                // do nothing
            }
        }
        return CITIES.get(ipAddress);
    }
}
After running the JersyClient class, you can observe the Hystrix Dashboard and monitor the services' health.
When there is no service failure, the circuit is closed and requests are delegated to the 3rd party service as follows:
When random failures occur or you stop the 3rd party service, Hystrix detects the failures and the circuit opens; then no requests are delegated to the 3rd party service and the Hystrix service facade returns the fallback. After 2 seconds (configurable by withCircuitBreakerSleepWindowInMilliseconds(2000)) Hystrix lets a few requests through, and if the service is alive again it continues to pass all the requests. Here is the illustration when the circuit is open:

Certainly you can control how much service failure causes the circuit to open; in our case it's 5% of all requests. The Hystrix framework has a few more interesting capabilities such as bulkheads and timeout control, but for today it's enough. Happy Weekend.
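As a pointer, here is a sketch of how those two knobs could be set on a command, reusing the Setter style from the facade above; the concrete numbers are just examples of mine, and depending on the Hystrix version the timeout property may be called withExecutionIsolationThreadTimeoutInMilliseconds instead:
package com.blu.rest;

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolProperties;

/** A sketch of a command with an explicit bulkhead size and execution timeout. */
public class GetCityByIpCommand extends HystrixCommand<String> {
    private final String ipAddress;

    public GetCityByIpCommand(String ipAddress) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("getCityByIp"))
                // bulkhead: at most 10 concurrent calls to this dependency, the rest are rejected immediately
                .andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter().withCoreSize(10))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        // timeout control: fall back if the call takes longer than one second
                        .withExecutionTimeoutInMilliseconds(1000)));
        this.ipAddress = ipAddress;
    }

    @Override
    protected String run() throws Exception {
        // the real HTTP call to the 3rd party service would go here
        return "city for " + ipAddress;
    }

    @Override
    protected String getFallback() {
        return "unknown city";
    }
}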

Microservices - tools and implementations

UP1: If you are planning to migrate to Microservice, we recommend you to read the book "High performance in-memory computing with Apache Ignite".

One of the challenging things in the microservices world is not the implementation of the services but rather their monitoring and management. At the time of writing this blog, we have a few frameworks and tools to implement microservices, such as Dropwizard, Spring Boot and Vert.x. The complexity grows when you have a lot of independent microservices deployed and running over a cloud infrastructure. It is a really painful task to monitor and manage all the services 24*7. Tools like Ansible, Puppet, Docker and Logstash can help you build such a management platform for microservices, but they are not always sufficient. To solve the problem described above, the JBoss project released a management tool for microservices named fabric8. Fabric8 provides management, continuous delivery and a simple integration platform based on the Apache Camel project. This open source project is based on Google Kubernetes and OpenShift V3. Every application deployed in fabric8 is independent and runs in a separate Java container, which you can stop, start or restart. At any moment you can increase or decrease the number of instances of any application in fabric8 with a single mouse click. In this blog I am going to quick start with fabric8, install and run it on a single machine, and try to deploy one simple application from its quickstart examples.
For the installation of fabric8 you can follow the getting started guide; I chose the Vagrant way. If you are already familiar with Vagrant and Docker, the installation process will be easy for you. If everything goes fine with your installation you should see the following page.
Now you can play with some of the pre-installed applications, for example quickstart-rest. If any of them is not running, you can check the application logs. If you get the following error in the web console or in any application log:
index.docker.io: no such host
add the following name servers to your /etc/resolv.conf file and restart the Docker instance:
nameserver 8.8.8.8
nameserver 8.8.4.4
sudo systemctl restart docker
Now we can clone the quickstart project from GitHub and try to deploy any of the examples.
The examples are well documented, but the first time around they may not work.
Build the Docker image:
mvn package docker:build
If you get an error something like this:
index.docker.io: no such host, you should add the following IP addresses to the /etc/hosts file of your host machine:
52.0.31.125 index.docker.io
52.0.31.125 registry-1.docker.io
If you are using the Vagrant image on a single machine, you do not need to push the image to the Docker hub; instead you can apply the Kubernetes configuration to the installed platform:
mvn fabric8:apply
The above command will deploy the given application with its metadata to the local fabric8.
Now you can increase the number of instances of the application by clicking the pod column, and try to invoke the servlet with the following URL:
http://quickstart-camelservlet.vagrant.f8/camel/hello?name=shamim
The result should be as follows:
From here you can develop your applications with the Maven archetypes provided by the fabric8 team and enjoy your microservices.

Monday

Microservices – under the hood

Over the last couple of months, most conferences and seminars have been discussing and debating the new architectural style named microservices. At first glance, microservices means developing small, independent services which are highly cohesive in functionality and can be changed rapidly. Some IT evangelists have already started debating whether microservice is just a buzzword. From my point of view, the word microservice is misleading in itself. The design model behind this concept is as follows:
1. Services should be loosely coupled.
2. High cohesion in functionality.
3. A service should be changeable with minimal effect on other services.
4. Automated deployments.
5. Easy to scale out.
6. Can use polyglot programming languages and polyglot persistence stores.
Most of these requirements are nothing new at all. Developers or IT architects who are already familiar with SOA know all of the above concepts very well. From my point of view, we have to clear up the following three things before starting to code microservices:
1. Which benefits we can get from microservices and how they differ from standard J2EE development and deployment.
2. How microservices differ from SOA.
3. The industries where we can actually propose to use such a new concept.

The benefits of using microservices over a monolithic system are very obvious. The best explanation can be found in the Martin Fowler article.
A service in a microservice architecture should be fine grained and responsible for only one piece of functionality. A service in SOA can be responsible for many things and may be as hard to evolve as any monolithic application. Most of the other design concepts of either architecture are almost similar. But we should also remember two more important benefits of the SOA architectural design: first, it gives one entry point (or integration point) for all the participating applications, and second, we can orchestrate or coordinate service calls in SOA based applications. One thing that isn't clear from the microservice design is: who will be the coordinator for all the microservice calls?

Martin Fowler specifies the microservice architecture as follows:

"In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies."

The context boundary of "minimum of centralized management of these services" is not so clear. Maybe only an SPA (Single Page Application) can have minimal management of services. However, if we eliminate all the ESB design patterns from SOA, both design approaches, SOA and microservices, become the same. Definitely many vendors misguided the SOA movement in the wrong direction with ESB in order to sell their products. We were never properly guided by any SOA governance, which caused the failure of most SOA based projects. SOA also never helped us to understand how to split a monolithic service into smaller pieces or how big is too big.
If we want to implement microservices, we have to understand how to split services and how big a service should be.

Most of the time, new ideas and technologies come from high-tech companies such as Amazon and Google; NoSQL is one example. These types of companies need to deliver new capabilities to customers as soon as possible, so it is very important for them to find a way to change their applications rapidly and deploy them quickly. They also use polyglot programming languages and persistence stores for various purposes. But if we take a closer look at our customers, most of them are banks and telecommunication companies. These industries are very conservative, with their own requirements and cultures. You can't easily drag a new technology or trend into such companies. From my experience, we could not convince one bank to use Ansible to automate the whole deployment process on more than 34 servers. The maintenance department is very happy with manual deployment of all the artifacts to 34 servers. Likewise, I am sure that if you propose shipping Docker with all your microservices for deployment, they will not be very happy either.

Then what is the best approach for such industries? Designing applications as modules of services which have a very clear context boundary and high cohesion. It is better to explain with an example. Suppose we have a few services such as sending SMS, e-mail or push notifications to clients. According to the microservice design approach, you have to develop three different services, which will run independently and be maintained separately. It could be three different WAR files or three Spring Boot applications, for example. This approach gives granularity, but also complexity from the maintenance point of view. If we develop all three of the above services in one module and deploy them as one artifact, nothing goes wrong. Of course, if we have to make a change only in the e-mail service, we will have to redeploy the whole module with the other services. However, one module with three such services is easy to maintain, and they are very similar in operation. These types of modules can be deployed in a single servlet container such as Tomcat, or in elastic cat, as isolated applications for fault tolerance. Let me give an example of such an application: in 2013-2014 we developed an application for a telecommunications company for mobile number portability. From the business functionality we had the following context boundaries:
1. Business process execution in a BPM server.
2. Exchanging messages with external systems during business process execution.
3. Exchanging messages with internal systems such as mobile switch control, SMS delivery, billing, etc.
4. A one page web portal for operators.
5. Event processing.
We also had a polyglot persistence store: an RDBMS and Cassandra.

Every module is highly cohesive in functionality and clearly separated by its responsibilities. The business processes were dramatically complex with a lot of user interaction; for this reason we decided to use asynchronous interaction with external systems and to use the circuit breaker pattern thoroughly to get a fault tolerant system. For interaction between the internal modules we use a binary protocol, and for interaction with external modules we use HTTP. As a result we ended up with the following stack:
• Spring
• Activiti BPM
• MyBatis
• Active MQ
• IBM MQ
• Oracle 11gR2
• Cassandra
• FlywayDB for incremental update of DB
• Ansible for automation of Deployment
• Nagios for monitoring
The system is highly configurable and scales horizontally. Changes or bug fixes in one module have a very small effect on the other modules. In conclusion I have to repeat what Martin Fowler said: "don't even consider microservices unless you have a system that's too complex to manage as a monolith".

Sunday

World IT Planet Conference in Saint Petersburg

Yesterday, after a long period of time, I visited Saint Petersburg University as a speaker. The conference had already been running for a week, and students came from all over the country and from outside Russia. The audience was very nice and asked a lot of technical questions, mostly about Cassandra and how we are utilising it in our projects. The community was very curious about NoSQL and wanted to find answers to all their questions about the new trend. Here is the presentation from slideshare.net.

Thursday

Presentation on tuning and optimising a high load J2EE web application

UP1: If you are interested in in-memory computing for getting high performance from your system, we recommend the book "High performance in-memory computing with Apache Ignite".

Today I gave a new presentation on tuning and optimising a high load J2EE web application, based on developing the Sberbank portal. The presentation is available here on SlideShare.

 

Friday

An impatient start with Mahout - Machine learning

One of my friends asked me to develop a course for students on the subject of machine learning. The course should be very simple, to familiarise students with machine learning. The main purpose of the course is to open up the machine learning world to the students and let them play with the topic with their own hands. Machine learning is very mature and a lot of tools and frameworks are available to get your hands wet, however most of the articles or tutorials you can find on the internet start by installing a cluster or make you write a bunch of code (even on the Mahout site they are using Maven) before you can start learning. Moreover, not all students are familiar with Hadoop or have a powerful enough notebook to install and run all the components to get a taste of machine learning. For these reasons I settled on the following approach:
  1. Standalone Hadoop
  2. Standalone Mahout 
  3. And a few CSV data files to learn how to work with predictions

Assume you already have Java installed on your workstation. If not, please refer to the Oracle site to download and install Java yourself. First we will install standalone Hadoop and check the installation; after that we will install Mahout and try to run some examples to understand what is going on under the hood.
  • Install Hadoop and run a simple map reduce job as a test
Hadoop version: 2.6.0
Download hadoop-2.6.0.tar.gz from the Apache Hadoop download site. At the moment of writing this blog, version 2.6.0 is the stable one to use.
Unarchive the gz file somewhere on your local disk. Add the following paths to your .bash_profile as follows:
export HADOOP_HOME=/PATH_TO_HADDOP_HOME_DIRECTORY/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$PATH
Now let's check the Hadoop installation. Create a directory inputwords in the current folder and copy all the XML files from the Hadoop etc installation folder as follows:
$ mkdir inputwords
$ cp $HADOOP_HOME/etc/hadoop/*.xml inputwords/
Now we can run the Hadoop standalone map reduce job to count all the words found in the XMLs:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount inputwords outputwords
You should see a bunch of logs in your console, and if everything went fine you should find a few files in the folder outputwords (this folder is created at runtime). Run the following command in your console:
$cat ./outputwords/part-r-00000
It should show a lot of words followed by counts, as follows:
use 9
used 22
used. 1
user 40
user. 2
user? 1
users 21
users,wheel". 18
uses 2
using 3
value 19
values 1
version 1
version="1.0" 5
via 1
when 4
where 1
which 5
while 1
who 2
will 7
If you are curious to count more words, download the famous play by William Shakespeare, "The Tragedy of Romeo and Juliet", from here and run it through the Hadoop wordcount.
  • Download and install Apache Mahout
Let's download the Apache Mahout distribution from here and unarchive the file somewhere on your local machine. The Mahout distribution contains all the libraries and examples for running machine learning on top of Hadoop.
Now we need some data for learning purposes. We can use GroupLens data; certainly you can generate data yourself, but I highly recommend the data from GroupLens. The GroupLens organisation collects social data for research, and you can use this data for your own purposes. There are a few datasets available on the GroupLens site, such as MovieLens, BookCrossing etc. For my course we are going to use the MovieLens datasets, because they are well formatted and grouped. Let's download the MovieLens dataset and unarchive the file somewhere in your filesystem.
First I would like to examine the dataset to get a closer look at the data, which will give us a good understanding of how to use it well.
After unarchiving ml-data.tar.gz, you should find a list of data files in your folder:
u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of 
          user id | item id | rating | timestamp. 
              The time stamps are unix seconds since 1/1/1970 UTC   

u.info     -- The number of users, items, and ratings in the u data set.

u.item     -- Information about the items (movies); this is a tab separated
              list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data set.

u.genre    -- A list of the genres.

u.user     -- Demographic information about the users; this is a tab
              separated list of
              user id | age | gender | occupation | zip code
              The user ids are the ones used in the u.data data set.
For the recommendations we will use u.data. Now it's time to study a bit of theory about recommendation and what it is for.

  • Recommendation:
Mahout contains a recommender engine—several types of them, in fact, beginning with conventional user-based and item-based recommenders. It includes implementations of several other algorithms as well, but for now we'll explore a simple user-based recommender. For detailed information on the recommender engine please see here.
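As a side note, a user-based recommender can also be tried without Hadoop at all, straight from the Mahout Taste API. A minimal sketch, assuming the MovieLens u.data file is in the current directory; the neighborhood size and the number of recommendations are arbitrary choices of mine, and the results will differ from the Hadoop RecommenderJob used below, which is item-based with co-occurrence similarity:
package com.blu.mahout;

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class SimpleUserRecommender {
    public static void main(String[] args) throws Exception {
        // u.data is tab separated: user id | item id | rating | timestamp
        DataModel model = new FileDataModel(new File("u.data"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // look at the 10 nearest users when computing recommendations
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // top 5 recommendations for user 3, the same user we look at below
        List<RecommendedItem> recommendations = recommender.recommend(3, 5);
        for (RecommendedItem item : recommendations) {
            System.out.println("item " + item.getItemID() + " estimated preference " + item.getValue());
        }
    }
}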

  • Examine the DataSet:
Let's take a closer look at the file u.data:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
286 1014 5 879781125
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457
303 785 3 879485318
122 387 5 879270459
194 274 2 879539794
291 1042 4 874834944
234 1184 2 892079237
119 392 4 886176814
167 486 4 892738452
299 144 4 877881320
291 118 2 874833878
308 1 4 887736532
95 546 2 879196566
38 95 5 892430094
102 768 2 883748450
63 277 4 875747401
For example, the user with id 196 rated film 242 with a preference of 3. I have imported the u.data file into Excel and sorted it by user id as follows:

User 1 rated films [1, 2] and user 3 also rated film 2. If we want to find a recommendation for user 3 based on user 1, then it should be film 1 (Toy Story).
Let's run the Mahout recommendation engine and examine the result:
$hadoop jar $MAHOUT_HOME/mahout-examples-0.10.0-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input PATH_TO/ml-100k/u.data --output outputm
You should find the result in the file outputm/part-r-00000. If you look carefully, you will find the recommendations for user 3 as follows:
3 [137:5.0,248:4.8714285,14:4.8153844,285:4.8153844,845:4.754717,124:4.7089553,319:4.7035174,508:4.7006173,150:4.68,311:4.6615386]
which differs from what we guessed earlier, because the recommendation engine also uses the preferences (ratings) of the other users.
Let's write a bit of Java code to see exactly which films were recommended by Mahout:
package com.blu.mahout;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.reducing;
import static java.util.stream.Collectors.toList;

/**
 * Created by shamim on 17/04/15.
 * Source files should be u.data, u.item and the Mahout generated output file
 */
public class PrintRecomendation {
    private static final int INPUT_LENGTH = 6;

    private static List userRecommendedFilms = new ArrayList<>();
    private static List userWithFilmItems = new ArrayList<>();

    public static void main(String[] inputs) throws IOException{
        System.out.println("Print the recommendation and recommended films for given user:");

        if(inputs.length < INPUT_LENGTH){
            System.out.println("USAGES: PrintRecomendation USERID UDATA_FILE_NAME UITEM_FILE_NAME MAHOUT_REC_FILE UDATA_FOLDER UMDATA_FOLDER" +
                    "" + " Example: java -jar mahout-ml-1.0-SNAPSHOT.one-jar.jar 3 u.data u.item part-r-00000 /Users/shamim/Development/workshop/bigdata/hadoop/inputm/ml-100k/ /Users/shamim/Development/workshop/bigdata/hadoop/outputm/"
                    );
            System.exit(0);
        }
        String USER_ID = inputs[0];
        String UDATA_FILE_NAME = inputs[1];
        String UITEM_FILE_NAME = inputs[2];
        String MAHOUT_REC_FILE = inputs[3];
        String UDATA_FOLDER = inputs[4];
        String UMDATA_FOLDER = inputs[5];

        // Read UDATA File
        Path pathToUfile = Paths.get(UDATA_FOLDER,UDATA_FILE_NAME);
        List<String> filteredLines = Files.lines(pathToUfile).filter(s -> s.contains(USER_ID)).collect(toList());
        for(String line : filteredLines){
            String[] words =  line.split("\\t");
            if(words != null){
                String userId = words[0];
                if(userId.equalsIgnoreCase(USER_ID)){
                    userRecommendedFilms.add(line);
                }
            }
        }
        List userWithFilmName = new ArrayList();

        CharsetDecoder dec= StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE);
        Path pathToUItem=Paths.get(UDATA_FOLDER, UITEM_FILE_NAME);

        List<String> nonFiltered;

        try(Reader r= Channels.newReader(FileChannel.open(pathToUItem), dec, -1);
            BufferedReader br=new BufferedReader(r)) {
            nonFiltered=br.lines().collect(Collectors.toList());
        }

        Path pathToMahoutFile = Paths.get(UMDATA_FOLDER, MAHOUT_REC_FILE);

        List<String> filteredMLines = Files.lines(pathToMahoutFile).filter(s -> s.contains(USER_ID)).collect(toList());
        String recommendedFilms= "";
        for(String line : filteredMLines){
            String[] splited = line.split("\\t");
            if(splited[0].equalsIgnoreCase(USER_ID)){
                recommendedFilms = splited[1];
                break;
            }
        }

        // Mahout wraps the recommendation list in square brackets, strip them before splitting
        String[] filmsId = recommendedFilms.replace("[", "").replace("]", "").split(",");

        for(String filmId : filmsId){
            String[] idWithRating = filmId.split(":");
            String id = idWithRating[0];
            String rating = idWithRating[1];
            for(String filmLine : nonFiltered ){
                String[] items = filmLine.split("\\|");
                if(id.equalsIgnoreCase(items[0])){
                    System.out.println("Film name:" + items[1]);
                }

            }
        }

    }
}
You should see the following output in your console:
Film name:Grosse Pointe Blank (1997)
Film name:Postino, Il (1994)
Film name:Secrets & Lies (1996)
Film name:That Thing You Do! (1996)
Film name:Lone Star (1996)
Film name:Everyone Says I Love You (1996)
Film name:People vs. Larry Flynt, The (1996)
Film name:Swingers (1996)
Film name:Wings of the Dove, The (1997)

Sunday

Configuring stuck connections in the IBM WAS 8.5.5 connection pool

Recently we started getting a few complaints from our client related to database connections from IBM WAS. The first action we took was to look at the logs we got from the client, and we discovered the following errors in the application logs:
  • Error 404: Database connection problem: IO Error: Got minus one from a read call DSRA0010E: SQL State = 08006, Error Code = 17,002
  • java.sql.SQLException: The back-end resource is currently unavailable. Stuck connections have been detected.
With a quick search on Google I found PMR 34250 004 000 on the IBM support site, which also affects IBM WAS 8.* versions. Since we are using a third party web portal engine (Backbase), it was laborious to figure out the problem, so we decompiled some code to make sure that all the data source connections were being closed properly. After some research I asked the production support team for the database statistics and the data source configuration. And I was surprised by the database statistics: all the connections on the database were used up and the IBM application server could not get any new connection to complete requests.
On the Oracle database, the maximum number of connections was set to 6000 and we have more than 32 application servers, each with a maximum pool size of 200 connections. That was a serious mistake; the formula for configuring the connection pools of an IBM cluster is as follows:
Maximum number of connections per node * number of nodes < maximum connections set on the database
In our case the configuration would have to satisfy
200 * 32 < 6000, which does not hold: 32 nodes can open up to 6400 connections, more than the database allows.
We sent a request to increase the maximum number of database connections in Oracle to 10 000. But what to do with the stuck connections? I checked the IBM WAS advanced connection pool properties and noticed that the stuck connection properties were not configured at all.
Let's check what a stuck connection is.
A stuck connection is an active connection that is not responding or not returning to the connection pool. Stuck connections are controlled by three properties: Stuck time, Stuck threshold and Stuck timer interval.
Stuck time
  • The time a single active connection can be in use to the backend resource before it is considered stuck.
  • For example, if the stuck time is 120 seconds and a connection has been waiting on the database for more than 120 seconds, the connection will be marked as stuck.
Stuck threshold
  • The number of connections that need to be considered stuck for the pool to go into stuck mode.
  • For example, if the threshold is 10, then once 10 connections are considered stuck the whole pool for that data source is considered stuck.
Stuck Timer Interval
  • The interval at which the connection pool checks for stuck connections.
With the above information I configured the following stuck connection properties:
Stuck timer interval: 120 secs
Stuck time: 240 secs
Stuck threshold: 100 connections (maximum connections 200)
With this configuration, when will the connection pool be declared stuck? Every 120 seconds the pool checks its connections; once 100 of them have each been in use for more than 240 seconds, the pool goes into stuck mode.
What happens when the pool is declared stuck?
  • A resource exception is given to all new connection requests until the pool is unstuck.
  • An application can explicitly catch this exception and continue processing, as in the sketch after this list.
  • If the number of stuck connections drops below the stuck threshold, the pool will detect this during its periodic checks and begin servicing requests again.
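For illustration, here is a minimal sketch of the second point: catching the exception that surfaces while the pool is stuck and degrading gracefully instead of failing the whole request. The JNDI name jdbc/portalDS and the fallback message are hypothetical, and the exact exception wrapping depends on your data access layer.

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class StuckPoolAwareDao {

    public String loadData() {
        try {
            // jdbc/portalDS is a hypothetical JNDI name for the WAS data source
            DataSource ds = (DataSource) new InitialContext().lookup("jdbc/portalDS");
            try (Connection con = ds.getConnection()) {
                // ... run the real query here ...
                return "data from DB";
            }
        } catch (SQLException | NamingException e) {
            // While the pool is in stuck mode, getConnection() fails instead of hanging;
            // catch the exception and return a fallback so the request can still complete.
            return "service temporarily unavailable";
        }
    }
}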
It is also very useful to periodically check for inactive connections in the Oracle database; if a connection hangs and stays inactive, you can drop it manually.
Here is a pseudo query to find inactive connections in the DB:

SELECT s.username,
       s.status,
       s.sid || ',' || s.serial# AS p_sid_serial
  FROM v$session s, v$sort_usage t, dba_tablespaces tbs
 WHERE (s.last_call_et / 60) > 1440
   AND t.tablespace = tbs.tablespace_name
   AND t.tablespace = 'TEMP';
Hope the above information will help somebody make a quick fix in IBM WAS.

Monday

Open source alternatives for low-budget projects

In your software development career, sooner or later you will get a few projects with a very low budget, where you can't use commercial software because the budget is low and, in the long run, the company also wants some profit from completing the project. Since the beginning of the year I have done a few pre-sales for such projects and decided to write down a list of open source alternatives to commercial products. One thing I have to make clear: I have no religious views about open source software or vice versa. There are plenty of reasons to use most of the commercial products, but most of the time we have to cut our coat according to our cloth.
1) BPM:
Most vendors like Oracle and IBM have their finest products in BPM, such as Oracle BPM Suite and IBM Business Process Manager. You can also find a few very good open source products such as jBPM from JBoss and Bonitasoft. But there is another very good open source BPM engine you can try: Activiti. It is a Spring-friendly, lightweight engine (essentially a state machine) that you can use standalone or embedded in a web application. We developed our mobile number portability project on this BPM engine and it has been working for more than a year. Here you can find a few screenshots to check how it could look.
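As an illustration of how lightweight the engine is, here is a minimal sketch of running Activiti embedded in plain Java (Activiti 5.x API); the process key myProcess and the BPMN file name are hypothetical.

import org.activiti.engine.ProcessEngine;
import org.activiti.engine.ProcessEngineConfiguration;
import org.activiti.engine.runtime.ProcessInstance;

public class ActivitiQuickStart {
    public static void main(String[] args) {
        // Build an in-memory engine: no external server or database to install
        ProcessEngine engine = ProcessEngineConfiguration
                .createStandaloneInMemProcessEngineConfiguration()
                .buildProcessEngine();

        // Deploy a BPMN 2.0 definition from the classpath (file name is hypothetical)
        engine.getRepositoryService()
                .createDeployment()
                .addClasspathResource("myProcess.bpmn20.xml")
                .deploy();

        // Start an instance of the deployed process by its key
        ProcessInstance instance = engine.getRuntimeService()
                .startProcessInstanceByKey("myProcess");
        System.out.println("Started process instance: " + instance.getId());
    }
}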
2) ESB:
I am a real fan of Oracle OSB; I have completed more than two projects successfully on this platform. It has all the functionality you need for an enterprise service bus or for integration with other systems. If you are looking for an open source ESB, the first options should be GlassFish ESB and Mule. WSO2 is also a very good candidate to consider.
3) Business Rules:
The best business rules software I have ever used is IBM ILOG JRules. It is consistent, reliable and has good synchronization capabilities with the user's code base. There are not so many open source alternatives in this sector; Drools from JBoss is one of the good candidates. Certainly Drools still has a number of bugs in its functionality, but you should give it a try.
4) Messaging server: IBM MQ Series is one of the best messaging servers ever, and most of the banking sector and telecommunication companies use this product with great success. It is rock-solid software with all the messaging functionality you need, such as transmission queues. Most companies already have unlimited licenses for this software; however, if you need an open source version, you have a very wide choice. ActiveMQ, Apollo or RabbitMQ can be your choice. Here you can find the complete list of MQ servers. When I am looking for an open source MQ server, I have the following requirements for the product (see the sketch after this list):
- It should be fast
- It has to persist messages to disk
- It should work in a cluster with failover capabilities
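For example, here is a minimal JMS sketch against ActiveMQ that covers the second requirement by sending a message with persistent delivery, so the broker writes it to disk before acknowledging; the broker URL and queue name are hypothetical.

import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PersistentSender {
    public static void main(String[] args) throws Exception {
        // Broker URL and queue name are hypothetical
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("orders");
            MessageProducer producer = session.createProducer(queue);
            // PERSISTENT delivery: the broker stores the message on disk
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);
            producer.send(session.createTextMessage("hello, durable world"));
        } finally {
            connection.close();
        }
    }
}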
5) LDAP: Of course Microsoft Active Directory is one of the best candidates for LDAP; Microsoft AD has its own LDAP implementation. If you have to use NTLM or Kerberos, the first choice should be Microsoft AD. As an open source option, OpenLDAP is the best choice.
6) Application server:
Most of the time I work with Oracle WebLogic Server and IBM WAS. To be honest, they are the best application servers. In this sector you will find a lot of open source candidates. But whenever I have to choose an open source version, I always prefer the GlassFish application server, because it is reliable. Of course you have many more options like JBoss, Tomcat, Jetty etc.
7) Database: If you have unlimited licenses for Oracle, never think about any other vendor or product for the DB. Oracle DB has been reliable and consistent for a long time. Whenever you have to choose an open source RDBMS, you should try PostgreSQL and MySQL.
8) In-memory data grid: Oracle Coherence is the best commercial software for implementing an in-memory data grid. The company Hazelcast also has an open source in-memory data grid. With Hazelcast you can easily use distributed queues, maps, lists and much more (see the sketch below). From the early days of Hazelcast, I have used it for the Hibernate L2 cache.
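As a small illustration of how those distributed structures look in code, here is a minimal Hazelcast sketch (embedded member, default configuration); the map and queue names are hypothetical.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.Map;
import java.util.Queue;

public class HazelcastQuickStart {
    public static void main(String[] args) {
        // Starts an embedded cluster member with the default configuration;
        // other members started the same way will discover it and share the data
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Distributed map: entries are partitioned across the cluster members
        Map<String, String> films = hz.getMap("films");
        films.put("50", "Star Wars (1977)");

        // Distributed queue: producers and consumers can live on different nodes
        Queue<String> tasks = hz.getQueue("tasks");
        tasks.offer("reindex-catalog");

        System.out.println("Film 50: " + films.get("50"));
        Hazelcast.shutdownAll();
    }
}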
9) Distributed cache: If, for some reason, Infinispan is not for you, you should give Ehcache or JBoss Cache a try. With a JGroups configuration, Ehcache can be used as a distributed cache (a basic usage sketch follows). Ehcache lets you configure the heap in many ways and supports several eviction algorithms such as LRU.
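Here is a minimal Ehcache 2.x usage sketch; it assumes a cache named filmCache is declared in ehcache.xml on the classpath (the name is hypothetical), which is also where the JGroups replication would be configured.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class EhcacheQuickStart {
    public static void main(String[] args) {
        // Loads ehcache.xml from the classpath; replication (e.g. via JGroups)
        // is configured there, not in the Java code
        CacheManager cacheManager = CacheManager.newInstance();
        Cache cache = cacheManager.getCache("filmCache"); // hypothetical cache name

        cache.put(new Element("50", "Star Wars (1977)"));
        Element hit = cache.get("50");
        System.out.println("Cached value: " + (hit != null ? hit.getObjectValue() : "miss"));

        cacheManager.shutdown();
    }
}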
10) Web server and load balancer: Here we also have a few options against the commercial Alteon load balancer. For security reasons most banks and telecommunication companies use Alteon. If that is not a concern for you, you should try nginx and Varnish. Nginx can be used not only as a web server but also as a load balancer.

All the information above is my personal opinion, drawn from my own experience. It may differ from that of many of you. If there are more options, please don't hesitate to add them in a comment.

Tuesday

Continuous Integration (CI), A review

A few years ago (2011), at the JavaOne conference in Moscow, I gave a presentation about CI. Since then a lot has changed in this field. Over the years many tools, plugins and frameworks have been released to help DevOps engineers solve CI problems. Now CI is a vital part of the development life cycle. With the aggressive use of cloud infrastructure and the horizontal scaling of every application, most applications are now deployed on many servers (virtual and dedicated). Moreover, most systems are heterogeneous and always need extra care (scripts) to deploy the entire system successfully. Most of the time the development environment is very different from the production environment. Here is the common workflow from development to production:
DEV environment -> Test Environment -> UAT environment -> Production environment.
Every environment has its own characteristics and configuration. For example, most developers use Jetty or embedded Tomcat for fast development, but in the production environment you often meet IBM or WebLogic application servers. The deployment process on Jetty and on IBM WAS is very different, and production environments frequently use DR (disaster recovery). The workflow of the deployment process in the production environment is as follows:
1) Stop part of the application servers
2) Replicate session from the stopped servers
3) Update database with incremental scripts
4) Update new artifacts in application servers
5) Update configuration files
6) Start application servers

There are a lot of open source tools to achieve the above workflow, such as:
1) Puppet
2) Chef
3) Ansible, etc.

Ansible is one of the easiest and simplest tools for installing, deploying and preparing environments. We have the following DevOps tools in our portfolio:
1) Jenkins
2) Flyway DB
3) Ansible

A few words about Flyway DB: it is a database migration tool for doing incremental updates of database objects. It supports plain SQL scripts for virtually any DB, which for me makes it very convenient to debug or review any SQL script (a minimal sketch of running a migration from Java follows).
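For illustration, a minimal sketch of triggering a Flyway migration from Java (Flyway 4.x-style API); the JDBC URL and credentials are hypothetical, and the SQL scripts are expected under db/migration on the classpath by default.

import org.flywaydb.core.Flyway;

public class MigrateDatabase {
    public static void main(String[] args) {
        Flyway flyway = new Flyway();
        // Hypothetical connection settings; versioned scripts like V1__init.sql live in db/migration
        flyway.setDataSource("jdbc:oracle:thin:@localhost:1521:XE", "dbuser", "dbpassword");
        // Applies all pending versioned migrations in order and records them
        // in Flyway's metadata table
        flyway.migrate();
    }
}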
Ansible is a simple IT automation platform that deploys over SSH. It is very easy to install and configure, works with no agent installed on the remote system, has a very big community, and a lot of plugins have already been developed for use in automation. With these three tools we have the following approach:

Jenkins to build the project
Flyway for database migration
Ansible to deploy the application to several environments and to build the installation package for the production environment. At meetups and conferences I often get the question of how we manage and render different configuration files for different environments such as DEV and UAT. We use a very simple approach to solve the problem: templating. For every configuration we have some kind of template, as follows:
# MQ Configuration
mq.port=@mq.port@
mq.host=@mq.host@
mq.channel=@mq.channel@
mq.queue.manager=@mq.queue.manager@
mq.ccsid=@mq.ccsid@
mq.user=@mq.user@
mq.password=@mq.password@
mq.pool.size=@mq.pool.size@
and for every environment we define the values in an XML file. For example, for the DEV environment we have dev.xml, for the UAT environment we have uat.xml. Every XML file contains all the values, such as:
<property name="mq.gf.to.queue" value="MNP2GF"/>
<property name="mq.gf.from.queue" value="GF2MNP"/>
<property name="mq.port" value="1234"/>
<property name="mq.host" value="192.168.157.227"/>
<property name="mq.channel" value="SYSTEM.DEF.SVRCONN"/>
<property name="mq.queue.manager" value="venus.queue.manager"/>
<property name="mq.ccsid" value="866"/>
<property name="mq.user" value="mqm"/>
<property name="mq.password" value="mqm01"/>
<property name="mq.pool.size" value="10"/>
<property name="mq.pool.size" value="10"/>

Every time after a successful build, Jenkins runs a simple Python script which generates all the configuration files from the templates. This way we can deploy the application to different environments and build the distribution package.
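The original script is in Python; purely for illustration, here is a rough Java sketch of the same idea under simple assumptions: @key@ tokens in the template are replaced by the value attributes of the <property> elements in the per-environment XML file (the file paths are hypothetical).

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RenderConfig {
    public static void main(String[] args) throws Exception {
        Path template = Paths.get("templates/mq.properties.template"); // hypothetical paths
        Path values = Paths.get("env/dev.xml");
        Path output = Paths.get("build/mq.properties");

        String content = new String(Files.readAllBytes(template), StandardCharsets.UTF_8);

        // Read every <property name="..." value="..."/> element from the environment file
        NodeList properties = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(values.toFile())
                .getElementsByTagName("property");

        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            // Replace @name@ tokens in the template with the configured value
            content = content.replace("@" + property.getAttribute("name") + "@",
                    property.getAttribute("value"));
        }

        Files.createDirectories(output.getParent());
        Files.write(output, content.getBytes(StandardCharsets.UTF_8));
        System.out.println("Rendered " + output);
    }
}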

Tuning and optimization of a J2EE web application for high load

For the last few months we have been developing a portal for the 3rd largest bank in Europe. The number of unique visitors has grown to more than 1 million a day. The main non-functional requirements of the project are high availability of the portal and high throughput. One of the main features of the portal is letting users customize their pages with widgets and providing different services for a targeted audience. After a long discussion and analysis, the bank decided to use a Java-based engine to build the portal, and we ended up with the following stack:
1) Java 1.7_47
2) IBM WEBSphere 8.5 as Application server
3) Nginx as web server
4) Alteon as load balancer
5) Oracle 11gR2 as DataBase
6) SOLR for content search
The main challenge for us was to support legacy browsers such as IE8, Opera 12 etc. and to serve one portal for all devices (desktop, smartphone and tablet PC). The Java-based portal engine generated a lot of JavaScript, which did not give us very good performance. For these reasons we decided to use a hybrid method of page rendering (server side (JSP) + client side (JavaScript)) and REST services for the business functionality. We avoided implementing any business logic in the database, because an RDBMS is not suitable for scaling, and we minimized the network round trips. Here are our main design decisions:
1) Implement business logic through REST services in the application server
2) Serve all static content from the web server
3) Cache as much as possible in every layer
4) Hybrid method of page rendering (server side (JSP) + client side (JavaScript))
Now it is time to describe briefly what we have done in every layer; most of the steps are well known and I would like to summarize them in one place:

1) Web server optimization (nginx):
- Gzip compression level 6 for XML, JSON, CSS, HTML etc.
- Cache-Control HTTP header for 3 days
- Cache control for JavaScript
- ETag
2) Client side optimization:
- Minify JavaScript and CSS
- Minimize HTTP requests from browser to server. At the beginning we had more than 150 HTTP requests from browser to server. Remember that a modern browser can make only 7-8 requests at a time to one domain
- Optimize every image (lossless)
- Use CSS sprites
- Aggregate CSS and JS into a few files
- Minify HTML
3) Server side (backend) optimization:
- Cache every REST response (see the JAX-RS sketch after this list)
- Use distributed EhCache
- Hibernate + MyBatis second level cache
- Optimize the connection pool for the database in IBM WAS
- Optimize the heap size and GC policy for the JVM in IBM WAS
- Optimize the thread pool size in IBM WAS
- Optimize session management in IBM WAS
- A scheduler to drop long-running and hanging SQL connections from IBM WAS (there is a bug in the IBM WAS connection pool)
4) Database optimization:
- Use the result cache for the data dictionary
- Move the data dictionary to the Oracle KEEP pool
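To illustrate the "cache every REST response" point above, here is a minimal JAX-RS sketch that sets a Cache-Control header on a response so that nginx and the browser can cache it; the 3-day max-age mirrors the web server setting above, while the resource path and payload are hypothetical.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.CacheControl;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/widgets")
public class WidgetResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response listWidgets() {
        // Allow intermediaries (nginx) and browsers to cache the response for 3 days
        CacheControl cacheControl = new CacheControl();
        cacheControl.setMaxAge(3 * 24 * 60 * 60);

        String payload = "[{\"id\":1,\"name\":\"news\"}]"; // hypothetical payload
        return Response.ok(payload).cacheControl(cacheControl).build();
    }
}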

If you like this article, you would also like the book

Thursday

Book Review: Cassandra High Availability

Offtopic: My first book "High performance in-memory computing with Apache Ignite" has been released and is available at http://leanpub.com/ignite.

This post is my review of the Packt Publishing book Cassandra High Availability by Robbie Strickland, which delivers exactly what the title suggests. The book has almost 186 pages covering 9 chapters and is a must-read for Cassandra users.

Chapter 1: Cassandra's approach to High Availability
The initial chapter of Cassandra High Availability covers some architectural designs such as the monolithic architecture, the master-slave architecture and Cassandra's approach to achieving high availability. Most modern software systems have non-functional requirements such as high availability. In this chapter the author briefly describes why an RDBMS is a single point of failure and is not suitable for horizontal scaling, as well as the drawbacks of the master-slave architecture. In the next few sections the author describes Cassandra's architecture and its approach to solving the high availability problem, and also introduces Cassandra's replication mechanism, which makes Cassandra more unique than other NoSQL databases.
Chapter 2: Data Distribution
This chapter introduces Cassandra's data distribution strategy: consistent hashing, tokens, vnodes and the various partitioners. The author briefly describes Cassandra's token arrangements, both manually assigned tokens and vnodes, how vnodes can improve availability, and the pros and cons of each strategy.
The section on partitioning discusses all three options and the differences between them, and also points out when to use the Murmur3Partitioner and the benefits of the ByteOrderedPartitioner. Understanding these fundamentals will help readers scale a cluster effectively.
Chapter 3: Replication
The chapter provides the fundamental concepts of replication and consistency in a Cassandra cluster. The author demonstrates the relation between replication and the various consistency levels and how they impact Cassandra's performance and data consistency. From my point of view one part of this chapter is especially useful for making decisions when deploying a wide range of clusters.
Chapter 4: DataCenter
A data center, or DC, is one of the basic features for Cassandra high availability. The chapter provides all the possible use cases for using multiple data centers. One of the most important parts of this chapter is replication across data centers, where the author describes consistency across multiple data centers. The chapter also covers how to set up multiple data centers and analyze data with Hadoop and Spark.
Chapter 5: Scaling out
The chapter focuses on scaling a Cassandra cluster and provides information for hardware sizing. It contains all the valuable information on how to set up a cluster, add nodes and decommission nodes in the cluster. All of this administrative information is very useful for any Cassandra administrator.
Chapter 6: Java Client
This chapter is mainly dedicated to developers, and the author provides all the necessary information to develop highly available applications on top of Cassandra. Most of the code snippets are based on the DataStax native Java client and gracefully show how to connect to a cluster, use asynchronous requests and apply failover policies (a small sketch in that spirit follows). The chapter shows most of the rich functionality of the native client. For Cassandra beginners this chapter can be the quick start point.
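In the spirit of that chapter, here is a minimal sketch using the DataStax Java driver (2.x/3.x API): connecting to a cluster with a failover-friendly retry policy and issuing an asynchronous query; the contact point, keyspace and table are hypothetical.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;

public class CassandraQuickStart {
    public static void main(String[] args) {
        // Contact point and keyspace are hypothetical
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // Retry with a lower consistency level instead of failing outright
                .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .build();
        try {
            Session session = cluster.connect("movies");

            // Asynchronous request: the call returns immediately with a future
            ResultSetFuture future = session.executeAsync("SELECT title FROM films LIMIT 10");
            ResultSet rows = future.getUninterruptibly();
            for (Row row : rows) {
                System.out.println("Film name:" + row.getString("title"));
            }
        } finally {
            cluster.close();
        }
    }
}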
Chapter 7: Modeling for high availability
The chapter is based on the Cassandra data model; the author successfully describes Cassandra's low-level storage model. He provides information on every level of compaction and on how compaction works under the hood, and moreover explains when and how to use the different compaction strategies to achieve a desired goal. The end of the chapter demonstrates how to model time series data and how to work with geospatial data.
Chapter 8: Antipatterns
To be honest, it is the best chapter of this book; it shows why we should not use secondary indexes and why we shouldn't use Cassandra as a distributed queue. The chapter demonstrates how tombstones can affect Cassandra's read performance and how to get rid of them. This chapter gives readers a foundation that will allow them to make correct data modeling decisions.
Chapter 9: Failing gracefully
This chapter is mainly aimed at Cassandra administration and gives well-defined information for monitoring and managing a Cassandra cluster. It clearly describes all the possible ways to monitor Cassandra, such as through JMX, nodetool or third-party tools. The end of the chapter describes the backup and restore process of Cassandra, which will be very useful for Cassandra administrators.