Sunday

Apache Ignite with JPA: A missing element

In this article, the author of the book "High Performance in-memory computing with Apache Ignite" discusses the use of JPA (Hibernate OGM) with Apache Ignite. Part of this article is taken from the Persistence chapter of the book.

UP1: This article has also been published on DZone.
Often the first step in developing an enterprise information system is creating the domain model, that is, listing the entities in the domain and defining the relationships between them. A domain model is a conceptual image of the problem your system is trying to solve, and its elements are linked by relationships. Relational data is usually represented in a tabular format, while an application object model is represented as an interconnected graph of objects. When storing and retrieving an object model from a relational database, a few mismatches occur, such as granularity and subtypes. To resolve these mismatches between the relational and object models, JPA provides a collection of APIs and methods to manipulate the persistence store. The JPA specification only defines relational database access, but its API and many of its annotations are not relational-specific. There are a few factors we have to take into account before applying JPA to any NoSQL database:

1. Pure relational concepts, such as tables, columns and joins, may not apply well in NoSQL.
2. JPA queries may not be suitable for NoSQL. NoSQL data modeling is typically driven by application-specific access patterns.

Note that if your dataset is by nature not domain-model centric, then JPA is not for you.

Anyway, Apache Ignite provides an in-memory key-value store and is quite a good fit for JPA. Other NoSQL stores, such as Infinispan, Oracle NoSQL and Ehcache, are supported by JPA persistence solutions as well. There are a few NoSQL/JPA solutions available in today's market:
• Kundera
  o One of the very first JPA implementations for NoSQL databases.
  o Supports Cassandra, MongoDB, HBase, Redis, Oracle NoSQL DB, etc.
• DataNucleus
  o The persistence layer behind Google App Engine.
  o Supports MongoDB, Cassandra, Neo4j.
• Hibernate OGM
  o Uses the Hibernate ORM engine to persist entities in NoSQL databases.
  o Supports MongoDB, Cassandra, Neo4j, Infinispan, Ehcache.
  o Experimental support for Apache Ignite.
Hibernate OGM talks to NoSQL databases via store-specific dialects. Hibernate OGM, or Hibernate Object Grid Mapper, also supports several ways of searching entities and returning them as Hibernate managed objects (a query sketch follows the list):

1. JP-QL queries (converted into native backend queries)
2. Datastore-specific native queries
3. Full-text queries, using Hibernate Search as the indexing engine
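
For orientation, here is a minimal, hypothetical sketch of what a JP-QL query looks like through a Hibernate OGM EntityManager; it reuses the ogm-jpa-tutorial persistence unit and the Dog entity defined later in this article. Keep in mind that the experimental Ignite dialect discussed below does not support queries yet.

import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.Persistence;

public class JpqlSketch {
    public static void main(String[] args) {
        EntityManager em = Persistence
                .createEntityManagerFactory("ogm-jpa-tutorial")
                .createEntityManager();
        // Hibernate OGM converts this JP-QL into a native backend query.
        List<Dog> collies = em
                .createQuery("SELECT d FROM Dog d WHERE d.breed.name = :name", Dog.class)
                .setParameter("name", "breed-collie")
                .getResultList();
        collies.forEach(System.out::println);
        em.close();
    }
}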

So, for Apache Ignite, we are going to try JPA through the Hibernate OGM framework. Note that Apache Ignite support in Hibernate OGM is still at the development stage and is not recommended for production use. The project is available in GitHub repositories, and any contributions are welcome. You can also take part in the code review of this project with others at this URL. A high-level view of Hibernate OGM is shown below:

In the next few sections we will cover the following topics:

• Clone and build the Ignite module for the Hibernate OGM framework.
• Create a new Maven project for using Hibernate OGM with Ignite.
• Persist a few entities in Ignite through JPA.

Before we start, make sure the following prerequisites are available on your workstation:
1. Java JDK 1.8
2. Ignite version 1.7.0
3. Apache Maven version >3.0.3

Step 1:
Let’s set up our sandbox first. Clone or download the Hibernate OGM framework source code from the GitHub repositories.

git clone git@github.com:Z-z-z-z/hibernate-ogm.git hibernate-ogm

Step 2:

Modify the pom.xml and comment out the following modules:
<module>infinispan</module> 
<module>infinispan-remote</module> 
<module>mongodb</module>
<module>neo4j</module>
<module>couchdb</module>
<module>cassandra</module>
<module>redis</module>
We do not need the above modules in our project. Make sure that you have the ignite module in the pom.xml file.

Step 3:
Build the project with the following command:
mvn clean install -Dmaven.test.skip=true -DskipDocs -DskipDistro
If everything goes well, you should have all the necessary libraries in your local Maven repository.

Step 4:
Clone or download the ignite-jpa repository from GitHub. If you create your own Maven project, add these dependencies to your pom.xml.
<dependency>
    <groupId>org.hibernate.ogm</groupId>
    <artifactId>hibernate-ogm-ignite</artifactId>
    <version>5.1.0-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.hibernate.javax.persistence</groupId>
    <artifactId>hibernate-jpa-2.1-api</artifactId>
    <version>1.0.0.Final</version>
</dependency>
The dependencies are:
1. The Hibernate OGM Ignite module, for working with the Apache Ignite cache. This will pull in all other required modules, such as Hibernate OGM core.
2. The Hibernate JPA API, for working with JPA.

The domain model:
Our example domain model consists of two entities: Breed and Dog.

The association between Breed and Dog is many-to-one: one Dog has exactly one Breed, while one Breed can be shared by many Dogs.
Step 5:
Now let's map the domain model by creating the entity Java classes and annotating them with the required meta information. Let's start with the Breed class.

@Entity(name = "BREED")
public class Breed {
    @Id
    @GeneratedValue(generator = "uuid")
    @GenericGenerator(name="uuid", strategy="uuid2")
    private String id;

    private String name;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    @Override
    public String toString() {
        return "Breed{" +
                "id='" + id + '\'' +
                ", name='" + name + '\'' +
                '}';
    }
}

The class is marked with the JPA @Entity annotation, while the identifier property is annotated with @Id.
With the @Id annotation, Hibernate takes care of generating the primary key, or key value, for the entity object. The @GeneratedValue generator with the uuid2 strategy produces a UUID as the entity identifier.
Create another class named Dog and add the following content to it.

@Entity
public class Dog {
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "dog")
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    private Long id;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    private String name;

    @ManyToOne
    public Breed getBreed() { return breed; }
    public void setBreed(Breed breed) { this.breed = breed; }
    private Breed breed;

    @Override
    public String toString() {
        return "Dog{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", breed=" + breed +
                '}';
    }
}

We also annotated the Dog entity with the @Entity and @Id annotations, and added a @ManyToOne annotation to create the association with the Breed entity.

Step 6:
Let's create the cache configuration class and the persistence.xml. Create an Ignite cache configuration class named ConfigurationMaker as follows:
public class ConfigurationMaker implements IgniteConfigurationBuilder {
    @Override
    public IgniteConfiguration build() {
        IgniteConfiguration config = new IgniteConfiguration();
        config.setPeerClassLoadingEnabled(true);
        config.setClientMode(false);
        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
        ArrayList<String> addrs = new ArrayList<>();
        addrs.add("127.0.0.1:47500..47509");
        ipFinder.setAddresses(addrs);
        discoSpi.setIpFinder(ipFinder);
        config.setDiscoverySpi(discoSpi);

        CacheConfiguration accountCacheCfg = new CacheConfiguration()
                .setName("BREED")
                .setAtomicityMode(TRANSACTIONAL)
                .setIndexedTypes(
                        String.class, Breed.class
                );

        config.setCacheConfiguration(accountCacheCfg);
        return config;
    }
}

The above class builds the Ignite cache configuration programmatically, instead of using a Spring configuration. We explained the cache configuration in chapter one. Let's create the persistence.xml file in the /ignite-jpa/src/main/resources/META-INF directory.

<?xml version="1.0"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/persistence http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
             version="2.0">

    <persistence-unit name="ogm-jpa-tutorial" transaction-type="RESOURCE_LOCAL">
        <provider>org.hibernate.ogm.jpa.HibernateOgmPersistence</provider>
        <properties>

            <property name="com.arjuna.ats.jta.jtaTMImplementation" value="com.arjuna.ats.internal.jta.transaction.arjunacore.TransactionManagerImple"/>
            <property name="com.arjuna.ats.jta.jtaUTImplementation" value="com.arjuna.ats.internal.jta.transaction.arjunacore.UserTransactionImple"/>

            <property name="hibernate.ogm.datastore.provider" value="IGNITE_EXPERIMENTAL"/>
            <property name="hibernate.ogm.ignite.configuration_class_name" value="com.blu.imdg.exampleOgm.ConfigurationMaker"/>
        </properties>
    </persistence-unit>
</persistence>

If you are familiar with JPA, this persistence unit definition should look very familiar to you. The main difference from using classic Hibernate ORM on top of a relational database is the specific provider class we need to specify for Hibernate OGM: org.hibernate.ogm.jpa.HibernateOgmPersistence. Also note that we are using RESOURCE_LOCAL instead of JTA. If you want to use JTA, you have to provide a particular JTA implementation, such as JBoss. In addition, we have specified the following configurations:

• Datastore provider: IGNITE_EXPERIMENTAL
• Configuration class name: the Ignite configuration builder (ConfigurationMaker)

Step 7:
Let's now persist a set of entities and retrieve them. Create a class named TestOgm with the following content:
public class TestOgm {
    public static void main(String[] args) throws SystemException, NotSupportedException, HeuristicRollbackException, HeuristicMixedException, RollbackException {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("ogm-jpa-tutorial");

        EntityManager em =  emf.createEntityManager();
        em.getTransaction().begin();

        Breed collie = new Breed();
        collie.setName("breed-collie");
        em.persist(collie);

        Dog dina = new Dog();
        dina.setName("dina");
        dina.setBreed(collie);
        // persist dina
        em.persist(dina);
        em.getTransaction().commit();
        // get dina's ID
        Long dinaId = dina.getId();
        // query
        Dog ourDina =  em.find(Dog.class, dinaId);
        System.out.println("Dina:" + ourDina);

        em.close();
        emf.close();
    }

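    // Note: helper left over from the JTA variant of this example; it extracts
    // the JBoss transaction manager from the factory and is unused in this
    // RESOURCE_LOCAL setup.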
    private static TransactionManager extractJBossTransactionManager(EntityManagerFactory factory) {
        SessionFactoryImplementor sessionFactory = (SessionFactoryImplementor) ( (HibernateEntityManagerFactory) factory ).getSessionFactory();
        return sessionFactory.getServiceRegistry().getService( JtaPlatform.class ).retrieveTransactionManager();
    }
}
First, we created an EntityManagerFactory with the parameter "ogm-jpa-tutorial". Next, we derived our EntityManager from the factory; this EntityManager is our entry point for persisting entities. We opened a transaction from the EntityManager and created our Breed with the name breed-collie, persisting it with the EntityManager persist() method. We also created an instance of Dog, dina, and associated it with breed-collie. Next, we persisted the dog dina in the cache with the same persist() method and retrieved the instance with the find() method.
Step 8:
Let's build and run the application. Before running the TestOgm class, we have to start an Ignite node. Run the following command to start an instance of an Ignite node.

mvn exec:java -Dexec.mainClass=com.blu.imdg.StartCacheNode
Now run the following command to execute the TestOgm class:
mvn exec:java -Dexec.mainClass=com.blu.imdg.exampleOgm.TestOgm
You should notice a lot of logs in the console, with the following entries:
Two entries have been flushed into the Ignite cache, and the dog Dina has been retrieved from the cache. Let's explore the cache through Ignite Visor.
Two different caches have been created for the entities: Breed and Dog. If we scan the cache entries of the Dog cache, we should find the following entity in it.

The entity Dina has been persisted into the cache along with the key of the Breed collie. Unfortunately, Hibernate HQL queries and Hibernate Search do not work in this experimental version of the Hibernate OGM Ignite module. These Hibernate features are under development and should be supported soon.

You can learn even more from the book.

Friday

Black Friday discount on High performance in-memory computing with Apache Ignite

Black Friday discount on High-performance in-memory computing with Apache Ignite. 40% discount https://leanpub.com/ignite/c/MKU2EsYCj6nN


Wednesday

Tip: SQL client for Apache Ignite cache

In this article, the author of the book "High Performance in-memory computing with Apache Ignite" shares a tip on using a SQL client to query the Apache Ignite cache. Part of this article is taken from the Installation and the first Ignite application chapter of the book.

Apache Ignite supports SQL query execution on caches, and the SQL syntax is ANSI-99 compliant. Therefore, you can execute SQL queries against any cache from any SQL client that supports the JDBC thin client. This section is for those who feel more comfortable with SQL than with executing a bunch of code to retrieve data from the cache. Apache Ignite ships out of the box with a JDBC driver that allows you to connect to Ignite caches and retrieve distributed data from the cache using standard SQL queries. The rest of this section describes how to connect a SQL IDE (Integrated Development Environment) to an Ignite cache and execute some SQL queries to play with the data. A SQL IDE or SQL editor can simplify the development process and allow you to get productive much quicker.

Most database vendors have their own specially developed front-end IDE for their database: Oracle has SQL Developer, Sybase has Interactive SQL, and so on. Unfortunately, Apache Ignite doesn't provide any SQL editor to work with Ignite caches; however, GridGain (the commercial version of Apache Ignite) provides a commercial GridGain Web Console application to connect to an Ignite cluster and run SQL analytics on it. Since I work with multiple database platforms in my daily work, I have been using DBeaver for the last couple of years. A couple of words about DBeaver: it's an open-source, multi-platform database tool for developers, analysts and database administrators. It supports a huge range of databases and also lets you connect to any database with a JDBC thin client (if the database supports JDBC). Anyway, you can also try the SQuirreL SQL client or JetBrains DataGrip to connect to the Ignite cluster; they all support JDBC.

Note that cache updates are not supported through SQL queries for now; you can only use SELECT queries.

How SQL/text queries work in Ignite: it's interesting to know how a query is processed under the hood of Ignite. There are three main approaches to processing a SQL/text query in Ignite (a Java sketch follows these descriptions):

- In-memory map-reduce: If you execute a SQL query against a partitioned cache, Ignite under the hood splits the query into in-memory map queries and a single reduce query. The number of map queries depends on the size and the number of partitions in the cluster. All the map queries are executed on the data nodes of the participating caches, providing results to the reducing node, which in turn runs the reduce query over these intermediate results. If you are not familiar with the map-reduce pattern, you can imagine it as a Java fork-join process.

- H2 SQL engine: If you execute SQL queries against a replicated or local cache, Ignite knows that all the data is available locally and runs a simple local SQL query in the H2 database engine. Note that in a replicated cache, every node contains replica data for the other nodes. H2 is a free database written in Java that can run in embedded mode. Depending on the configuration, every Ignite node can have an embedded H2 SQL engine.

- Lucene engine: In Apache Ignite, each node contains a local Lucene engine that stores an in-memory index referencing local cache data. When a distributed full-text query is executed, each node performs the search in its local index via an IndexSearcher and sends the result back to the client node, where the results are aggregated.

Note that the Ignite cache doesn't contain the Lucene index itself; instead, Ignite provides GridLuceneDirectory, a memory-resident implementation for storing the Lucene index in memory. GridLuceneDirectory is very similar to Lucene's RAMDirectory.
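
To make the approaches above concrete, here is a minimal sketch (not taken from the book's sample code) of how SQL and full-text queries are submitted from the Java API; it assumes a cache named testCache whose values are Person objects with indexed name and age fields, and an Ignite configuration file default-config.xml on the classpath.

import java.util.List;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.TextQuery;

public class QuerySketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("default-config.xml")) {
            IgniteCache<Long, Person> cache = ignite.cache("testCache");
            // SQL query: map-reduced on a partitioned cache, or run in the
            // local H2 engine on a replicated/local cache.
            List<List<?>> rows = cache
                    .query(new SqlFieldsQuery("SELECT name FROM Person WHERE age > ?").setArgs(30))
                    .getAll();
            // Full-text query: answered by the per-node in-memory Lucene index.
            List<Cache.Entry<Long, Person>> hits = cache
                    .query(new TextQuery<Long, Person>(Person.class, "Shamim"))
                    .getAll();
            System.out.println(rows.size() + " SQL rows, " + hits.size() + " text hits");
        }
    }
}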

To run SQL queries on caches, we already added a complete Java application (HelloIgniteSpring) in the installation chapter. You can run the application with the following command.

java -jar .\target\HelloIgniteSpring-runnable.jar

At this moment, we are not going to detail all the concepts of Ignite cache queries here; we will have a detailed look at Ignite SQL queries in chapter four. For now, note that the HelloIgniteSpring application always puts a few Person objects into the cache named testCache (a sketch of the Person class follows the table below). The Person object has attributes such as name and age:


#   Name     Age
1   Shamim   37
2   Mishel   2
3   Scott    55
4   Tiger    5
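
For reference, here is a hypothetical sketch of what the Person value class might look like (the actual class ships with the chapter-installation sources); the query annotations are what make the fields visible to Ignite's SQL and full-text engines.

import java.io.Serializable;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.cache.query.annotations.QueryTextField;

public class Person implements Serializable {
    // visible to the SQL engine and text-indexed for full-text queries
    @QuerySqlField(index = true)
    @QueryTextField
    private String name;

    @QuerySqlField(index = true)
    private int age;

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}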

After completing the configuration of the DBeaver SQL client, we will run a few SQL queries against the above objects. Now it's time to download DBeaver and complete the JDBC configuration in it.

Step 1:
Download the DBeaver Enterprise Edition (it's free but not an open-source product) for your operating system from the following URL:

http://dbeaver.jkiss.org/download/enterprise/

Step 2:
Install DBeaver; please refer to the install section of the DBeaver site if you encounter any problems during the installation.

Step 3:
Compile the Maven chapter-installation project, if you didn't do it before.

Step 4:
Run the HelloIgniteSpring application with the following command:

java -jar ./target/HelloIgniteSpring-runnable.jar

You should have the following output in your console:


If you are curious about the code, please refer to the chapter-installation.

Step 5:
Now, let's configure the JDBC driver for DBeaver. Go to Database -> Driver Manager -> New. In the Settings section, fill in the requested information as follows:


Add all the libraries shown in the above screenshot. Copy the file ~/ignite-book-code-samples/chapters/chapter-installation/src/main/resources/default-config.xml to default-config-dbeaver.xml somewhere in your file system. Change the clientMode property value to true in the default-config-dbeaver.xml file. Add the file path to the URL template as shown in the above screenshot and click OK.

Step 6:
Create a new connection based on the Ignite driver. Go to Database -> New Connection, select the Ignite driver from the drop-down list and click Next. You should see the following screen before you.


Click the Test connection button for a quick test. If everything is done properly, you should see the next screenshot with the success notification.

Click OK and go through all the next steps to complete the connection.

Step 7:
Create a new SQL editor and type the following SQL query in DBeaver.

SELECT name FROM Person;

Step 8:
Run the script by pressing Command+X, and you should get the following result.

The above query returns all the cache objects from the cache testCache. You can also execute the following query:

SELECT name FROM Person p WHERE p.age BETWEEN 30 AND 60;

It should return the result with the following persons:

Shamim 
Scott

The Ignite SQL engine is fully ANSI-99 compliant and lets you run any SQL query, whether analytical or ad hoc. You can also configure Oracle SQL Developer or IntelliJ IDEA as a SQL client to work with Apache Ignite.

If you like this article, you would also like the book

Sunday

Complex event processing (CEP) with Apache Storm and Apache Ignite

In this article, the author of the book "High Performance in-memory computing with Apache Ignite" discusses complex event processing with Apache Storm and Apache Ignite. Part of this article is taken from the complex event processing chapter of the book.

There is no broadly or highly accepted definition of the term Complex Event Processing, or CEP. What complex event processing is may be briefly described by the following quote from Wikipedia:
"Complex Event Processing, or CEP, is primarily an event processing concept that deals with the task of processing multiple events with the goal of identifying the meaningful events within the event cloud. CEP employs techniques such as detection of complex patterns of many events, event correlation and abstraction, event hierarchies, and relationships between events such as causality, membership, and timing, and event-driven processes."

For simplicity, Complex Event Processing (CEP) is a technology for low-latency filtering, aggregating and computing on real-world, never-ending or streaming event data. The quantity and speed of both raw infrastructure and business events are growing exponentially in IT environments. In addition, the explosion of mobile devices and the ubiquity of high-speed connectivity add to the flood of mobile data. At the same time, demand for business process agility and execution has only grown. These two trends have put pressure on organizations to increase their capability to support event-driven architecture patterns of implementation. Real-time event processing requires both the infrastructure and the application development environment to execute on event processing requirements. These requirements often include the need to scale from everyday use cases to extremely high velocities or varieties of data and event throughput, potentially with latencies measured in microseconds rather than seconds of response time.

Apache Ignite allows processing continuous, never-ending streams of data in a scalable and fault-tolerant fashion in memory, rather than analyzing data after it has reached the database. Not only does this enable you to correlate relationships and detect meaningful patterns from significantly more data, you can do it faster and much more efficiently. Event history can live in memory for any length of time (critical for long-running event sequences) or be recorded as transactions in a database.

Apache Ignite CEP can be used in a wealth of industry areas; the following are some first-class use cases:

  1. Financial services: the ability to perform real-time risk analysis, monitoring and reporting of financial trading, and fraud detection.
  2. Telecommunication: the ability to perform real-time call detail record and SMS monitoring, and DDoS attack detection.
  3. IT systems and infrastructure: the ability to detect failed or unavailable applications or servers in real time.
  4. Logistics: the ability to track shipments and order processing in real time, and to report on potential delays on arrival.

There are a few more industrial or functional areas where you can use Apache Ignite to process stream event data, such as insurance, transportation and the public sector. Complex event processing, or CEP, consists of three main parts:

  1. Event capture or data ingestion.
  2. Computing or calculating on the data.
  3. Response or action.


As shown in the above figure, data is ingested from different sources. Sources can be sensors (IoT), web applications or industry applications. Stream data can be processed concurrently, directly on the Ignite cluster. In addition, the data can be enriched from other sources or filtered out. After computing the data, the computed or aggregated data can be exported to other systems for visualization or for taking an action.

The Apache Ignite Storm streamer module provides streaming via Storm to the Ignite cache. Before we start using the Ignite streamer, let's take a look at Apache Storm to cover a few basics.

Apache Storm is a distributed, fault-tolerant, real-time computing system. In a short time, Apache Storm became a standard for distributed real-time processing that allows you to process large amounts of data. The Apache Storm project is open source and written in Java and Clojure, and it became a first choice for real-time analytics. The Apache Ignite Storm streamer module provides a convenient way to stream data via Storm to the Ignite cache.

Key concepts:

Apache Storm reads a raw stream of data from one end, passes it through a sequence of small processing units and outputs the processed information at the other end. Let's have a detailed look at the main components of Apache Storm.

Tuples – the main data structure of Storm: an ordered list of elements. Generally, a tuple supports all primitive data types.


Streams – an unbounded and unordered sequence of tuples.

Spouts – sources of streams; in simple terms, a spout reads the data from a source for use in the topology. A spout can be reliable or unreliable. A spout can talk to queues, web logs, event data, etc.

Bolts – logical processing units responsible for processing data and creating new streams. Bolts can perform operations such as filtering, aggregation, joining and interacting with files or a database. Bolts receive data from a spout and emit it to one or more other bolts.

Topology – a directed graph of spouts and bolts; each node of this graph contains the data processing logic (bolts), while the connecting edges define the flow of the data (streams).
Unlike Hadoop, Storm keeps the topology running forever until you kill it. A simple topology starts with spouts, which emit streams from the sources to bolts for processing the data. Apache Storm's main job is to run topologies, and it can run any number of topologies at a given time.


Ignite out of the box provides an implementation of a Storm bolt (StormStreamer) to stream the computed data into the Ignite cache. On the other hand, you can write your own custom Storm bolt to ingest stream data into Ignite. To develop a custom Storm bolt, you just have to implement the *BaseBasicBolt* or *IRichBolt* Storm interface. However, if you decide to use StormStreamer, you have to configure a few properties for the Ignite bolt to work correctly. All mandatory properties are shown below:


No  Property Name       Description
1   CacheName           Name of the Ignite cache in which the data will be stored.
2   IgniteTupleField    Name of the Ignite tuple field by which tuple data is obtained in the topology. By default the value is "ignite".
3   IgniteConfigFile    Sets the Ignite Spring configuration file. Allows you to send and consume messages to and from Ignite topics.
4   AllowOverwrite      Enables overwriting existing values in the cache; the default value is false.
5   AutoFlushFrequency  Automatic flush frequency in milliseconds, i.e. the time after which the streamer will attempt to submit all data added so far to remote nodes. The default is 10 seconds.

Now that we have got the basics, let's build something useful to check how the Ignite StormStreamer works. The basic idea behind the application is to design one topology of a spout and bolts that can process a huge amount of data from traffic log files and trigger an alert when a specific value crosses a predefined threshold. Using a topology, the log file is read line by line, and the topology is designed to monitor the incoming data. In our case, the log file will contain data such as the vehicle registration number, speed and highway name from a highway traffic camera. If a vehicle crosses the speed limit (for example, 120 km/h), the Storm topology will send the data to the Ignite cache.

The next listing shows a CSV file of the type we are going to use in our example, containing vehicle data such as the vehicle registration number, the speed at which the vehicle is traveling and the location of the highway.
AB 123, 160, North city
BC 123, 170, South city
CD 234, 40, South city
DE 123, 40, East city
EF 123, 190, South city
GH 123, 150, West city
XY 123, 110, North city
GF 123, 100, South city
PO 234, 140, South city
XX 123, 110, East city
YY 123, 120, South city
ZQ 123, 100, West city

The idea for the above example is taken from Dr. Dobb's Journal. Since this book is not for studying Apache Storm, I am going to keep the example as simple as possible. I have also added the famous word count example of Storm, which ingests the word count values into the Ignite cache through the StormStreamer module. If you are curious about the code, it's available at chapter-cep/storm. The above CSV file will be the source for the Storm topology.


As shown in the above figure, the FileSourceSpout accepts the input CSV log file, reads the data line by line and emits the data to the SpeedLimitBolt for threshold processing. Once the processing is done and a car exceeding the speed limit is found, the data is emitted to the Ignite StormStreamer bolt, where it is ingested into the cache. Let's dive into the detailed explanation of our Storm topology.

Step 1:

Because this is a Storm topology, you must add the Storm and Ignite StormStreamer dependencies to the Maven project.

<dependency>
  <groupId>org.apache.ignite</groupId>
  <artifactId>ignite-storm</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.ignite</groupId>
  <artifactId>ignite-core</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.ignite</groupId>
  <artifactId>ignite-spring</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-core</artifactId>
  <version>0.10.0</version>
  <exclusions>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
    <exclusion>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>log4j-over-slf4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
    </exclusion>
  </exclusions>
</dependency>

At the time of writing this book, only Apache Storm version 0.10.0 is supported. Note that, contrary to what is described in the Ignite documentation, you do not need any Kafka module to run or execute this example.

Step 2:

Create an Ignite configuration file (see the example-ignite.xml file in /chapter-cep/storm/src/resources/example-ignite.xml) and make sure that it is available from the classpath. The content of the Ignite configuration is identical to the previous section of this chapter.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/util
        http://www.springframework.org/schema/util/spring-util.xsd">
  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Enable client mode. -->
    <property name="clientMode" value="true"/>
    <!-- Cache accessed from IgniteSink. -->
    <property name="cacheConfiguration">
      <list>
        <!-- Partitioned cache example configuration with configurations adjusted to server nodes'. -->
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="atomicityMode" value="ATOMIC"/>
          <property name="name" value="testCache"/>
        </bean>
      </list>
    </property>
    <!-- Enable cache events. -->
    <property name="includeEventTypes">
      <list>
        <!-- Cache events (only EVT_CACHE_OBJECT_PUT for tests). -->
        <util:constant static-field="org.apache.ignite.events.EventType.EVT_CACHE_OBJECT_PUT"/>
      </list>
    </property>
    <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>127.0.0.1:47500</value>
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
  </bean>
</beans>

Step 3:

Create an ignite-storm.properties file to add the cache name, tuple field name and the name of the Ignite configuration file, as shown below.

cache.name=testCache
tuple.name=ignite
ignite.spring.xml=example-ignite.xml
Step 4:

Next, create the FileSourceSpout Java class, as shown below.

public class FileSourceSpout extends BaseRichSpout {
    private static final Logger LOGGER = LogManager.getLogger(FileSourceSpout.class);
    private SpoutOutputCollector outputCollector;

    @Override
    public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
        this.outputCollector = spoutOutputCollector;
    }

    @Override
    public void nextTuple() {
        try {
            Path filePath = Paths.get(this.getClass().getClassLoader().getResource("source.csv").toURI());
            try (Stream<String> lines = Files.lines(filePath)) {
                lines.forEach(line -> outputCollector.emit(new Values(line)));
            } catch (IOException e) {
                LOGGER.error(e.getMessage());
            }
        } catch (URISyntaxException e) {
            LOGGER.error(e.getMessage());
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields("trafficLog"));
    }
}
The FileSourceSpout code has three important methods:


  • open(): this method is called at the start of the spout and gives you context information.
  • nextTuple(): this method lets you pass one tuple to the Storm topology for processing at a time; here I read the CSV file line by line and emit each line as a tuple to the bolt.
  • declareOutputFields(): this method declares the name of the output tuple field; in our case, the name is trafficLog.
Step 5:
Now create the SpeedLimitBolt.java class, which extends the BaseBasicBolt abstract class.

public class SpeedLimitBolt extends BaseBasicBolt {
    private static final String IGNITE_FIELD = "ignite";
    private static final int SPEED_THRESHOLD = 120;
    private static final Logger LOGGER = LogManager.getLogger(SpeedLimitBolt.class);

    @Override
    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        String line = (String) tuple.getValue(0);
        if (!line.isEmpty()) {
            String[] elements = line.split(",");
            // we are interested in the speed and the car registration number
            int speed = Integer.valueOf(elements[1].trim());
            String car = elements[0];
            if (speed > SPEED_THRESHOLD) {
                TreeMap<String, Integer> carValue = new TreeMap<>();
                carValue.put(car, speed);
                basicOutputCollector.emit(new Values(carValue));
                LOGGER.info("Speed violation found: " + car + " speed: " + speed);
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declare(new Fields(IGNITE_FIELD));
    }
}
Let's go through it line by line.


  • execute(): this is the method where you implement the business logic of your bolt; in this case, I split the line by commas and check the speed of the car. If the speed of the given car is higher than the threshold, we create a new TreeMap from this tuple and emit it to the next bolt; in our case, the next bolt will be the StormStreamer.
  • declareOutputFields(): this method is similar to the declareOutputFields() method in FileSourceSpout; it declares that it is going to return an Ignite tuple for further processing.
Note that the tuple field name IGNITE_FIELD is important here; the StormStreamer will only process tuples with the field name ignite.

Step 6:

It's time to create our topology to run the example. A topology ties the spouts and bolts together in a graph, which defines how the data flows between the components. It also provides parallelism hints that Storm uses when creating instances of the components within the cluster. To implement the topology, create a new file named SpeedViolationTopology.java in the src/main/java/com/blu/imdg/storm/topology directory. Use the following as the contents of the file:

public class SpeedViolationTopology {
    private static final int STORM_EXECUTORS = 2;

    public static void main(String[] args) throws Exception {
        if (getProperties() == null || getProperties().isEmpty()) {
            System.out.println("Property file <ignite-storm.property> is not found or empty");
            return;
        }
        // Ignite Stream IBolt
        final StormStreamer<String, String> stormStreamer = new StormStreamer<>();

        stormStreamer.setAutoFlushFrequency(10L);
        stormStreamer.setAllowOverwrite(true);
        stormStreamer.setCacheName(getProperties().getProperty("cache.name"));
        stormStreamer.setIgniteTupleField(getProperties().getProperty("tuple.name"));
        stormStreamer.setIgniteConfigFile(getProperties().getProperty("ignite.spring.xml"));

        TopologyBuilder builder = new TopologyBuilder();

        builder.setSpout("spout", new FileSourceSpout(), 1);
        builder.setBolt("limit", new SpeedLimitBolt(), 1).fieldsGrouping("spout", new Fields("trafficLog"));
        // set the Ignite bolt
        builder.setBolt("ignite-bolt", stormStreamer, STORM_EXECUTORS).shuffleGrouping("limit");

        Config conf = new Config();
        conf.setDebug(false);
        conf.setMaxTaskParallelism(1);

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("speed-violation", conf, builder.createTopology());
        Thread.sleep(10000);
        cluster.shutdown();
    }

    private static Properties getProperties() {
        Properties properties = new Properties();
        InputStream ins = SpeedViolationTopology.class.getClassLoader().getResourceAsStream("ignite-storm.properties");
        try {
            properties.load(ins);
        } catch (IOException e) {
            e.printStackTrace();
            properties = null;
        }
        return properties;
    }
}

Let's go through it line by line. First, we read the ignite-storm.properties file to get all the necessary parameters to configure the StormStreamer bolt. The Storm topology is basically a Thrift structure. The TopologyBuilder class provides a simple and elegant way to build complex Storm topologies; it has methods to set spouts and bolts. We used the TopologyBuilder to build the Storm topology and added the spout with the name spout and a parallelism hint of 1 executor. We also added the SpeedLimitBolt to the topology with a parallelism hint of 1 executor. Next, we set the StormStreamer bolt with shuffleGrouping, which subscribes to the limit bolt and equally distributes its tuples across the instances of the StormStreamer bolt.

For development purposes, we create a local cluster using a LocalCluster instance and submit the topology using the submitTopology method. Once the topology is submitted to the cluster, we wait 10 seconds for the cluster to compute the submitted topology and then shut down the cluster using the shutdown method of LocalCluster.

Step 7:

Next, run a local node or cluster of Apache Ignite first. After building the Maven project, use the following command to run the topology locally.

mvn compile exec:java -Dstorm.topology=com.blu.imdg.storm.topology.SpeedViolationTopology

The application will produce a lot of system logs as follows.

Now, if we verify the Ignite cache through Ignite Visor, we should get the following output in the console.
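
A typical Visor session for this check might look like the following (the commands ship with the standard Ignite distribution; output omitted here):

bin/ignitevisorcmd.sh
visor> open
visor> cache
visor> cache -scan -c=testCache

The cache command lists all caches with their entry counts, and cache -scan prints the entries of the given cache.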



The output shows the result we expected. From our source.csv log file, only five vehicles exceed the speed limit of 120 km/h.

This pretty much sums up the practical overview of the Ignite Storm Streamer.

If you like this article, you would also like the book


Thursday

Book: High performance in-memory computing with Apache Ignite

Being an author is fun; my first book is going to be published at the end of this year. For now, a sample chapter is available for download.

Monday

High performance in-memory MapReduce with Apache Ignite

Apache Ignite provides a set of useful components (the Hadoop accelerator) allowing in-memory Hadoop job execution and file system operations. These plug-and-play in-memory accelerators can be grouped into the following three categories:
1) MapReduce: an alternative implementation of the Hadoop job tracker and task tracker, which can accelerate job execution performance. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
2) IGFS: an alternative implementation of a Hadoop file system named IgniteHadoopFileSystem, which can store data in memory. This in-memory file system minimizes disk I/O and improves performance (see the configuration sketch after this list).
3) Secondary file system: an implementation that works as a caching layer above HDFS; every read and write operation goes through this layer and can improve performance.
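
IGFS itself is not configured in this post (it is covered in a follow-up), but for orientation, a hypothetical core-site.xml fragment that registers IGFS as a Hadoop file system might look like this:

<!-- Hypothetical core-site.xml fragment: registers IGFS so that
     igfs:// paths resolve to the Ignite in-memory file system. -->
<property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>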
In this blog post, I'm going to configure Apache Ignite with Hadoop to execute an in-memory MapReduce job.

A high-level architecture of the Ignite Hadoop accelerator is as follows:


Sandbox configuration:
VM: VMware
OS: Red Hat Enterprise Linux 6
CPU: 2
RAM: 2 GB
JVM: 1.7_60
Hadoop Version: 2.7.2
Apache Ignite version: 1.6

We are going to install a Hadoop pseudo-distributed cluster on one machine, meaning the datanode, namenode and task/job tracker will all be in one virtual machine.
First of all, we will install and configure Hadoop, and then proceed to Apache Ignite. Assume that Java 1.7.0_4* has been installed and JAVA_HOME is set in the environment variables.
Step 1:
Unpack the hadoop distribution and set the JAVA_HOME path in the etc/hadoop/hadoop-env.sh file as follows:
export JAVA_HOME=<your JAVA_HOME path>
Step 2:
Add the following configuration to etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
And to etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Step 3:
Set up passwordless (passphraseless) SSH as follows:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
Then try ssh localhost; it shouldn't ask you for a password.
Step 4:
Format the HDFS file system:
$ bin/hdfs namenode -format
Start the namenode/datanode daemons:
$ sbin/start-dfs.sh

Also, it's better to add HADOOP_HOME to the operating system's environment variables.
Step 5:
Make a few directories in HDFS to run the MapReduce job:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /input
Put some files into the /input directory:
$ bin/hdfs dfs -put $HADOOP_HOME/etc/hadoop /input
Step 6:
Run the MapReduce word count example:
$ bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/hadoop output
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
It should be a huge file; let's see a fragment of the output:
want 1
warnings. 1
when 9
where 4
which 7
while 1
who 6
will 23
window 1
window, 1
with 62
within 4
without 1
work 12
writing, 27
Now let's install and configure Apache Ignite.
Step 7:
Unpack the Apache Ignite distribution somewhere in your system and set the IGNITE_HOME environment variable to the root directory of the installation.
Step 8:
Add the following libraries to $IGNITE_HOME/libs:
  1. asm-all-4.2.jar
  2. ignite-hadoop-1.6.0.jar
  3. hadoop-mapreduce-client-core-2.7.2.jar
Step 9:
We are going to use the Ignite default configuration, config/default-config.xml; now we can start Ignite as follows:
$ bin/ignite.sh
Step 10:
Now we will make a few changes in order to use the Ignite job tracker instead of Hadoop's. Add the HADOOP_CLASSPATH environment variable as follows:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$IGNITE_HOME/libs/ignite-core-1.6.0.jar:$IGNITE_HOME/libs/ignite-hadoop-1.6.0.jar:$IGNITE_HOME/libs/ignite-shmem-1.0.0.jar
Step 11:
Stop the Hadoop namenode and datanodes:
$ sbin/stop-dfs.sh
Step 12:
We have to override the Hadoop mapred-site.xml (there are several ways to add your own mapred-site.xml for Ignite; see the documentation here: http://apacheignite.gridgain.org/v1.0/docs/map-reduce). For a quick start, we will add the following XML fragment to mapred-site.xml:
<property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>127.0.0.1:11211</value>
  </property>
Step 13:
Run the previous word count MapReduce example:
$bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /input/hadoop output2
Here is the output:

The execution time is now faster than last time, when we used the Hadoop task tracker. Let's examine the Ignite task execution statistics through Ignite Visor:
Here we can see the total execution time and the duration of the in-memory task tracker; in our case it's HadoopProtocolJobStatusTask(@t1): total 24, with an execution rate of 12 seconds.
In the next blog post, we will cover Ignite's IGFS and the secondary file system.

If you like this article, you would also like the book

Sunday

Pitfalls of the MyBatis Caches with Apache Ignite

UPD1: This blog has been published on Java DZone: https://dzone.com/articles/pitfalls-of-the-mybatis-caches-with-apache-ignite
UPD2: This blog has also been published on Habrahabr for Russian readers: https://habrahabr.ru/company/at_consulting/blog/280452
UPD3: See also the sample chapter of the book "High performance in-memory computing with Apache Ignite" here.

A week ago, MyBatis and Apache Ignite announced support for Apache Ignite as a MyBatis cache (L2 cache).
Technically, MyBatis supports two levels of caches (a configuration sketch follows the list):
  1. The local cache, which is always enabled by default.
  2. The L2 cache, which is optional.
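
As a minimal sketch (assuming a standard mybatis-config.xml), the global switch controlling the L2 cache looks like this; note that it defaults to true, so usually only the per-mapper <cache/> element shown later in this post is required:

<settings>
    <setting name="cacheEnabled" value="true"/>
</settings>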
As the Apache Ignite project is growing fast with its various functionality, in this blog post we are going to examine the MyBatis support in some detail.
The second-level cache stores the entity data, but NOT the entities or objects themselves. The data is stored in a 'serialized' format, which looks like a hash map where the key is the entity ID and the value is a list of primitive values.
Here is an example of what the cache entries look like in Apache Ignite:
Where
Cache Key: CacheKey [idHash=1499858, hash=2019660929, checksum=800710994, count=6, multiplier=37, hashcode=2019660929, updateList=[com.blu.ignite.mapper.UserMapper.getUserObject, 0, 2147483647, SELECT * FROM all_objects t where t.OBJECT_TYPE='TABLE' and t.object_name=?, USERS, SqlSessionFactoryBean]]
Value class: java.util.ArrayList
Cache Value: [UserObject [idHash=243119413, hash=1658511469, owner=C##DONOTDELETE, object_type=TABLE, object_id=94087, created=Mon Feb 15 13:59:41 MSK 2016, object_name=USERS]]
As an example, I selected the all_objects view from the Oracle database and ran the following queries:
SELECT count(*) FROM all_objects;

SELECT * FROM all_objects t where t.OBJECT_TYPE='TABLE' and t.object_name='EMP';

SELECT * FROM all_objects t where t.OBJECT_TYPE='TABLE';
In my case, the following query takes ~660 ms on average:
SELECT count(*) FROM all_objects;
And the next query's execution time is more than 700 ms:
SELECT t.object_type, count(*) FROM all_objects t group by t.OBJECT_TYPE;
Let's add Apache Ignite as a second-level cache and examine the result. If you want to know how to install and configure Apache Ignite with Spring and MyBatis, please refer to my previous blog post. Moreover, all the sources can be found in the GitHub repository.
As a quick start, let's add the MyBatis Maven dependency to the project.
<dependency>
    <groupId>org.mybatis.caches</groupId>
    <artifactId>mybatis-ignite</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
Then, just specify it in the mapper XML as follows:
<mapper namespace="com.blu.ignite.mapper.UserMapper">

    <cache type="org.mybatis.caches.ignite.IgniteCacheAdapter" />

    <select id="getUserObject" parameterType="String" resultType="com.blu.ignite.dto.UserObject" useCache="true">
        SELECT * FROM all_objects t where t.OBJECT_TYPE='TABLE' and t.object_name=#{objectName}
    </select>
    <select id="getAllObjectsTypeByGroup" parameterType="String" resultType="com.blu.ignite.dto.UobjectGroupBy" useCache="true">
        SELECT t.object_type, count(*) as cnt FROM all_objects t group by t.OBJECT_TYPE
    </select>

    <select id="allObjectCount" parameterType="String" resultType="String" useCache="true">
        SELECT count(*) FROM all_objects
    </select>
</mapper>
I also have the following Java mapper:
public interface UserMapper {
    User getUser(String id);
    List<String> getUniqueJob();
    UserObject getUserObject(String objectName);
    String allObjectCount();
    List<UobjectGroupBy> getAllObjectsTypeByGroup();
}
And the web service as follows:
@WebService(name = "BusinessRulesServices",
        serviceName="BusinessRulesServices",
        targetNamespace = "http://com.blu.rules/services")
public class WebServices {
    private UserServices userServices;

    @WebMethod(operationName = "getUserName")
    public String getUserName(String userId){
        User user = userServices.getUser(userId);
        return user.getuName();
    }
    @WebMethod(operationName = "getUserObject")
    public UserObject getUserObject(String objectName){
        return userServices.getUserObject(objectName);
    }
    @WebMethod(operationName = "getUniqueJobs")
    public List<String> getUniqueJobs(){
        return userServices.getUniqueJobs();
    }
    @WebMethod(exclude = true)
    public void setDao(UserServices userServices){
        this.userServices = userServices;
    }
    @WebMethod(operationName = "allObjectCount")
    public String allObjectCount(){
        return userServices.allObjectCount();
    }
    @WebMethod(operationName = "getAllObjectsTypeCntByGroup")
    public List<UobjectGroupBy> getAllObjectsTypeCntByGroup(){
        return userServices.getAllObjectCntbyGroup();
    }

}
If I invoke the web method getAllObjectsTypeCntByGroup in SoapUI, the first invocation has a very high response time, approximately 1700 ms, because the result is not yet in the cache. From the second invocation on, the response time is ~4 to ~5 ms.

The first invocation of the web method will look like this:
The response time of the second and later invocations of the web method:
In Apache Ignite, the cache entry will look as follows:

Cache Key: CacheKey [idHash=46158416, hash=1558187086, checksum=2921583030, count=5, multiplier=37, hashcode=1558187086, updateList=[com.blu.ignite.mapper.UserMapper.getAllObjectsTypeByGroup, 0, 2147483647, SELECT t.object_type, count(*) as cnt FROM all_objects t group by t.OBJECT_TYPE, SqlSessionFactoryBean]]
Value class: java.util.ArrayList
Cache Value: [UobjectGroupBy [idHash=2103707742, hash=1378996400, cnt=1, object_type=EDITION], UobjectGroupBy [idHash=333378159, hash=872886462, cnt=444, object_type=INDEX PARTITION], UobjectGroupBy [idHash=756814918, hash=1462794064, cnt=32, object_type=TABLE SUBPARTITION], UobjectGroupBy [idHash=931078572, hash=953621437, cnt=2, object_type=CONSUMER GROUP], UobjectGroupBy [idHash=1778706917, hash=1681913927, cnt=256, object_type=SEQUENCE], UobjectGroupBy [idHash=246231872, hash=1764800190, cnt=519, object_type=TABLE PARTITION], UobjectGroupBy [idHash=1138665719, hash=1030673983, cnt=4, object_type=SCHEDULE], UobjectGroupBy [idHash=232948577, hash=1038362844, cnt=1, object_type=RULE], UobjectGroupBy [idHash=1080301817, hash=646054631, cnt=310, object_type=JAVA DATA], UobjectGroupBy [idHash=657724550, hash=1248576975, cnt=201, object_type=PROCEDURE], UobjectGroupBy [idHash=295410055, hash=33504659, cnt=54, object_type=OPERATOR], UobjectGroupBy [idHash=150727006, hash=499210168, cnt=2, object_type=DESTINATION], UobjectGroupBy [idHash=1865360077, hash=727903197, cnt=9, object_type=WINDOW], UobjectGroupBy [idHash=582342926, hash=1060308675, cnt=4, object_type=SCHEDULER GROUP], UobjectGroupBy [idHash=1968399647, hash=1205380883, cnt=1306, object_type=PACKAGE], UobjectGroupBy [idHash=1495061270, hash=1345537223, cnt=1245, object_type=PACKAGE BODY], UobjectGroupBy [idHash=1328790450, hash=1823695135, cnt=228, object_type=LIBRARY], UobjectGroupBy [idHash=1128429299, hash=1267824468, cnt=10, object_type=PROGRAM], UobjectGroupBy [idHash=760711193, hash=1240703242, cnt=17, object_type=RULE SET], UobjectGroupBy [idHash=317487814, hash=61657487, cnt=10, object_type=CONTEXT], UobjectGroupBy [idHash=1079028994, hash=1960895356, cnt=229, object_type=TYPE BODY], UobjectGroupBy [idHash=276147733, hash=873140579, cnt=44, object_type=XML SCHEMA], UobjectGroupBy [idHash=24378178, hash=1621363993, cnt=1014, object_type=JAVA RESOURCE], UobjectGroupBy [idHash=1891142624, hash=90282027, cnt=10, object_type=DIRECTORY], UobjectGroupBy [idHash=902107208, hash=1995006200, cnt=593, object_type=TRIGGER], UobjectGroupBy [idHash=142411235, hash=444983119, cnt=14, object_type=JOB CLASS], UobjectGroupBy [idHash=373966405, hash=1518992835, cnt=3494, object_type=INDEX], UobjectGroupBy [idHash=580466919, hash=1394644601, cnt=2422, object_type=TABLE], UobjectGroupBy [idHash=1061370796, hash=1861472837, cnt=37082, object_type=SYNONYM], UobjectGroupBy [idHash=1609659322, hash=1543110475, cnt=6487, object_type=VIEW], UobjectGroupBy [idHash=458063471, hash=1317758482, cnt=346, object_type=FUNCTION], UobjectGroupBy [idHash=1886921697, hash=424653540, cnt=7, object_type=INDEXTYPE], UobjectGroupBy [idHash=1455482905, hash=1776171634, cnt=30816, object_type=JAVA CLASS], UobjectGroupBy [idHash=49819096, hash=2110362533, cnt=2, object_type=JAVA SOURCE], UobjectGroupBy [idHash=1916179950, hash=1760023032, cnt=10, object_type=CLUSTER], UobjectGroupBy [idHash=1138808674, hash=215713426, cnt=2536, object_type=TYPE], UobjectGroupBy [idHash=305229607, hash=340664529, cnt=23, object_type=JOB], UobjectGroupBy [idHash=1365509716, hash=623631686, cnt=12, object_type=EVALUATION CONTEXT]]

Performance gain:
With a simple calculation we can quantify the performance gain: response time without cache / response time with cache = 1589 ms / 6 ms ≈ 265X faster; or (response time without cache - response time with cache) / response time with cache * 100 = (1589 - 6) / 6 * 100 ≈ 26383% faster.

Conclusion: Expensive database operations can be reduced by using an L2 cache; properly using the L2 cache in MyBatis can increase application performance by 10 to 20 times. The Apache Ignite in-memory data grid is a very suitable candidate for this purpose; certainly, you can also use Hazelcast, EhCache or any other caching tool.

If you like this article, you would also like the book

Thursday

Quick start with In-Memory Data Grid, Apache Ignite

UP1: For a complete quick start guide, see also the sample chapter of the book "High performance in-memory computing with Apache Ignite" here. You can also find the sample examples in the GitHub repository.

An IMDG, or in-memory data grid, is not an in-memory relational database, a NoSQL database or a relational database. It is a different breed of software datastore. The data model is distributed across many servers in a single location or across multiple locations. This distribution is known as a data fabric. This distributed model is known as a 'shared nothing' architecture. An IMDG has the following characteristics:

  1. All servers can be active in each site.
  2. All data is stored in the RAM of the servers.
  3. Servers can be added or removed non-disruptively to increase the amount of RAM available.
  4. The data model is non-relational and object-based.
  5. Distributed applications are written in a platform-independent language.
  6. The data fabric is resilient, allowing non-disruptive automated detection and recovery of a single server or multiple servers.

Most of the time we use an IMDG for web session management in application servers and as a distributed cache or L2 cache. The Hazelcast community edition was our all-time favourite IMDG tool, but over the last few releases of the Hazelcast community edition, its performance has not made us happy at all. As a quick alternative to Hazelcast, we decided to give Apache Ignite a try. This post is dedicated to Apache Ignite and can be used as a quick start guide. For the installation I will use 2 virtual machines running the Red Hat operating system with the following configuration:

CPU: 2
RAM: 4 GB
HDD: 25 GB
OS: Red Hat Santiago

Out of the many features of Apache Ignite, we will only examine the following:
  1. Prepare operating system
  2. Using Spring for using DataGrid
  3. MyBatis Cache configuration
  4. Spring Caching

Installing Apache Ignite:
Prerequisites:
1) Java 1.7 and above
2) Open ports: 47500-47509, 8080 (for the REST interface), 47400, 47100-47101, 48100-48101, 31100-31101

After installing the JDK in the operating system, we have to open the ports mentioned above. With the following commands we can manipulate iptables.
vi /etc/sysconfig/iptables
-A INPUT -m state --state NEW -m tcp -p tcp --dport 47500:47509 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 47400 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 47100 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 47101 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 48100 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 48101 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 31100 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 31101 -j ACCEPT
# 8080 for the Ignite REST interface, as listed in the prerequisites
-A INPUT -m state --state NEW -m tcp -p tcp --dport 8080 -j ACCEPT

/etc/init.d/iptables restart
Installation of Apache Ignite on the machines:
1) Let's download the Ignite 1.5.0-final version from the following link.
2) Unzip the archive anywhere in the OS, such as /opt/apache-ignite.
3) Add the environment variable IGNITE_HOME pointing to the home directory of Apache Ignite.
4) Copy the folder $IGNITE_HOME/libs/optional/ignite-rest-http to $IGNITE_HOME/libs; this enables access to Apache Ignite through the REST interface.
5) Run the command ignite.sh examples/config/example-cache.xml to start Apache Ignite.
If everything goes fine, you should see the following log in your console:
[12:32:01] Ignite node started OK (id=ceb614ca)
[12:32:01] Topology snapshot [ver=4, servers=2, clients=0, CPUs=3, heap=2.0GB]
and Ignite will also be available through HTTP at the URL http://host:port/ignite?cmd=version
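To verify the REST module quickly, you can also query it from Java; below is a minimal sketch. The class name, host and port are assumptions (8080 is the default HTTP port of the ignite-rest-http module), and the exact JSON payload may differ:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class IgniteRestCheck {
    public static void main(String[] args) throws Exception {
        // Ask the local node for its version over the REST interface
        URL url = new URL("http://127.0.0.1:8080/ignite?cmd=version");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            // Prints a small JSON document containing the node version
            System.out.println(in.readLine());
        }
    }
}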
Using Spring to use the data grid:
First of all, we have to build a Maven project to write a bunch of code for examining the features of Apache Ignite.
1) Add the following dependencies to the pom.xml
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-core</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-spring</artifactId>
    <version>${ignite.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>
<!-- MyBatis -->
<dependency>
    <groupId>org.mybatis.caches</groupId>
    <artifactId>mybatis-ignite</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
<dependency>
    <groupId>org.mybatis</groupId>
    <artifactId>mybatis-spring</artifactId>
    <version>1.2.4</version>
</dependency>
<dependency>
    <groupId>org.mybatis</groupId>
    <artifactId>mybatis</artifactId>
    <version>3.3.1</version>
</dependency>
<!-- Oracle 12 -->
<dependency>
    <groupId>com.oracle</groupId>
    <artifactId>ojdbc6</artifactId>
    <version>11.2.0.3</version>
</dependency>
Please note that the Oracle JDBC client jar should be in your local Maven repository. In my case I use the Oracle 11.2.0.3 client.
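The ${ignite.version} placeholder above is assumed to be defined in the properties section of the pom, for example (assuming the 1.5.0.final release used in this post):
<properties>
    <ignite.version>1.5.0.final</ignite.version>
</properties>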
2) Add a spring-context.xml file in the resources directory with the following contents:
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:mvc="http://www.springframework.org/schema/mvc"
       xmlns:cache="http://www.springframework.org/schema/cache"
       xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/cache
        http://www.springframework.org/schema/cache/spring-cache-3.1.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context.xsd ">
    <!-- Enable annotation-driven caching. -->
    <cache:annotation-driven/>

    <context:property-placeholder location="classpath:jdbc.properties"/>
    <!-- beans -->

    <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="gridName" value="TestGrid"/>
        <!-- Enable client mode. -->
        <property name="clientMode" value="true"/>

        <property name="cacheConfiguration">
            <list>
                <!-- Partitioned cache example configuration (Atomic mode). -->
                <bean class="org.apache.ignite.configuration.CacheConfiguration">
                    <!--<property name="atomicityMode" value="ATOMIC"/>-->
                    <!-- Set cache mode. -->
                    <property name="cacheMode" value="PARTITIONED"/>
                    <property name="backups" value="1"/>
                    <property name="statisticsEnabled" value="true" />
                </bean>
            </list>
        </property>
        <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <!-- Uncomment static IP finder to enable static-based discovery of initial nodes. -->
                    <!--<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">-->
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">
                        <property name="addresses">
                            <list>
                                <!-- In distributed environment, replace with actual host IP address. -->
                                <value>Add your node ip address</value>
                                <value>add your node ip address</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
    <bean id="sqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="mapperLocations" value="classpath*:com/blu/ignite/dao/*Mapper.xml"/>
    </bean>
    <bean id="dataSource" class="oracle.jdbc.pool.OracleDataSource" destroy-method="close">
        <property name="URL" value="${jdbc.url}" />
        <property name="user" value="${jdbc.username}"/>
        <property name="password" value="${jdbc.password}"/>
        <property name="connectionCachingEnabled" value="true"/>
    </bean>
</beans>
Let's examine a few configuration properties:
property name="clientMode" value="true" - this property forces the current application to run as a client.
property name="cacheMode" value="PARTITIONED" - the cache mode will be partitioned; the cache mode can also be replicated.
property name="backups" value="1" - there will always be one redundant copy of each cache entry on another node.
property name="statisticsEnabled" value="true" - this property activates the cache statistics.

3) Now let's write some code:
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class SpringIgniteRun {
    public static void main(String[] args) throws Exception {
        System.out.println("Run Spring example!!");
        // load the Spring context created in step 2
        ApplicationContext ctx = new ClassPathXmlApplicationContext("spring-context.xml");

        IgniteConfiguration igniteConfiguration = (IgniteConfiguration) ctx.getBean("ignite.cfg");
        Ignite ignite = Ignition.start(igniteConfiguration);
        // get or create the cache
        IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCacheName");
        for (int i = 1; i < 1000; i++) {
            cache.put(i, Integer.toString(i));
        }
        for (int i = 1; i < 1000; i++) {
            System.out.println("Cache get:" + cache.get(i));
        }
        Thread.sleep(20000); // sleep for 20 seconds
        // statistics
        System.out.println("Cache Hits:" + cache.metrics(ignite.cluster()).getCacheHits());
        ignite.close();
    }
}
The above code is self-explanatory: we just create a cache named "myCacheName" and put 999 String values keyed by integers. After inserting the values into the cache, we also read the elements back and check the statistics. Through ignitevisorcmd you can also monitor the data grid; below you can find a screenshot of the statistics of the grid.
MyBatis cache configuration: Now let's add the MyBatis ORM L2 cache and examine how it works.
<bean id="servicesBean" class="com.blu.ignite.WebServices">
        <property name="dao" ref="userServicesBean"/>
    </bean>
    <bean id="userServicesBean" class="com.blu.ignite.dao.UserServices">
        <property name="userMapper" ref="userMapper"/>
    </bean>

    <bean id="sqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="mapperLocations" value="classpath*:com/blu/ignite/dao/*Mapper.xml"/>
    </bean>
    <bean id="dataSource" class="oracle.jdbc.pool.OracleDataSource" destroy-method="close">
        <property name="URL" value="${jdbc.url}" />
        <property name="user" value="${jdbc.username}"/>
        <property name="password" value="${jdbc.password}"/>
        <property name="connectionCachingEnabled" value="true"/>
    </bean>


    <bean id="userMapper" autowire="byName" class="org.mybatis.spring.mapper.MapperFactoryBean">
        <property name="mapperInterface" value="com.blu.ignite.mapper.UserMapper" />
        <property name="sqlSessionFactory" ref="sqlSessionFactory" />
    </bean>
    <bean class="org.mybatis.spring.mapper.MapperScannerConfigurer">
        <property name="basePackage" value="com.blu.ignite.mapper" />
    </bean>
We add the SqlSessionFactory, the MyBatis mapper and the service bean. Now let's add the *Mapper.xml:
<?xml version="1.0" encoding="UTF-8" ?>

<!DOCTYPE mapper
        PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">

<mapper namespace="com.blu.ignite.mapper.UserMapper">

    <cache type="org.mybatis.caches.ignite.IgniteCacheAdapter" />

    <select id="getUser" parameterType="String" resultType="com.blu.ignite.dto.User" useCache="true">
      SELECT * FROM users WHERE id = #{id}
    </select>

    <select id="getUniqueJob" parameterType="String" resultType="String" useCache="false">
        select unique job from emp order by job desc
    </select>

</mapper>
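For reference, the mapper interface referenced by the configuration above might look like this minimal sketch; the method names come from the <select> ids in the mapper XML, while the signatures are assumptions:
package com.blu.ignite.mapper;

import com.blu.ignite.dto.User;
import java.util.List;

public interface UserMapper {
    // Backed by the cached <select id="getUser"> statement (useCache="true")
    User getUser(String id);

    // Backed by the non-cached <select id="getUniqueJob"> statement (useCache="false")
    List<String> getUniqueJob();
}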
The full SQL (DDL/DML) scripts for the emp and dept tables can be found in the directory com/blu/ignite/scripts. I have created a simple web service to get the users and the unique jobs of employees. Here is the code for the web service:
import java.util.List;
import javax.jws.WebMethod;
import javax.jws.WebService;
import com.blu.ignite.dao.UserServices;
import com.blu.ignite.dto.User;

@WebService(name = "BusinessRulesServices",
        serviceName = "BusinessRulesServices",
        targetNamespace = "http://com.blu.rules/services")
public class WebServices {
    private UserServices userServices;

    @WebMethod(operationName = "getUserName")
    public String getUserName(String userId) {
        User user = userServices.getUser(userId);
        return user.getuName();
    }

    @WebMethod(operationName = "getUniqueJobs")
    public List<String> getUniqueJobs() {
        return userServices.getUniqueJobs();
    }

    // setter used by Spring to inject the "dao" property declared in the bean configuration
    @WebMethod(exclude = true)
    public void setDao(UserServices userServices) {
        this.userServices = userServices;
    }
}
Invoking the web method getUserName will query the database and cache the query result in the Ignite cache. Spring caching: With Spring caching you can cache the return value of any Spring bean method. Apache Ignite will create the cache with the name you provide through the annotation @Cacheable("returnHello"). For example, consider the following method:
@Cacheable("returnHello")
    public String sayhello(String str){
        System.out.println("Client says:"+ str);

        return "hello"+str;
    }
The first time the method is invoked, a cache named "returnHello" will be created in Ignite with the method argument as the key; every subsequent invocation with the same argument will return the value from the cache.
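Note that for @Cacheable to be backed by Ignite, a cache manager bean also has to be declared next to <cache:annotation-driven/>; a minimal sketch, assuming the ignite-spring module is on the classpath and reusing the configuration file from above:
<bean id="cacheManager" class="org.apache.ignite.cache.spring.SpringCacheManager">
    <!-- Ignite configuration used to start or look up the underlying node -->
    <property name="configurationPath" value="spring-context.xml"/>
</bean>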
That's enough for now. Soon I will come back with some new features of Apache Ignite. The full source code of the project can be found on GitHub. If you like this article, you might also like the book.

Monday

Benchmarking high-performance Java collection frameworks

I am an ultimate fan of high-performance Java frameworks and libraries. The native Java collections framework always works with primitive wrapper classes such as Integer, Float, etc. Boxing and unboxing between wrapper classes and primitive data types always decreases Java execution performance. Many of us are looking for a library or framework that works with primitive data types in collections to increase the performance of Java applications. Most of the time I use the javolution framework to get better performance; however, this holiday I read about a few new Java collection frameworks and decided to do some homework benchmarking to find out how much better they could be than the native Java collections framework.
I examined two new Java collection frameworks: one of them is fastutil and the other is HPPC. For benchmarking I used Java JMH in Throughput mode. For the benchmark I took the counterparts of the Java ArrayList, HashSet and HashMap from the two frameworks described above.
Collections:
  1. ArrayList
  2. HashSet
  3. HashMap
Data type:
Integer for the native Java collections and int for HPPC and fastutil.
Host machine configuration:
OS: OSX El Capitan
CPU: 4
RAM: 16 GB
HDD: SSD
JMH configuration:
Forks: 10
Iterations: 10
Warmup iterations: 10
Benchmark of List:
  1. Java native: ArrayList<Integer>
  2. FastUtil: it.unimi.dsi.fastutil.ints.IntArrayList
  3. HPPC: com.carrotsearch.hppc.IntArrayList
Number of elements: 10 000
Operations: add, retrieve by iterator
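To make the setup concrete, here is a minimal sketch of what the List benchmark might look like; the class and method names are assumptions, and only the frameworks, element count and JMH settings come from the text above:
import it.unimi.dsi.fastutil.ints.IntArrayList;
import it.unimi.dsi.fastutil.ints.IntListIterator;
import java.util.ArrayList;
import java.util.List;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@Fork(10)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
public class ListBenchmark {
    private static final int SIZE = 10_000;

    @Benchmark
    public int javaArrayList() {
        List<Integer> list = new ArrayList<>(SIZE);
        for (int i = 0; i < SIZE; i++) {
            list.add(i);          // autoboxing int -> Integer on every add
        }
        int sum = 0;
        for (Integer v : list) {
            sum += v;             // unboxing Integer -> int on every read
        }
        return sum;
    }

    @Benchmark
    public int fastutilIntArrayList() {
        IntArrayList list = new IntArrayList(SIZE);
        for (int i = 0; i < SIZE; i++) {
            list.add(i);          // primitive int, no boxing
        }
        int sum = 0;
        for (IntListIterator it = list.iterator(); it.hasNext(); ) {
            sum += it.nextInt();  // primitive iteration, no unboxing
        }
        return sum;
    }
}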
Result of the benchmark is as follows:
It's obvious that the FastUtil IntArrayList wins with a huge score. If you are wondering and have decided that the fastutil framework will always win the benchmark, you will be surprised. Let's examine another collection.
Benchmark of Set:
  1. Java native: HashSet<Integer>
  2. FastUtil: it.unimi.dsi.fastutil.ints.IntSet
  3. HPPC: com.carrotsearch.hppc.IntHashSet
Number of elements: 10 000
Operations: put, retrieve by iterator
Result of the benchmark is as follows:
Surprised? Me too; I double-checked the code. The native Java HashSet wins over all the other frameworks. Very interesting is the fastutil result: its score is only 89.
Benchmark of Map:
  1. Java native: HashMap<Integer, Integer>
  2. FastUtil: it.unimi.dsi.fastutil.ints.Int2IntArrayMap
  3. HPPC: com.carrotsearch.hppc.IntIntHashMap
Number of elements: 10 000
Operations: put, retrieve by iterator
Result of the benchmark is as follows:
A similar picture to the Set benchmark: the native Java Map wins over the other frameworks. The question is: why is the fastutil collection framework giving such a poor score? We will examine the answer to this question in the next few blog posts.