Tuesday

Book: High performance in-memory computing with Apache Ignite has been published

The book "High performance in-memory computing with apache Ignite" has been released and
available at http://leanpub.com/ignite

Print copy of the book is available at Lulu.com & Amazon bookstore.

Support independent publishing: Buy this book on Lulu.
UP1: NOW the book is available for purchase from the Russian federation through PayPal (Ignore the yellow warning).


The goal of the book is to provide a guide for those who really need to implement the In-memory platform in their projects. At the same time, the idea behind the book is not writing a manual.

This book wraps all the topics like in-memory data grid, highly available service grid, streaming and in-memory computing use cases from high-performance computing to get the performance gain. The book will be particularly useful for those, who have the following use cases:

  • You have database bottleneck in your application and want to solve the problem.
  • You have a high volume of ACID transactions in your system.
  • You want to develop and deploy microservices in distributed fashion.
  • You have existing Hadoop ecosystem (OLAP) and want to improve the performance of the Map/Reduce jobs without making any changes in your existing Map/Reduce jobs.
  • You want to share Spark RDD directly in-memory (without storing the state to disk), which can dramatically increase the performance of the Spark jobs.
  • You are planning to migrate to microservices and the web session clustering is the problem for you.
  • You are planning to process continuous never-ending streams and complex events of data in scalable and fault-tolerant fashion.
  • You want to use distributed computations in parallel fashion to gain high performance, low latency, and linear scalability.
  • You heard about Off-heap memory but don't know how to use it in your application.
For every topic, a complete application is delivered, which will help the audience to quick start with the topic. The book is a project-based guide, where each chapter focuses on the complete implementation of a real-world scenario, the commonly occurring challenges in each scenario has also discussed, along with tips and tricks and best practices on how to overcome them. Every chapter is independent and a complete project.

Who is this book for

Target audience of this book will be IT architect, team leaders, a programmer with minimum programming knowledge, who want to get the maximum performance from their applications.

No excessive knowledge is required, though it would be good to be familiar with JAVA and Spring framework. The book is also useful for any reader, who already familiar with Oracle Coherence, Hazelcast, Infinispan or memcached.

See the full table of contents of the book here.

Happy Reading.

Sunday

The Apache Ignite Native persistence, a brief overview

In-memory approaches can achieve blazing speed by putting the working set of the data into the system memory. When all data is kept in memory, the need to deal with issues arising from the use of traditional spinning disks disappears. This means, for instance, there is no need to maintain additional cache copies of data and manage synchronization between them. But there is also a downside to this approach because the data is in memory only, it will not survive if the whole cluster gets terminated. Therefore, this types of data stores are not considered persistence at all.
In this blog post, I will do an effort to explore the Apache Ignite new native persistence feature and provide a clear, understandable picture how the Apache Ignite native persistence works. 
In most cases, you can’t (should not) store the whole data set in memory for your application, most often you should store relatively small hot or active subset of data to increase the performance of the application. The rest of the data should be stored somewhere in low-cost disks or tape for archiving. There are two main in-memory database storage requirements available:
  • Permanent media, to store committed transactions, thereby maintaining durability and for recovery purpose if the in-memory database needs to be reloaded into the memory.
  • Permanent storage, to hold a backup copy of the entire in-memory database.
Permanent storage or media can be any distributed or local file system, SAN, NoSQL database or even RDBMS like Postgres or Oracle. Apache Ignite (since 1.5) provides an elegant way to connect persistence data stores such as RDBMS or NoSQL DB like Mongo DB or Cassandra. Most often persistence in an RDBMS will be bottlenecks and you never got a horizontal scaling in your system. For more information, I recommended you to have a look at the sample chapter of the book "High performance in-memory computing with Apache Ignite".


So, from the version 2.1.0, Apache Ignite provides ACID and SQL-compliant disk store that transparently integrates with Ignite's durable memory as an optional disk layer storing data and indexes on SSD, Flash, 3D XPoint, and other types of non-volatile storages.
The Apache Ignite native persistence uses new durable memory architecture that allows storing and processing data and indexes both in-memory and on disk. Whenever the feature enables, Apache Ignite stores a superset of data on disk, and a subset of data in RAM based on its capacity. If a subset of data or an index is missing in RAM, the Durable Memory will take it from the disk as shown new pictures below.
Data can be also stored in the central disk storage where all the Ignite nodes connected as shown below.

Before we start, let's cover the prerequisites of the project in our sandbox:
  1. Apache Ignite version 2.1.0
  2. JVM 1.8
  3. Apache Maven version >3.0.3
  4. *nix based operating system
Installation.
There are basically two ways to use Apache Ignite:
  • Download the binary distribution and unzip the archive somewhere in your os and run the ./ignite.sh bash script with the spring config files.
  • Create a maven project with the required Apache Ignite dependencies, configure the node through the java code and run it.
Here, I am going to use the first option.
Step 1.
  • Download the Apache Ignite binary distribution and unzip the distribution somewhere in your sandbox. 
  • Modify the IGNITE_HOME/examples/config/persistentstore/example-persistent-store.xml file and comment the following part of the cache configuration.
<property name="cacheConfiguration">
<list>
    <bean class="org.apache.ignite.configuration.CacheConfiguration">
        <property name="name" value="testCache"/>
            <property name="backups" value="1"/>
            <property name="atomicityMode" value="TRANSACTIONAL"/>
            <property name="writeSynchronizationMode" value="FULL_SYNC"/>
            <property name="indexedTypes">
            <list>
            <value>java.lang.Long</value>
                <value>org.apache.ignite.examples.model.Organization</value>
            </list>
            </property>
        </bean>
    </list>
</property>
Note that, to enable the Ignite native persistence, you only need to pass the following configuration (an instance of the PersistentStoreConfiguration), which already pre-configured in the example-persistent-store.XML file.


<property name="persistentStoreConfiguration">
<bean class="org.apache.ignite.configuration.PersistentStoreConfiguration"/>
</property>
  • Run the following command from the IGNITE_HOME directory.
./ignite.sh $IGNITE_HOME/examples/config/persistentstore/example-persistent-store.xml
Step 2.
  • create a Maven project with the following command.
mvn archetype:create -DgroupId=com.blu.imdg -DartifactId=ignite-persistence
  • Add the following dependencies in the pom.xml
<dependency>
      <groupId>org.apache.ignite</groupId>
      <artifactId>ignite-core</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.ignite</groupId>
      <artifactId>ignite-spring</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.ignite</groupId>
      <artifactId>ignite-indexing</artifactId>
      <version>2.1.0</version>
    </dependency>
  • Create a Java class with the following contents.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello Ignite");
        // create a new instance of TCP Discovery SPI
        TcpDiscoverySpi spi = new TcpDiscoverySpi();
        // create a new instance of tcp discovery multicast ip finder
        TcpDiscoveryMulticastIpFinder tcMp = new TcpDiscoveryMulticastIpFinder();
        tcMp.setAddresses(Arrays.asList("localhost")); // change your IP address here
        // set the multi cast ip finder for spi
        spi.setIpFinder(tcMp);
        // create new ignite configuration
        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);
        // set the discovery§ spi to ignite configuration
        cfg.setDiscoverySpi(spi);
        // Start ignite
        Ignite ignite = Ignition.start(cfg);
        ignite.active(true);
        // get or create cache
        IgniteCache cache = ignite.getOrCreateCache("testCache");
        // put some cache elements
        for (int i = 1; i <= 100; i++) {
            cache.put(i, Integer.toString(i));
        }
        // get them from the cache and write to the console
        for (int i = 1; i <= 100; i++) {
            System.out.println("Cache get:" + cache.get(i));
        }
        ignite.close();

    }
}
Note that, we are using Ignite client mode for manipulating data. After running the HelloWorld application 100 elements should be inserted in the cache (cache name testCache).
Step 3.
  • let's use the ignitevisor command tool to examine the data. use cache -scan command in ignitevisor command tool. You should get the similar illustration in your console. All 100 elements in the cache.
    • Now, let's see whats happened under the hood. Run the following command from the IGNITE_HOME/work directory
du -h .
You should get something like in your console as shown below.
If Apache Ignite native persistence enables, Ignite will persist all the data and the index in memory and on disk across all the cluster nodes. 
If you will go through the directory db/0_0_0_0_0_0_0_1_10_211_55_2_10_37_129_2_127_0_0_1_192_168_1_37_47500 (in my case), you will find individual folder for every cache. The folder with name cache-testCache will contain all the cache entries (100 elements) which we have just inserted.
The file index.bin is the index of the cache entries and every cache element gets their individual page file. Why did this happen? now Ignite architecture is page based architecture. Let's take a closer look, memory now splits into regions -> regions split into segments -> segments split into pages. Pages can be swapped into the disk. Pages can store:
  • data 
  • metadata
  • index
Page are fixed-length block, it also supports automatic defragmentation. If you take a closer look at the pages size, all of them are 14 KB. Whenever Ignite needs to load data from the disk, it just loads the page file and so it's very fast.
Also, there is another concept over write-ahead log (WAL). If you doing an update, first it will be updating the data in-memory and marks the page dirty, and then it will persist the data into the write-ahead log. Ignite just append the update into the WAL file. WAL file is very much similar to Cassandra commitlog file, with one difference. Cassandra writes parallel into in-memory and the commitlog file on disk, on the other hand, Ignite update the data into the memory first and then append the data into the WAL. For more information, I recommend you to have a look at the documentation, which is quite exhaustive.
Step 4.
  • Restart the Ignite node, and check the cache testCache with ignitevisor. You will end up with a surprise that no data into the cache.
  • Let's slightly modify our helloworld class and run the application again, comment or delete the following fragments of the code as shown below.
// put some cache elements
for (int i = 1; i <= 100; i++) {
  cache.put(i, Integer.toString(i));
}
Run the application and check the cache testCache through ignitevisor and you application console. 
Whenever any read request occurs, Ignite first check the data into the memory. If the dataset doesn't exist in memory, Ignite immediately load the cache entries from the disk and load into the memory. Also note that, all entries into the memory in offheap. 
Benefits.
With Ignite native persistence, now you can easily do backup for the data recovery, Denis Magda writes a comprehensive article for data recovery by using Ignite native persistence. One thing I have to mention here is the data replication between clusters. By using Ignite native persistence, now you can replicate data from one cluster to another on line. You can use any standard disk based data replication tools to copy the changed data set from the primary data center to the stand-in data center or Ignite cluster.