Recent blog posts

Posted in Technology

Technology evolution encompasses advances in sensor technologies, connectivity, analytics and cloud environments. These advances will expand the impact of data on enterprise performance management and pose system integration challenges for most companies.

As industries transition from analog to digital PLCs and SCADA, they will have to leverage sensor-based data to optimize the control and design of their assets and processes, both in real time and over time, for faster decision making. They will also have to embed software in traditional industrial equipment.

Developing and deploying these systems securely and reliably represents one of the biggest challenges.

Going far beyond the current definition of networks, the most complicated and powerful network yet is now being built. In it, devices embedded in power lines, water lines, assembly lines, household appliances, industrial equipment and vehicles will increasingly communicate with one another without the need for any human involvement.

The reach of these integration capabilities will go far beyond infrastructure and manufacturing. Today, for example, clinicians diagnose health conditions through a lengthy assessment. But matching historical pathological and lifestyle patterns against data from live diagnostics collection systems can provide a more accurate diagnostic approach to serious ailments, or an early-warning signal. To make the most of such opportunities, health-care companies must figure out how to integrate systems far beyond the hospital. Much like in-memory big data analyses, this presents a problem of collecting data closer to its source.

You may point out that collecting and transmitting data from industrial machines and devices is not a new concept. Since the early 1980s, data from industrial assets has been captured, stored, monitored and analysed to help improve key business outcomes. In this era of digitization, as industrial sensors and devices create hybrid data environments, systems integration will propagate more data from more locations, in more formats and from more systems than ever before. Data management and governance challenges that have pervaded operations for decades will now become pressing. Strategies to manage the volume and variety of data need to be put in place now to harness the opportunity that IoT and Big Data promise.

Despite the challenges stated above, some strategies incorporated in core operations can help increase the odds of success:

  • Multiple Protocols

As the number of sensors and devices grows, the increasing number of data acquisition ‘protocols’ creates a greater need for new ‘interfaces’ for device networking and integration within existing data ecosystems.

  • Data Variety

As devices and sensors are deployed to fill existing information gaps and operationalize assets outside the traditional enterprise boundaries, centralized data management systems must be able to integrate disparate data types in order to create a unified view of operations and align it with the business objectives.

  • New Data Silos

Systems built for a single purpose produce data silos that create barriers to using data for multiple purposes, by multiple stakeholders. Without foresight, connected-device solutions become the new silos, undermining the intent to construct architectures that incorporate connected devices into broader, interactive data ecosystems.
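To make the "multiple protocols" and "data variety" points concrete, here is a minimal Java sketch of one way to normalize readings that arrive in different formats into a single record type. Everything in it is an assumption made for illustration: the class names, the two payload formats (a comma-separated line and a key=value line) and the field names are hypothetical and do not correspond to any particular industrial protocol.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names and payload formats are hypothetical.
final class UnifiedReadingSketch {

    // The common record every source is normalized into.
    static final class Reading {
        final String deviceId;
        final String metric;
        final double value;

        Reading(String deviceId, String metric, double value) {
            this.deviceId = deviceId;
            this.metric = metric;
            this.value = value;
        }
    }

    // One adapter per acquisition format; downstream code only sees Reading.
    interface ProtocolAdapter {
        Reading parse(String rawPayload);
    }

    // Assumed payload shape: "deviceId,metric,value"
    static final class CsvAdapter implements ProtocolAdapter {
        public Reading parse(String rawPayload) {
            String[] parts = rawPayload.split(",");
            return new Reading(parts[0].trim(), parts[1].trim(),
                    Double.parseDouble(parts[2].trim()));
        }
    }

    // Assumed payload shape: "id=...;metric=...;value=..."
    static final class KeyValueAdapter implements ProtocolAdapter {
        public Reading parse(String rawPayload) {
            Map<String, String> fields = new HashMap<>();
            for (String pair : rawPayload.split(";")) {
                String[] kv = pair.split("=", 2);
                fields.put(kv[0].trim(), kv[1].trim());
            }
            return new Reading(fields.get("id"), fields.get("metric"),
                    Double.parseDouble(fields.get("value")));
        }
    }

    public static void main(String[] args) {
        Reading a = new CsvAdapter().parse("pump-1,temperature,71.5");
        Reading b = new KeyValueAdapter().parse("id=valve-7;metric=pressure;value=2.4");
        System.out.println(a.deviceId + " " + a.metric + " " + a.value);
        System.out.println(b.deviceId + " " + b.metric + " " + b.value);
    }
}
```

Supporting a new device format then means adding one adapter, while consumers of the unified record stay unchanged, which is one way to avoid creating a new silo per device type.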

As discussed above, for more than 30 years industries across the globe have been leveraging sensor-based data to gain visibility into operations, support continuous improvement and optimize overall enterprise performance. As advances in technology make it cost-effective to deploy connected solutions, industries will need to develop a strategic approach for integrating sensor data with pre-existing data environments. These advancements lead towards a seamless, extensible data ecosystem, which requires cooperation between multiple vendors, partners and system integrators.


In testing, the Test Summary report is an important deliverable. It represents the quality of a product. As automated tests mostly run unattended, I recommend presenting the test results clearly.

An automation test report should be useful to people at all levels: automation experts, manual testers who are not familiar with the code, and high-level management.


Ideally, a test automation report should comprise the following:

  • Statistical data like number of test cases passed, failed, skipped
  • Cause of test failure
  • Evidence (like screenshots indicating success/failure conditions)

In addition to the above, the following make a test report more impressive and useful:

  • Pass and fail percentage of tests
  • Test execution time for individual test case and a test suite
  • Test environment details
  • Representation of statistical data in the form of charts
  • Grouping of test cases as per the type like Functional, Regression etc.
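As a rough illustration of the statistical part of such a report, the Java sketch below counts results by status and derives the pass percentage. The class, enum and method names are hypothetical and are not part of TestNG, JUnit or any reporting library.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names are hypothetical, not a library API.
final class TestSummarySketch {

    enum Status { PASS, FAIL, SKIP }

    // Number of results with the given status.
    static long count(List<Status> results, Status status) {
        return results.stream().filter(s -> s == status).count();
    }

    // Share of results with the given status, as a percentage of all results.
    static double percentage(List<Status> results, Status status) {
        if (results.isEmpty()) {
            return 0.0;   // avoid dividing by zero for an empty run
        }
        return 100.0 * count(results, status) / results.size();
    }

    public static void main(String[] args) {
        List<Status> results = Arrays.asList(
                Status.PASS, Status.PASS, Status.FAIL, Status.SKIP);
        System.out.printf("passed=%d failed=%d skipped=%d%n",
                count(results, Status.PASS),
                count(results, Status.FAIL),
                count(results, Status.SKIP));
        System.out.printf("pass rate=%.1f%%%n", percentage(results, Status.PASS));
    }
}
```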

TestNG and JUnit do not provide good reporting capabilities, and TestNG's default reports are not attractive, so we have to develop customized reports.

I suggest using ExtentReport for more effective automation test reporting. This library lets us accomplish the things mentioned above.

About ExtentReport:

It is an open-source test automation reporting API for Java and .NET developers. The report is generated in HTML form.

The following are some features of ExtentReport:

  • Easy to use
  • Results are displayed in the form of pie charts
  • Provides passed test case percentage
  • Displays test execution time
  • Environment details can be added in an easy way
  • Screenshots can be attached to the report
  • Test reports can be filtered by test result (Pass/Fail/Skip etc.)
  • Stepwise results can be filtered by status (info/pass/fail etc.)
  • Categorized reports for Regression/Functional etc. testing
  • Test step logs can be added
  • Can be used with JUnit/TestNG
  • It can be used as a listener for TestNG
  • Parallel runs are supported, and a single report can be created for them
  • Configuration can be added to the report
  • Results from multiple runs can be combined into a single report

Downloading and installation:

Download the ExtentReport jar and add it as a dependency to your Java project.



ExtentX is a report server and project-wise test analysis dashboard for ExtentReports.


How ExtentReport works:

To see exactly how ExtentReport works, here is a simple example in which one test case passes and another fails.


import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.Assert;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;
import com.relevantcodes.extentreports.ExtentReports;
import com.relevantcodes.extentreports.ExtentTest;
import com.relevantcodes.extentreports.LogStatus;

public class ExtentReportTest {

     private WebDriver driver;
     ExtentReports extent;
     ExtentTest test;
     StringBuffer verificationErrors = new StringBuffer();

     @BeforeTest
     public void testSetUp() {
           driver = new FirefoxDriver();
           extent = new ExtentReports(".\\TestAutomationReport.html", true);    //Report initializing
           extent.addSystemInfo("Product Version", "3.0.0")   //System or environment info
                 .addSystemInfo("Author", "Sachin Kadam");
     }

     @Test
     public void TC1() {
           test = extent
                     .startTest("Test case 1", "Check the google home page title")  //Start test case
                     .assignAuthor("Sachin Kadam")
                     .assignCategory("Regression", "Functional");
           String appURL = "";   //Set this to the Google home page URL
           driver.get(appURL);
           test.log(LogStatus.INFO, "Navigating to URL : " + appURL);   //Log info
           customVerify(driver.getTitle(), "Google");
           extent.endTest(test);   //End test case
           checkForErrors();
     }

     @Test
     public void TC2() {
           test = extent
                   .startTest("Test case 2", "Check the wikipedia home page title") //Start test case
                   .assignAuthor("Sachin Kadam");
           String appURL = "";   //Set this to the Wikipedia home page URL
           driver.get(appURL);
           test.log(LogStatus.INFO, "Navigating to URL : " + appURL); //Log info
           customVerify(driver.getTitle(), "Google"); //Incorrect expected title to fail test case
           extent.endTest(test);   //End test case
           checkForErrors();
     }

     //custom assertion method for string comparison
     public void customVerify(String actual, String expected) {
           try {
                 Assert.assertEquals(actual, expected);
                 //Log pass results
                 test.log(LogStatus.PASS, "Expected title:" + expected + " :: Current title:" + actual);
           } catch (AssertionError e) {
                 //Log fail results along with error
                 test.log(LogStatus.FAIL, "Expected title:" + expected + " :: Current title:" + actual + " :: " + e.toString());
                 //Remember the failure so that TestNG can be told about it later
                 verificationErrors.append(e.toString());
           }
     }

     @AfterTest
     public void tearDown() {
           driver.quit();
           extent.flush();   //Write the collected results to the HTML report
     }

     //Method for logging correct results to TestNG report in case of failure
     public void checkForErrors() {
           String verificationErrorString = verificationErrors.toString();
           verificationErrors = new StringBuffer();   //Reset for the next test case
           if (!"".equals(verificationErrorString)) {
                 Assert.fail(verificationErrorString);
           }
     }
}

The generated HTML report shows Test case 1 as passed and Test case 2 as failed, along with the logged steps.







I hope you will find ExtentReport very useful, easy to use, impressive and productive.



- Sachin Kadam



Posted in Thoughts

Gartner’s 10 strategic predictions for 2017 and beyond make me delve, almost involuntarily, into imagining what the future holds.

As John leaves work and heads to the building lobby, his car is already waiting for him. Self-driving cars are almost mainstream. He just tells his car, “Drive me home”. After arriving home, which is already cooled or heated to his preference, he picks up the freshly brewed pot of coffee to pour himself a cup. As he walks into the living room, he says “Play HBO” and the TV turns on with the HBO channel playing. While he is deeply engrossed in the movie, his virtual assistant (Amazon Echo) suddenly reminds him about a dinner party scheduled for later in the evening. He tells his virtual assistant to buy some flowers and a good bottle of wine. Using virtual reality, he is immediately present in the virtual mall and able to hand-pick these items. Once he completes a virtual checkout, the selected items are delivered by drone to his home within half an hour, and John is all set for the party.

In time, technology will make all of this a reality. Some of it already is. Let us now look at the technology underlying all of this. At the fundamental level we have the Internet of Everything. All devices are connected to the grid all the time. This allowed John’s car to estimate his arrival time and share it with the devices at home. This in turn allowed his air conditioner to set the appropriate temperature and his coffee maker to brew his preferred coffee beforehand. Almost all the interactions are voice based rather than clicks on a screen. Devices with audio input will be trained to be activated only by a specific person’s voice (biometric audio-based authentication is implicit). Even the act of purchasing something no longer happens in a mobile application. Most shopping will happen through a virtual reality channel, and the experience will be most gratifying. No more running to the local store for last-minute errands. Deliveries happen by drone in the most efficient manner possible.

Virtual stores of the future will have neither physical stores nor warehouses; instead they will rely on JIT inventory directly from the suppliers. Goods will be shipped from the supplier directly to consumers based on orders received by the virtual stores. The virtual store will completely change the shopping experience for its consumers using virtual reality. It will allow consumers to touch and feel objects prior to purchasing them. Credit transactions will happen transparently in the background based on biometric approval from the consumer. The virtual reality goggles will perform an iris scan to authenticate the consumer, then digitally sign and approve the transaction. Blockchain will be used by merchants to maintain these financial transactions in an authentic, non-repudiable fashion.

All devices in the home will be connected and will share analytics metrics with manufacturers. For example, the air-conditioning/heating unit will share detailed metrics on compressor performance, power consumption trends, etc. with its manufacturer. The manufacturer can analyze this data to predict outages and faults well in advance. This in turn ensures that the service technician (possibly a robot) makes a home visit before the device breaks down. Preventive maintenance will help continuity and prevent outages. Consumers and businesses alike will benefit tremendously from this.

Overall lifestyle and experience will change dramatically. People will use fitness bands/trackers and share data with their healthcare provider as well as their health insurance company. This will enable the healthcare provider to proactively track the health of an individual (again through analytics) and detect issues before they arise. Also, insurance companies will base premiums on an individual's health and lifestyle patterns. The latter will include diet and food habits (from virtual store grocery shopping), exercise regime (fitness tracker), etc.

With everything integrated – security is the key. With IoT devices, it is imperative that security is baked in at multiple levels.





Let us look at these in more detail below:


Device security – The device needs to protect itself from attackers and hackers. This includes (but is not limited to) the following: hardening the device at OS level, securing confidential information on the device (data at rest on the device), firewalling the device, etc.


Authentication – Each entity (device, cloud service, edge node/gateway, etc.) needs to authenticate itself to the corresponding entity. If the device ships with a default username and password, it needs to enforce a password reset on initial power-on (and after a factory reset). Ideally the device should not use a static password for authentication. In our earlier post on OTP-based device authentication for improved security we discussed a novel approach which helps address the challenges faced by IoT device manufacturers today.




Network communication channel security – Today there are various communication channels at play, for example – devices communicating with their respective cloud service providers, devices communicating with fog/edge computing services/devices, devices interacting with other devices, etc. It is important that each communication channel is secured and there exists trust between the communicating endpoints. The channel can be secured using TLS as appropriate.


Cloud service security – The cloud service provides the backbone for the services provided. The attack surface needs to be minimal, hardened and firewalled against DDoS attacks. Data from the devices is collected at the cloud service end and needs to be secured (data at rest). Depending on the nature of the data and the service provided, this data may need to be hidden from the cloud service provider as well. The provider needs to ensure that appropriate backup and disaster recovery plans are in place. Also, the provider needs to present its business continuity plan to its subscribers. The Cloud Security Alliance (CSA) provides good guidance to cloud service providers.


Privacy – This relates more to data sharing across disparate service providers. With IoT, devices will end up communicating with devices and services from other providers. How much information can be shared across service providers with user consent needs to be spelled out explicitly. Service providers will need to incentivize users to allow sharing information with other providers. Users will only allow the sharing if they eventually benefit from it.


To summarize, security is key to the success of IoT.


Tagged in: IoT security

The recent massive distributed denial of service (DDoS) attack on 21st October 2016 affected numerous online services (Amazon, Twitter, GitHub, Netflix, etc.). It is interesting to note that this attack leveraged hundreds of thousands of internet-connected consumer devices (aka IoT devices) which were infected with malware called Mirai. Who would have suspected that the attackers involved were essentially consumer devices such as cameras and DVRs?

A Chinese electronics component manufacturer (Hangzhou Xiongmai Technology) admitted that its hacked products were behind the attack (reference: ComputerWorld). Our observation is that security vulnerabilities involving weak default passwords in the vendor’s products were partly to blame. These vulnerable devices were first infected with the Mirai malware, and the infected devices then launched an assault to disrupt access to popular websites by flooding Dyn, a DNS service provider, with an overwhelming amount of internet traffic. The Mirai botnet is capable of launching multiple types of DDoS attacks, including TCP SYN flooding, UDP flooding and DNS attacks. Dyn mentioned in a statement: “we observed 10s of millions of discrete IP addresses associated with the Mirai botnet that were part of the attack”. Such is the sheer volume of an attack that leverages the millions of IoT devices already out there.

Subsequently Xiongmai shared that it had already patched the flaws in its products in September 2015, so that customers have to change the default username and password on first use. However, products running older versions of the firmware are still vulnerable.

This attack reveals several fundamental problems with IoT devices as things stand today:

  • Default username and passwords
  • Easily hackable customer-chosen easy-to-remember (read as “weak”) passwords
  • Challenges with over-the-air (OTA) updates etc.

The first two problems are age-old issues, and it is surprising to see them come up with newer technologies involving IoT devices as well. Vendors have still not moved away from the traditional technique of default usernames and passwords, nor have customers adopted strong passwords. It is probably time we simply accept that the latter will not happen, and remove the onus of setting strong passwords from the customer.

One-time passwords (OTP) can be quite helpful here. A one-time password, as the name suggests, is a password that is valid for only one login session. It is a system-generated password, which makes it essentially immune to replay attacks. There are two relevant standards for OTP: HOTP [HMAC-based One-Time Password] and TOTP [Time-based One-Time Password]. Both standards require a shared secret between the device and the authentication system, along with a moving factor which is either counter-based (HOTP) or time-based (TOTP).
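To illustrate how a shared secret and a moving factor combine into a password, here is a minimal Java sketch of the HOTP computation described in RFC 4226 and of TOTP (RFC 6238) layered on top of it. It is not GS Lab's OTP Manager library: the class and method names are hypothetical, only HMAC-SHA1 is covered, and the secret is the published RFC test key rather than anything a real device should use.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Illustrative sketch only: class and method names are hypothetical.
final class OtpSketch {

    // HOTP: HMAC-SHA1 over the 8-byte big-endian counter, then dynamic truncation.
    static String hotp(byte[] secret, long counter, int digits) {
        byte[] hash;
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            hash = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA1 is not available", e);
        }
        // The low 4 bits of the last byte select a 4-byte window in the hash.
        int offset = hash[hash.length - 1] & 0x0F;
        int binary = ((hash[offset] & 0x7F) << 24)
                | ((hash[offset + 1] & 0xFF) << 16)
                | ((hash[offset + 2] & 0xFF) << 8)
                | (hash[offset + 3] & 0xFF);
        int otp = binary % (int) Math.pow(10, digits);
        return String.format("%0" + digits + "d", otp);   // keep leading zeros
    }

    // TOTP: HOTP where the moving factor is the number of elapsed time steps.
    static String totp(byte[] secret, long unixSeconds, long stepSeconds, int digits) {
        return hotp(secret, unixSeconds / stepSeconds, digits);
    }

    public static void main(String[] args) {
        // Test key from the RFC 4226 appendix; never use it in production.
        byte[] secret = "12345678901234567890".getBytes(StandardCharsets.US_ASCII);
        System.out.println("HOTP, counter 0: " + hotp(secret, 0, 6));
        System.out.println("TOTP, now:       "
                + totp(secret, System.currentTimeMillis() / 1000, 30, 6));
    }
}
```

With the RFC test key, counter 0 yields 755224 and counter 1 yields 287082, matching the published HOTP test vectors. Since the device and the authentication system derive the same value independently, the password itself never has to be stored or chosen by a user.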

GS Lab’s OTP-based device authentication system presents a novel approach which helps address the challenges faced by IoT device manufacturers today. It provides an unstructured device registry, flexible enough to include information on various types of devices, and an authentication sub-system which authenticates the IoT devices tracked in the device registry via OTP. The authentication sub-system is built on top of existing OTP standards (HOTP and TOTP) and removes the need for static (presumably weak) passwords in IoT devices. It supports the MQTT and REST protocols, which are quite prevalent in the IoT space. Support for additional protocols (such as CoAP) is planned and in the works. The OTP-based device authentication system is built on top of our open source OTP Manager library.

Here are some of the advantages of using GS Lab’s OTP-based device authentication system:

  • Strong passwords – system generated based on shared secret key
  • Not vulnerable to replay attacks – passwords are for one-time use only
  • Freedom from static user-defined passwords
  • Standards based solution – HOTP and TOTP standards
  • Relevant for resource constrained devices – crypto algorithms used by HOTP and TOTP standards work with devices with limited CPU, memory capabilities.
  • Ability to identify malicious devices – rogue devices can be identified using HOTP counter value
  • Provides device registry for simplified management
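The points about replay protection and the HOTP counter can be sketched as follows. This is a hedged Java illustration of counter-window validation in the spirit of the resynchronization guidance in RFC 4226, not the actual logic of GS Lab's system: the class name, the look-ahead window of 5 and the convention of returning -1 on rejection are all assumptions.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Illustrative sketch only: names, window size and return convention are assumptions.
final class HotpVerifierSketch {

    static final int LOOK_AHEAD = 5;   // how far the device counter may run ahead

    // 6-digit HOTP (HMAC-SHA1 plus dynamic truncation, as in RFC 4226).
    static String hotp(byte[] secret, long counter) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            byte[] h = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());
            int o = h[h.length - 1] & 0x0F;
            int bin = ((h[o] & 0x7F) << 24) | ((h[o + 1] & 0xFF) << 16)
                    | ((h[o + 2] & 0xFF) << 8) | (h[o + 3] & 0xFF);
            return String.format("%06d", bin % 1_000_000);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA1 is not available", e);
        }
    }

    // Returns the next counter the server should store, or -1 if the password
    // matches no counter in [serverCounter, serverCounter + LOOK_AHEAD].
    // Passwords for counters below serverCounter are never accepted, so a
    // captured password cannot be replayed.
    static long verify(byte[] secret, long serverCounter, String presented) {
        for (long c = serverCounter; c <= serverCounter + LOOK_AHEAD; c++) {
            if (hotp(secret, c).equals(presented)) {
                return c + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] secret = "12345678901234567890".getBytes(StandardCharsets.US_ASCII);
        long next = verify(secret, 0, hotp(secret, 2));   // device is two steps ahead
        System.out.println("accepted, next counter = " + next);
        System.out.println("replay of counter 0 -> " + verify(secret, next, hotp(secret, 0)));
    }
}
```

A device that repeatedly presents passwords outside the window, or reuses old ones, is a candidate for being flagged as rogue or cloned; how such devices are then handled is a policy decision outside this sketch.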





The customer provides a complete suite of event and video management solutions using a cloud server. This server enables client devices (mobile, web) to configure, control and view media from a cloud-enabled camera. The server hosts a web application, which functions as the intermediary for communication and authentication between the client and the camera.


The GS Lab engagement involved feature development, QA, DevOps and test automation development. The test automation team has developed functional and performance test suites to test the product.

Field Requirement

  • The customer wanted a test framework which can simulate the event/video surveillance scenarios of different end customers.

  • The test framework should test the timely delivery of the audio/video events to the event tracking web portal/mobile app.

  • The test framework should benchmark various internal cloud servers in the audio video surveillance solution/product.

Solution Provided by GS Lab




GS Lab has developed an audio/video surveillance test framework (tool) using Python and Selenium. The following are the major features provided by this framework:

  • The test framework can test 500 video & audio live streams across 500 cameras (1 audio/video stream per camera). This is a customer product limitation for live streaming.

  • The test framework can test the video surveillance controlling app (Android as well as iOS) and web portal (across Chrome, Firefox, Internet Explorer and Safari).

  • The test framework can start and stop the live camera video/audio stream on the fly.

  • The framework can test the operation specific notifications and logs across different servers in the surveillance solution certifying the successful completion of the operation.

  • The test framework supports:

    1. Complete functional, regression & performance testing of all the event/video management scenarios

    2. On the fly addition/deletion of audio/video stream in the surveillance solution

    3. Testing of 24/7 recording of the live streaming to be stored on Amazon S3 cloud storage

    4. Testing the notifications for any (audio / video) camera event

    5. Checking the timely delivery of the events to the portal or mobile apps (Android & iOS)


Value Addition

Following are the major benefits of the test framework for an audio video surveillance product:

  • The real world scenarios of the end customer can be simulated using this framework.

  • It supports Continuous Integration (CI) with all available open source tools.

  • The framework can save 60% of the QA team's bandwidth in every production release.

  • The framework can readily be used for the performance testing after the completion of regression testing with minimal changes.

  • It can benchmark different servers involved in the video surveillance solution.

  • The framework helped the development team to identify the performance issues due to crucial parameters (CPU, memory etc.) of the backend servers of the surveillance solution.


Posted in Technology

On a mundane February afternoon, as I headed for lunch, I remember getting a phone
call from within my company, and with it an opportunity to participate in an IoT
training program! Little did I know that the training sessions were supposed to be
online, live and interactive, but early in the morning. I'm not a morning person, and
was a little hesitant, but somehow 'curious me' prevailed over 'hesitant me' and
I subscribed. Having heard quite a bit about the Internet of Things (IoT), I wanted to
get a taste of it, and this training program presented that opportunity. It was not only
about learning, but also about getting our hands dirty and building something!
    Right after the introductory session, it was clear that we could reap the
benefits much better if we participated as a team. So, we formed a team of
developers with experience in different areas such as UI, server side, native
applications, hardware devices, etc. From then onwards, we embarked on a journey
to learn what it means and takes to build an IoT project using an IoT platform.
What follows here is an account of our experiences.

Learning an IoT platform
This was as good as it could get. We got to learn an IoT platform, an Atomic domain
language (TQL, that is), and ways to integrate with hardware devices, sensors and actuators.
There was a well-organized set of sessions, which took us on a tour of the platform
and how to use it. The course covered advanced features like clustering and macros,
which made it even more 'pragmatic'.

Hands-on is the key, and you get to do plenty of it
One of the best parts of this program is that you get to do hands-on work. In fact, you are
kind of forced to get your hands dirty. I think it's not without reason that the philosophy
of 'learning by doing' exists! We played a lot with Raspberry Pi, Arduino Uno, sensors,
actuators and of course the TQL system itself. This rendezvous did present us with its
fair share of issues, but it was all worth it.

Technically enriching discussions
One of the reasons for me to subscribe to this training program was to hear about the
IoT platform directly from its creators. It is a big deal!
This was evident from the interactions which we and the community had
during as well as after the sessions, e.g. why a particular feature is implemented
in a certain way, why certain things are restricted on the platform, etc. This helped
participants, especially those who were developers/architects, learn about what goes
into the making of an IoT platform.

Vibrant support forum
When you open the Slack web app for the TQL team, you get a random but nice message
to start with. One of the Slack messages that struck a chord with me instantly
was: We're all in this together. This message sums up the kind of support the
Atomiton folks are committed to providing. The questions are answered in depth
and in minute detail, with the reasoning explained as well as any available alternative or workaround.

Mutually rewarding community
As the participants are required to build projects, they naturally get to showcase them
to the community. This helps everyone understand how the platform can be put to use
to solve real-life problems, how others in the community are using it in innovative
and creative ways, and in a much larger context, what IoT is all about.

When you are doing something over and above your regular work, you need high
levels of commitment. And you also need a great deal of motivation!
There was enough of it, at the right times, to keep us going. And it rightly came
with tips and suggestions for improvement.

Improvement areas : What can possibly be done to make this even better?

Developer is king!
The developer is king, and needs to be pampered. ;) The more developer-friendly
features in the TQL studio, the better. Hover-for-help-msg, auto-completion and
templates-at-fingertips (for queries, macros, usage of JavaScript, in-line comments)
are some of our suggestions to enhance the TQL studio experience.

Auto-generation of basic queries from models
This will save some work for the developer. Also, it will serve as a guide for
writing custom/complex queries. I would go a step further, and suggest auto-generation
of code for UI : to access data over web-sockets as well as over http.

Highlight security aspects
Make this a must in the training program. Let this be a differentiator.
Following are the aspects which are worth giving a thought :

    • Can h/w devices be given fingerprints (unique identities)?
    • If a web app is being served using app-attachment feature, then how to expose it over https?
    • How to invoke an external service over https?
    • Security in built-in protocol handlers

Hardware bottlenecks

One of the observations our team made after the completion of the final project was:
working with 'things' is not the same as working with pure software!
We then thought, what would make working with 'things' easier? We realized that
knowledge of setting the hardware up and of integrating with it
would make working with it easier. The suggestion here is to make it child's play.
Crowd-sourcing could well be utilized here. Making this easy and simple would let
participants focus more on the project and on utilizing the TQL System's features in full glory.
Items to focus on here:

    • Raspberry Pi network connectivity: mainly a list of FAQs, especially on the many different ways to connect.
    • Basic sensors and their connections with Arduino Uno and/or Raspberry Pi.

A step further, it would be great to share notes comparing
off-the-shelf hardware with specialized high-end hardware, e.g. Raspberry Pi vs. Libelium.
Can Raspberry Pi be used in a production environment?

Session prerequisites
It would help if the prerequisites were mentioned for each of the sessions, and the
content for these prerequisites were also made available.
For example, right from the first session, the participants need to have an understanding
of Raspberry Pi and Arduino Uno. If they have already gone through it, then the first
session becomes a hello-world purely to the TQL system rather than a hello-world to all
of the hardware devices and then the TQL system.


Tagged in: IoT TQL

Posted in Technology

Trove Overview

OpenStack Trove is a DBaaS (Database as a Service) solution. It offers IT organizations the ability to operate a complete DBaaS platform within the enterprise. IT organizations can offer a rich variety of databases to their internal customers with the same ease of use that Amazon offers with its AWS cloud and the RDS product. OpenStack Trove supports both RDBMS and NoSQL databases.


Database as a service on the cloud is intended to reduce complex and repetitive administrative tasks related to database management and maintenance. These tasks include database instance deployment, database creation, configuration, periodic backups of database instances and patching, as well as continuous health monitoring of database instances.

Trove Architecture


APIs for Trove

APIs are exposed to manage the following service constructs:

  • Database instances
  • Database instance actions
  • User management
  • Databases
  • Flavors
  • Data stores
  • Configuration groups
  • Clusters
  • Backups

OpenStack4j Popularity

OpenStack4j is an open source OpenStack client which allows provisioning and control of an OpenStack system. This library has gained considerable popularity in the open source/Java community for the simple reason that it has the most fluent APIs available for interacting with OpenStack.

It is also listed on the official OpenStack wiki as the Java library for interacting with OpenStack.

Support for Trove in OpenStack4j

As OpenStack4j is the popular and most preferred Java library, there was a strong need for Trove API support in it. Its simple fluent API and intelligent error handling make interacting with OpenStack easy.

Example code snippets to interact with trove:

Create Database Instance:


Create Database:


Create Database User:


 For more document visit -


Contributors: Shital Patil - & Sumit Gandhi -




Pre Computers Era

This can be termed the ‘pen and paper’ era. It witnessed the building of the foundation. The concept of numbers became concrete. The zero was invented by Brahmagupta or Aryabhata, depending on which way you look at it. Number systems evolved. The earliest known tool used in computation was the abacus, which is thought to have been invented around 2400 BC.


A number of devices based on mechanical principles were invented to help in computing, leading eventually to analog computers. Computational theory also evolved with the advent of logarithms and the like.

Computers Era

The concept of using digital electronics for computing, which led to modern computers, is recorded around 1931. Alan Turing modelled computation, leading to the well-known Turing machine. ENIAC, the first electronic general-purpose computer, was announced to the public in 1946.


Since then computers have come a long way. There are supercomputers.

There is a variety of devices like mainframes, servers, desktops, laptops, mobiles etc. There is specialized hardware like gateways, routers, switches etc. for networking.

These culminated in the internet and the World Wide Web as we know it. There are storage arrays for all the storage-related capabilities including snapshots, backups, archival etc. There are Application Specific Integrated Circuits (ASICs), and so on and so forth.

Software Defined Era

Soon enough this hardware started being driven by software. The software became more and more sophisticated. It evolved through paradigms like multi-tier architecture, loosely coupled systems, off-host processing etc. Then came the advent of virtualization.


Many concepts in computing could be abstracted easily at various levels. This enabled a lot of use cases. E.g. routing logic moved to software, and hence networks could be reconfigured on the fly, enabling migration of servers / devices in response to user / application requirements. Tiered storage can be exposed as a block store and as a file system store at the same time. This makes it possible to lay out the data efficiently in the backend without compromising the ease of managing it from a variety of applications.

The cloud started making everything available everywhere for everyone. Concepts like Software Defined Networking (SDN) and Software Defined Storage (SDS) are leading to Software Defined Everything (yes, some people have started coining such a term, and you will start seeing it widely soon enough). Hardware is getting commoditized. Specialized software addressing these needs is on the rise.

Beyond Software

It is still not clear what will replace software. However, some trends and key players have already started to emerge in this direction. A number of components, such as open source ones, are readily available as building blocks. One might just have to put them together to solve a variety of problems without writing much code. Computing has moved away from “computing devices” into general-purpose common devices like watches, clothing, cars, speakers, even toasters. Every device is becoming intelligent. The hardware ecosystem is more or less commoditized already, and software is on the same path. Witness the proliferation of OpenStack or IoT platforms, for example. One might simply have to configure them to address the needs. E.g. OpenStack Cinder can be configured to clone volumes for creating test-dev environments efficiently. IoT can make a production plant efficient in real time through continuous monitoring, reconfiguration and management of its resources. It could be Docker containers that one only has to deploy, plug-and-play style, to have complete solutions in place. Handwriting recognition and voice-commanded devices can lead to a complete working solution in a matter of a thought! Machine learning can provide already fully functional machines like smart cars.

Who knows, a day might come when without doing anything, everything will be achieved even through thin air so to speak! At this time it might sound like a wild stretch of imagination but just quickly reflect over the evolution of computing so far. It might take a really long time to get there. In fact, it might be time for no one making such posts but just a matter of making some Google searches, looking around with open eyes, feeling it with all the senses for everyone to have already grasped the gist of the message!


The authors for this blog are Abdul Waheed and Paresh Borkar. 

Many organizations today still struggle with providing strong authentication for their web-based applications. Most organizations continue to rely solely on passwords for user authentication, which tend to be weak (so as to be easy to memorize), shared across systems, etc. Though there have been strides towards strong authentication mechanisms like 2FA, adoption has been low.

It gives me immense pleasure to announce that GS Lab is open sourcing its OTP Library asset. Abdul Waheed from GS Lab was instrumental in developing this asset, which is a standards based library that enables organizations to adopt One Time Password (OTP) based Two Factor Authentication (2FA) for Java/J2EE business critical applications, leading to improved security posture. It supports HMAC-based One Time Password (HOTP) and Time-based One Time Password (TOTP) standards and works with the free, off-the-shelf Google Authenticator mobile app to provide a friendly user experience.



  • Java/J2EE based library - used on server side
  • Standards based support (HOTP and TOTP)
  • Supported client - Google Authenticator
  • Ability to generate QRCode (to be scanned by Google Authenticator)
  • Integration with the server is simple and straightforward, requiring minimal effort
  • Support for security features like throttling, look ahead, encryption, etc.
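
To make the HOTP and TOTP standards mentioned above concrete, here is a minimal Python sketch of the two algorithms as described in RFC 4226 and RFC 6238. It is not the GS Lab library's code or API; the function names, the 30-second step and the one-step tolerance window in `verify_totp` are assumptions chosen for illustration.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA-1."""
    # The moving factor (counter) is encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte selects a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, at_time=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    now = time.time() if at_time is None else at_time
    return hotp(secret, int(now // step), digits)


def verify_totp(secret, candidate, at_time=None, step=30, window=1):
    """Accept codes from adjacent time steps to tolerate small clock drift.

    The window size is an illustrative assumption, not a value from the library.
    """
    now = time.time() if at_time is None else at_time
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, counter + delta), candidate)
        for delta in range(-window, window + 1)
        if counter + delta >= 0
    )
```

Authenticator apps are typically provisioned with a Base32-encoded form of the shared secret (for example through a QR code), so a server would decode that value to bytes before calling functions like these.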


Key Benefits

  • Add 2FA to existing Java/J2EE server applications
  • Standards compliant (HOTP and TOTP standards support)
  • Minimum integration overhead
  • Small footprint
  • Leverage the existing free, off-the-shelf Google Authenticator mobile app
  • Already adopted by market leaders like AWS for 2FA needs.
  • User friendly experience using QRCode
  • No costs associated with SMS/Text messaging and no related software requirements.

It is open source and can be easily downloaded from GitHub. Thank you Abdul for your contributions in making this happen!


This project started with the idea of having an easy automation tool for interacting with OpenStack. Considering the challenges one faces with the existing OpenStack CLI, this tool offers a very good starting point for overcoming them.


Unlike the existing OpenStack CLI, this tool does not require any prerequisite software to be installed. Openstack4j CLI is written entirely in Java and consumes the API of the openstack4j library; to run, it just needs JRE 6+, which is available by default on most operating systems. It consists of a single executable jar that is portable across any OS platform that runs Java.


It’s an all-in-one solution: a single client for all the OpenStack services. Openstack4j CLI bundles all the primary OpenStack service clients into one, mainly glance, nova, neutron, cinder etc.

Easy to use

Fluent CLI: commands that are easy to use and understand, and that do precisely what is needed. The CLI takes care of creating dependent resources in the cases where resources from other OpenStack services (neutron, cinder) are needed. This encourages automation and hides unnecessary complexity from users so that they can focus on the intent of the operation.


Inbuilt memory feature that remembers the output of commands. Openstack4j CLI saves all the resource IDs generated by previously executed commands and automatically substitutes those values into subsequent commands as and when needed.

More Info:


Group-Based Policy Overview

OpenStack's popularity has grown into a major community-based initiative, with thousands of contributors in more than a hundred countries. It's time to focus on the real challenges of deploying and delivering applications and services with flexibility, security, speed and scale, rather than just orchestrating infrastructure components. To achieve this, there is a need for a declarative policy engine. One such project is Group-Based Policy.

The advantage of the Group-Based Policy (GBP) framework is its abstraction, which reduces the complexity a developer faces when configuring networking and security for their infrastructure. Moreover, these abstractions are general enough to apply to computing and storage resources as well.

The different sets of components that form GBP are elaborated in the figure below.



Openstack4j Popularity

OpenStack4j is an open source OpenStack client which allows provisioning and control of an OpenStack system. This library has gained considerable popularity in the open source/Java community for the simple reason that it has the most fluent APIs available for interacting with OpenStack.

It is also listed on the official OpenStack wiki as the Java library for interacting with OpenStack.

Support for GBP in Openstack4j

As openstack4j was the most widely used library amongst the developer community, it was a good idea to add support for GBP as well. With its simple fluent API and intelligent error handling, interacting with OpenStack becomes easy.

Example code to interact with GBP:

Policy Actions

PolicyAction policyAction = Builders.policyAction()









There has been much discussion around various authentication methods, which range from username-password to OTPs, hardware tokens or biometrics, to client certificates etc. Each of these methods provides a varying level of confidence in the overall authentication process. This makes one wonder which authentication method is best for a particular organization’s needs. The fundamental question is: is there any one ‘silver bullet’ authentication method? The answer is ‘no’. You need to decide which one to use depending on the environment and context.

Understanding the need

As an example, let’s compare an employee who is logged on to your corporate intranet (probably using AD domain authentication) and requests access to an intranet application with someone requesting access from outside. In the latter case, you would want to request stronger authentication to ascertain the identity of the person. Here you may choose to ask for an OTP as an additional factor in the authentication process. This is a good example of leveraging context to determine the type of authentication required.

Let us consider another scenario where someone is trying to access a privileged application outside of business hours or from an unknown IP address. In such a case, again you would want to request stronger authentication depending on the nature of the privileged application.

Understanding the authentication context

Context is essentially the surrounding detail about the environment, which can be determined passively (i.e. without need for user intervention). Some typical examples of context include:

  • Location context - Using geo-location to determine where the user is logging in from.
  • Known machine - Has the user logged in using this machine before? This is typically done by computing something known as a device fingerprint and tracking it.
  • Time of the day - Is the user logging in at an odd time of the day or night that does not match the user's typical login patterns?
  • IP address – Has the user logged in from the same IP address before?

If we look at the above pieces of information that form the context, we realize that context-aware authentication essentially means ‘compare the current context with what is considered normal for that user’. Thus, we first have to establish what can be considered normal behaviour for any given user. This is where analytics comes into play. Using intelligent analytics, we can identify typical normal patterns for users; the system keeps learning newer patterns and registering outliers. Based on these learnings, it can request step-up authentication whenever required.

How does this work?

The solution closely tracks user activity to determine normal patterns (using analytics). For every new authentication attempt, the system compares the authentication context with what is considered normal for the given user. It identifies the variance from the normal level and translates that variance into a risk score. Depending on the risk score identified, it determines the need for step-up authentication along with the type of step-up required.

For example, a user’s typical pattern is to log in from North America during business hours. If this user now tries to log in from the Asia Pacific region from a known machine, she/he will be prompted for an OTP as well. If this user tries to log in from the Asia Pacific region from an unknown machine, she/he could be prompted for biometric authentication as well.
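
The flow described above (compare the context with the user's normal pattern, turn the variance into a risk score, map the score to a step-up type) can be sketched roughly as follows. This is only an illustration: the class names, the weights and the thresholds are hypothetical, and a real system would learn the profile through analytics rather than hard-code it.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """What is considered normal for one user (hard-coded here for illustration)."""
    usual_regions: set = field(default_factory=set)
    known_devices: set = field(default_factory=set)
    known_ips: set = field(default_factory=set)
    work_start_hour: int = 9    # assumed business hours
    work_end_hour: int = 18


@dataclass
class LoginContext:
    """Context collected passively for one authentication attempt."""
    region: str
    device_fingerprint: str
    ip_address: str
    hour: int


# Hypothetical weights per context signal; not taken from any real product.
WEIGHTS = {"region": 40, "device": 30, "ip": 10, "hour": 20}


def risk_score(profile, ctx):
    """Add up the weight of every signal that deviates from the user's normal."""
    score = 0
    if ctx.region not in profile.usual_regions:
        score += WEIGHTS["region"]
    if ctx.device_fingerprint not in profile.known_devices:
        score += WEIGHTS["device"]
    if ctx.ip_address not in profile.known_ips:
        score += WEIGHTS["ip"]
    if not profile.work_start_hour <= ctx.hour < profile.work_end_hour:
        score += WEIGHTS["hour"]
    return score


def required_step_up(score):
    """Map a risk score to the type of step-up authentication (assumed thresholds)."""
    if score >= 70:
        return "biometric"
    if score >= 40:
        return "otp"
    return "none"
```

With these assumed weights, a login from an unusual region on a known machine lands in the OTP band, while the same login from an unknown machine crosses into the biometric band, mirroring the example in the text.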

How does this help?

The end user is not prompted for strong authentication unless there is an explicit need for it. This helps provide a better user experience while doing the delicate balancing act of providing strong authentication whenever required. Best of both worlds!


In my previous blog post, I explained the general concept of Streaming Analytics and the kind of problems it can solve. In this post, I would like to discuss how traditional Big Data analytics and Streaming Analytics differ, and why Streaming Analytics is becoming a crucial component for modern applications (even before the application data reaches the stage of qualifying for conventional Big Data analysis).

Near Real-time or Real-time? This particular aspect, in conjunction with your business needs, will influence your decision of whether to go for Streaming Analytics or a Big Data (Hadoop based) analytics or a combination of both.

As a classic example of Big Data, let’s consider the retail giant Amazon (the most common and easy-to-understand example). You may have noticed that when you shop there, the portal offers suggestions based on your shopping history and the items you are currently searching for, such as ‘products you may be interested in’, ‘customers who bought this also bought’, ‘frequently bought products’ etc.

With hundreds of thousands of people shopping there throughout the day, have you ever wondered how much data the system could be processing? It is mind-blowingly large and qualifies as a ‘Big Data’ problem, which is typically solved using Hadoop-like systems. The Big Data analysis continuously churns out new analytical models and algorithms or modifies existing ones. These are then applied to the data to come up with contextual suggestions, to re-target frequent customers with discounted products, or to offer discounts based on user history or the current holiday season. These models incorporate feedback from forums and social media (Facebook, Twitter) and the impact on sales of types of products (due to factors such as season, holidays, geography, age etc.), and set new prices for the next shoppers etc. Using such continuous analysis, Amazon manages a massive inventory and supply chain, ensuring optimum distribution of its inventory.

Although consumers are provided with immediate suggestions and context-based results, is this achieved by Streaming Analytics? Not really. The example above talks of ‘Variety’ and ‘Volume’, but the ‘Velocity’ aspect is not really significant here. And even if data were pouring in at high volume, it is first settled in HDFS, and then MapReduce-based computation is applied to carry out a very ‘structured’ analysis (meaning you know exactly 'how' the data is to be analyzed) and, more importantly, a ‘batch’ analysis. The analysis results are useful only after the entire batch is processed, and the time this takes is long enough for a Streaming Analytics system to consider it ‘too late’ for making any ‘real time’ decisions.

Analytics of Data in Motion or ‘Streaming Analytics’ really deals with:

  • Very high velocity and high volume of data - at a minimum, a few hundred thousand events per second
  • Immediate analysis or prediction of favourable or unfavourable events that you want to detect early - before the data settles onto disk or into HDFS for a more structured and known analysis
  • Analysis that could be ad-hoc, as against a structured MapReduce-based computation
  • Not needing to wait for batches to finish being processed. You want to act as soon as the "first" indications of a possible problem are visible. True "Real Time"

E.g. a sensor detecting a voltage spike in a manufacturing plant may raise an event that triggers a shutdown of the affected production line to prevent possible damage. Another example: thousands of devices and sensors (switches, routers, VMs and their associated devices like ports, cards etc.) in an infrastructure setup may collectively generate hundreds of thousands of signals each second. You want to get rid of false positives and detect a sequence of events that can lead to a failure (e.g. if the fan of a server goes down, you could predict the chain of events that may follow: heating up of the CPU, diminishing CPU utilization, longer response times and eventually the VMs going down). So as soon as the fan goes down, you want your setup to pay special attention to the performance of the VMs running on that machine, or to redirect requests to other VMs until the fan comes back up.
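
The difference from batch processing in the two examples above is that each event is acted upon the moment it arrives. A minimal Python sketch of that idea follows; the event format, the `react` function, the action names and the voltage limit are all hypothetical and chosen only for illustration.

```python
VOLTAGE_LIMIT = 250.0  # hypothetical threshold in volts


def react(events, vms_by_host, voltage_limit=VOLTAGE_LIMIT):
    """Consume events one at a time and yield actions immediately.

    `events` can be any iterable, including an endless stream; nothing is
    stored and no batch has to complete before an action is produced.
    """
    for event in events:
        if event["type"] == "voltage" and event["value"] > voltage_limit:
            # A voltage spike shuts down the affected production line.
            yield ("shutdown_line", event["source"])
        elif event["type"] == "fan_down":
            # A failed fan puts every VM on that host under closer watch.
            for vm in vms_by_host.get(event["source"], []):
                yield ("watch_vm", vm)
```

Because `react` is a generator, the first action is available as soon as the first matching event is read, regardless of how many events follow it in the stream.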

But didn’t we say earlier that for Streaming Analytics, time and size are both crucial? The very fact that hundreds of thousands, or even millions, of events are to be processed per second gives the problem its ‘Big Data’ character. But the purpose is different: to take decisions while the data is flowing, without having to store it in HDFS or wait for a batch to complete processing. Depending on the problem you are solving, you may not even need to persist the events in HDFS or a DB/NoSQL store at all. You may take a decision, make a course correction in a matter of a few milliseconds, and forget about the very events that enabled you to take that decision.

Talking of scale, LinkedIn is a great example. Look at these statistics related to LinkedIn user activity over the last 4 years:

  • 2011: 1 billion messages per day
  • 2012: 20 billion messages per day
  • 2013: 200 billion messages per day
  • 2015: 1.1 trillion messages per day

1.1 trillion messages per day averages out to roughly 12.7 million messages per second (1.1 trillion divided by the 86,400 seconds in a day). Processing at such a volume is done to ensure ‘real-time’ insights into the operational efficiency of LinkedIn's IT infrastructure. It is an excellent example of why conventional Big Data (Hadoop-like systems) won’t work. Data has to be processed the ‘Streaming Analytics’ way: on hundreds of nodes in parallel, blending, filtering, enriching and analyzing data, rejecting unwanted data and passing on the filtered/enriched data for further analysis or to a data store as an ‘analysed result’.

On the other hand, if you look at the "people you may know" feature of LinkedIn, that is the result of Hadoop-based Big Data analysis. The analysis, built on existing data across millions of users and their connections, history etc., may already be in place and continuously updated, and is only presented to the user when they log in.

Today's applications are handling data which is many orders of magnitude larger than the data they used to handle a few years ago - look at the LinkedIn example above. It’s time to make applications ready for Streaming Analytics - the new way of data processing. Streaming, processing and analyzing of extremely high volume and high velocity data in parallel across multiple nodes to derive actionable insights that will control the behaviour of the application is going to be a key architectural consideration in applications of today and tomorrow. Application owners and architects need to gear up to embrace Streaming Analytics as a key building block in their application ecosystem.

At GS Lab we have developed a Streaming Analytics platform that provides a complete solution (data ingestion to processing to analysis to visualization), and is designed to seamlessly integrate into the applications ecosystem as a data processing and analyzing engine that is highly scalable and customizable for applications in any domain. Converting insights from high volume and high speed data into high operational efficiency, conversion to business opportunities and increased ROI should no longer be the privilege of a few. GS Lab's Streaming Analytics platform enables enterprises - small, medium and large - to take advantage of their data to achieve high efficiency, optimization, high ROI, and savings.


Streaming Analytics is generating plenty of buzz these days. We’ve already discussed the concept in this previous blog post by Mandar Garge, Stream It On. Streaming Analytics can be broadly defined as the analysis of data as it is generated or moving in your application ecosystem.

Until about 5 years ago, the compelling need for Streaming Analytics was not felt or the technology that makes it possible was not viable and affordable. Today, ‘out-of-the-box’ Streaming Analytics solutions are no longer in favour with the industry; rather, there is a preference for customized solutions. Each organization has a uniquely different problem when it comes to handling the deluge of data. Additionally open source tools and technologies that have been proven and tested over time for solving high volume data problems are now available.

Let’s discuss what it takes to create customizable Streaming Analytics solutions. I will share the typical considerations and component architecture involved in building a Data Pipeline to perform Streaming Analytics at the required scale.

The What and Why

Analysing streaming data is necessary to make real-time decisions based on the insights from it. Let’s try breaking this up.


While the definition of the time window for real-time decisions varies per problem, in most cases Streaming Analytics aims at analyzing data as it is being generated. Depending on the business impact and the latency between data generation and decision making, the definition of real time could vary from a few minutes to a few milliseconds.


In most cases the decisions as a result of Streaming Analytics are predictive in nature. Some quick examples of these decisions are:

  • Fraud detection: Detecting a fraudulent or malicious financial transaction within a few milliseconds of it occurring. Preventive actions such as blocking the concerned transaction and alerting the parties concerned also become possible.
  • Anomaly detection: Sending out warnings or alerts to the IT staff when critical components in the enterprise IT infrastructure start to show signs of malfunction.

Depending on the use case, the preventive or remedial actions resulting from Streaming Analytics could be either manual or automated.


Conceptual model

Streaming Analytics is typically needed when it is a non-trivial effort to analyse data within a meaningful time-window using traditional tools (like an ETL or even Excel). Streaming Analytics usually applies to situations where:

  • Data that is being generated is high-volume, typically starting at orders of a few billion data points a day
  • Decisions are taken based on the data analysis in real time

The combination of software systems used to analyze streaming data is typically referred to as a Data-Pipe (or Pipeline).

Designing Data Pipelines

Some of the important points to consider when designing data-pipes are:

  • Speed/ Throughput

    • How quickly can the system process data coming to it?
    • How many messages/second or bytes/second can the data-pipe process?

When large volumes of data are to be processed for decision making, it becomes critical to process them quickly. Delayed analysis has less value, and sometimes none at all. A well-designed data pipeline has high throughput.

  • Latency

Analysis is number crunching and will invariably consume CPU cycles and time. Any pipeline should not throttle the flow of data significantly. This again would lead to delayed analysis. Also it is important to ensure that the components of the system do not introduce any bottlenecks that snowball latency over time. Latency should always be minimal and constant.

  • Flexibility

A data pipe, as we will see later, is a heterogeneous structure with each component having a specific function. For example there would be:

    • A data aggregator,
    • A queue,
    • An engine to perform streaming functions on the data,
    • An index or database where it would be stored,
    • A view component to visualize data

Given the pace at which Streaming Analytics as a field is gaining momentum, it is important to keep components loosely coupled. This is so that there is freedom to choose the best component for a particular function.

  • Scalability

Since the volume of processing in such data pipes is very high, often these systems reach the limits of the hardware that they are deployed on. It should be easy for such systems to horizontally scale by adding more nodes to the cluster.

  • Fault Tolerance

Fault tolerance is closely related to scalability. A cluster should be able to auto-recover if one of its nodes goes down, and the data pipeline should not lose any data.


Let's look at the components that constitute a typical data pipeline.

Data Collection

Data sources


Data sources are diverse and their nature depends on the domain of the problem. There could be multiple sources of data that feed to a single data analysis pipeline. A data pipeline would need to have connectors to various systems that it feeds from. For example, SNMP, Log files, RSS feeds or other custom connectors.


Input Queue


An input queue unifies the data collected from various sources. This helps to perform real time analysis on data being generated by various systems and correlate them to get valuable insights.



Stream Processing Engine


The stream processing engine is the heart of any data pipeline. This component allows analysis of data in a snapshot as well as data within time windows/moving windows. A rule configuration mechanism coupled with it allows the system to be set up to analyze data based on custom rules. This component would most likely use an in-memory or distributed cache to maintain state across messages.
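
As a rough illustration of a moving window with in-memory state and a configurable rule, consider the Python sketch below. The class name, the rule and its parameters are hypothetical, and the sketch assumes events arrive with non-decreasing timestamps; a production engine would also handle out-of-order events and distribute this state across nodes.

```python
from collections import deque


class SlidingWindowCounter:
    """Count the events seen in the last `window_seconds` seconds, held in memory."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record an event and return how many events are still inside the window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Evict events that have slid out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def burst_rule(timestamps, window_seconds=60, threshold=3):
    """Return the timestamps at which 'threshold events within the window' fires."""
    window = SlidingWindowCounter(window_seconds)
    return [ts for ts in timestamps if window.add(ts) >= threshold]
```

For example, `burst_rule([0, 10, 20, 100, 110, 115])` fires at timestamps 20 and 115, the two moments at which three events fall within the same 60-second window.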



Data Store


The analyzed data should be routed onto a data store so that it can be read for charts and other representations. The system could decide to store only a curated subset of the total events (messages) that occurred.
In order to be able to serve downstream visualization, the data store should support high speed writes and fast queries. The choice of data store type would be driven by the complexity or variation of data, indexing capabilities and possibly need for features like aggregations.



Visualization

Finally there needs to be a visualization component which allows you to visualize the data in predefined ways and also allows easy customization. 






Overall Component Architecture


As we have seen, the overall component architecture involves stitching these components together so that the required functionality, performance and reliability are achieved.
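
To show how loosely coupled stages can be stitched together, here is a toy, single-process Python sketch of the collection, input queue, processing, store and view stages. Every function name here is hypothetical; in a real deployment each stage would be a separate system (for instance a message broker for the queue and an index or database for the store).

```python
import queue


def collect(sources, input_queue):
    """Data collection: unify records from every source onto one input queue."""
    for name, records in sources.items():
        for record in records:
            input_queue.put({"source": name, **record})


def process(input_queue, rule):
    """Stream processing: drain the queue and keep only records matching the rule."""
    while True:
        try:
            record = input_queue.get_nowait()
        except queue.Empty:
            return
        if rule(record):
            yield record


def summarize(store):
    """Visualization stand-in: count stored records per source."""
    counts = {}
    for record in store:
        counts[record["source"]] = counts.get(record["source"], 0) + 1
    return counts


def run_pipeline(sources, rule):
    """Wire the stages together; a plain list stands in for the data store."""
    input_queue = queue.Queue()
    collect(sources, input_queue)
    store = list(process(input_queue, rule))
    return store, summarize(store)
```

Because each stage only depends on the shape of the records it receives, any one of them can be swapped for a different implementation without touching the others, which is the point of keeping the components loosely coupled.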

There are various other aspects like DevOps, runtime configurability, etc. that need to be considered while building such systems that are beyond the scope of this article.

Watch this space

The space of Streaming Analytics is still nascent and evolving. The problems are different and unique. The solution therefore needs to be flexible enough to solve any problem that deals with analyzing data in real time. Businesses are discovering that problems and business cases can be better solved with a customizable Streaming Analytics solution based on open source tools rather than with rigid and expensive analytics solutions.

At GS Lab we have developed a Streaming Analytics platform that allows businesses to configure and customize the platform to suit their business domain. The platform can be customized by building data pipelines with tools and technologies that best suit the problem to be solved. Additionally customization can be built on top of this data pipeline within days and not weeks or months.

At GS Lab we have also been leveraging tool-chains like the ones below to build custom solutions for customers:

  • Apache Kafka, Apache Storm, Elasticsearch, Kibana/custom Dashboards
  • Apache Kafka, Apache Flink, Cassandra
  • Apache Kafka, Apache Spark, Elasticsearch, Custom Dashboards
  • Logstash, Elasticsearch, Kibana

Watch this space for the solutions that GS Lab has to offer and for the specifics of Streaming Analytics solutions in more depth.


I have always been fascinated by the way in which real-time streaming technology has evolved. Today this technology can be used to deliver multimedia content simultaneously to participants of a network-based communication. Multimedia content may include audio, video, graphics, animation, images, text, etc. To be effective, streaming multimedia is presented in a continuous fashion, and excessive delays or missing content can be detected by participants. Often, buffering techniques are used to enable a consistent presentation of content, given an inconsistent transmission and receipt of content.

This transmission of multimedia content, which includes audio and video, in real-time to multiple recipients may be referred to as audio-video conferencing. Audio-video conferencing offers a number of advantages such as real-time communication capability between multiple participants, without the delay, cost, scheduling, and travel time of face-to-face meetings. Audio-video conferencing may make use of the Internet and associated Internet protocols to deliver content to the various participants. This greatly extends the connection capability of audio-video conferencing to a worldwide range.

One challenge which I have personally witnessed is that the quality of service in transmitting real-time streaming data over the Internet cannot be guaranteed, and disruptions may be experienced frequently. Disruptions really play spoilsport in an important meeting when people simply get dropped from the video conference.

In some cases the disruption may be of a short duration, but many participants of audio-video conferencing have had frustrating experiences in which the real-time streaming of data failed and the conference was abruptly terminated.

As we all agree that ‘necessity is the mother of invention’, I started my research along with colleagues to try and understand what existing solutions are available to overcome this annoying disruption.

The existing solutions for audio-video conference failover were designed around having a secondary MCU (Multipoint Control Unit) in case of a network failure with the primary MCU. The MCU is a server component, usually costly hardware requiring a lot of configuration and bandwidth allocation. Here are some typical examples below:

We felt that there was a need for a solution that would be simple and built with something that we already have. There was a need to design an innovative system and process to join the dots.

The solution we proposed is to use the resources already present in the conference, namely the client endpoints, instead of high-end MCUs.
In this client-based MCU selection scheme:

  • The client endpoints are always available, so the system can proactively nominate one of the client endpoints that has conference hosting capability.
  • The conference hosting capability can be judged from the endpoint's hardware capability and the network in which it is located.
  • It is preferable to nominate the endpoint of the conference moderator, since the moderator usually stays in the call for its entire duration.
  • Even if the moderator leaves the call, another client endpoint in the conference is nominated as the secondary MCU.
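
The selection rules listed above could be sketched roughly as follows. This is an illustration only, not the patented method: the `Endpoint` fields, the 0-to-1 scores and the `nominate_backup_host` function are hypothetical, and the simple "moderator first, then best combined score" ranking is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    can_host: bool          # has conference hosting capability
    hardware_score: float   # hypothetical rating from 0 to 1
    network_score: float    # hypothetical rating from 0 to 1
    is_moderator: bool = False


def nominate_backup_host(endpoints):
    """Pick the endpoint to act as secondary MCU, or None if none can host."""
    candidates = [e for e in endpoints if e.can_host]
    if not candidates:
        return None
    # Prefer the moderator; break ties by combined hardware and network score.
    return max(
        candidates,
        key=lambda e: (e.is_moderator, e.hardware_score + e.network_score),
    )
```

Calling `nominate_backup_host` again whenever a participant joins or leaves would cover the case where the moderator drops out and another endpoint has to be nominated.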

The solution devised by us offers a number of powerful benefits:

  • Low cost
  • High efficiency
  • Ease of implementation

[Editor's Note: This blog post describes Sagar's contribution to the patent 'Maintaining Audio Video Continuity' while he was working with his previous employer.]


Posted by on in Technology

If you ever thought that the data in your application's ecosystem held valuable insights, you were right, and chances are that you are already leveraging them. But if you thought the value lay only in the archived data ('data at rest'), think again. The 'moving' data is equally significant, and often has more business value and insight hidden in it.

Streaming analysis, as the term suggests, means analyzing data up front, while it is 'streaming' or 'moving' through your application ecosystem. It is not a rear-view-mirror analysis but a front-view one, allowing you to steer your business based on what you see happening in real time (often while keeping the rear view in mind). This requires a shift in mentality, from ‘batch processing’ to ‘stream processing’.

Insights into moving (real-time) data are often called 'perishable insights': they emerge from urgent business situations and events that can be detected and acted on at a moment's notice. They are perishable because, if you do not act on them immediately, they lose their value, and you potentially lose a business opportunity that you may not even have realized existed. These simple or complex events can uncover hidden risks as well as untapped business opportunities, but only if you act immediately.

Streaming analytics usually involves massive volumes of streaming data. A streaming analytics system must therefore handle ‘big + fast’ data rather than ‘big’ data alone. It has to be:

  • A high performance system, processing data at extremely high velocity and volume. It should adjust, transform and normalize data extracted from numerous sources with a variety of formats. It should be capable of processing tens of thousands of events per second.
  • Able to offer rapid integration with different data sources (system data, application data, market data, social networks, transactions, mobile devices, IoT devices, sensors, images and files).
  • Easily scalable and fault-tolerant.
  • Capable of analyzing the data for warnings, alerts, signals and patterns - all in real time.
  • Available as a platform with application development capabilities and development tools - from proprietary SDKs to those based on open-source frameworks (Spark, Storm, etc.), technologies (Java, Python, Scala) and tools (NoSQL databases, relational databases, etc.).

The peculiarity of such a platform is heavy ‘in-memory’ analytics: analyzing the streamed data only after it has been written to disk, even temporarily, comes too late, and the data stream may lose steam.

A streaming analytics system has to have the following capabilities:

  • Rich visualization and monitoring
  • Intelligence to detect urgent, problematic or opportune events
  • Automatic course correction / real-time responsiveness - from something as simple as alerting or messaging the stakeholders to launching a complex business workflow

Furthermore, intelligence can be built in to predict what might happen in the future, based on what the moving data contains and correlating it with what was found in the previous month's or year's data, or maybe even the previous 20 years' data - all in real time.

Let’s have a look at some examples of how real-time analysis can make a difference:

  1. Fraud Detection: Periodic processing of a credit card’s transactions would reveal a fraud much after it has happened. Real-time analytics enables detection of a fraud while the transaction is in process, allowing the system to automatically stop the suspicious transaction before it’s too late.
  2. Supply Chain: Manufacturers who analyze weekly or monthly reports of production orders, to make adjustments to their production, can greatly benefit from streaming data analysis (of how their product is being sold off the counter) to make immediate adjustments in the processes to yield optimum output or avoid creating an overstock.
  3. Online Retail: Retailers can process disparate data streams (ERP, CRM etc.) together with a customer's historical purchasing data and shopping patterns to make more contextual offers, which may lead the buyer to purchase more products than originally intended. Have you ever seen a “people who bought this also bought these items” message? Real-time analysis! Retailers have tapped this huge business potential simply by using this very short “right-time” window to make contextual offers.
  4. Ad Exchange and Ad serving: Have you ever noticed that when you visit your favorite news website, you see an advertisement for a product you searched for on the internet a couple of days back? Chances are that this ad was served only to you and a few others, based on search history. To earn this spot on the webpage, thousands of ads have undergone a bidding process in the blink of an eye – thanks to real-time analysis!
  5. Real-time stock price prediction: Analysis is done by integrating huge volumes of data from exchanges like the NYSE, Twitter messages about the stock (from the relevant financial community) and public sentiment about the stock, and correlating it with the results of complex statistical modeling algorithms run on the stock's historical data.
  6. Global sports betting market: 80% of betting takes place after the event has started. Prediction of pricing movements using only batch processing on historical data before the start of the sports event is hardly of any use.
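
To make the fraud detection example more concrete, here is a minimal in-memory sketch of one common streaming check: flagging a card that makes too many transactions within a short sliding window. The `VelocityCheck` class, the thresholds and the card identifiers are assumptions for illustration; real fraud systems combine many more signals.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Flags a card when it exceeds `max_txns` transactions within
    `window_seconds`, evaluated as each transaction arrives."""

    def __init__(self, max_txns=3, window_seconds=60):
        self.max_txns = max_txns            # hypothetical threshold
        self.window = window_seconds        # hypothetical window length
        self._recent = defaultdict(deque)   # card_id -> timestamps in window

    def is_suspicious(self, card_id, timestamp):
        times = self._recent[card_id]
        # Evict timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        times.append(timestamp)
        return len(times) > self.max_txns
```

Because the state lives in memory and is updated per event, the decision is available while the transaction is still in progress, rather than after a periodic batch run.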

The point is that all of the examples above have been around for a while, performing real-time analysis, albeit with home-grown algorithms or existing commercial systems. Streaming analytics aims to provide a platform on which more and more organizations can do the same by building only a thin layer on top.

Digital data in tech as well as non-tech organizations is almost doubling every year, and the spread of IoT will only multiply this volume. Many organizations do not know exactly what to do with this enormous amount of digital data, or are still gearing up to use conventional analytics effectively. But those who take the 'streaming analytics' path will leap ahead and gain tremendously from descriptive, predictive and prescriptive analysis – all in real time.

Customized solutions built on top of, or integrating, open-source components such as Spark or Storm, Kafka or RabbitMQ, Elasticsearch or Solr, and Cassandra, MongoDB or Aerospike, together with customized ETL processes and BI or visualization tools, will be the most sought-after solutions and will give the big commercial streaming analytics vendors a run for their money in time to come.

Streaming analytics is not a replacement for conventional analytics. It complements and supplements existing techniques to make analytics more intuitive and valuable to organizations. If conventional analytics on massive volumes of data lets your organization make intelligent business decisions quicker, then streaming analytics is the equivalent of firing the booster engine of a rocket and flinging it into a higher orbit. Why wait till the end of the week or month for the data to derive hidden business value when you could get it within minutes, seconds or even milliseconds? Just stream it on and zoom ahead.


Logging – for a large part of the history of computer software, creators of software products have had a love-hate (but mostly just indifferent) relationship with logging. They love logs when products run into problems – which is more often than most people would imagine; they hate logs because software doesn’t do anything automatically and they have to tell it to log its every whim and fancy; and when software runs the way it’s expected to, well, then nobody really cares what’s in the logs. It’s like what parents are to a teen, love ‘em when you need something, hate ‘em when they make you explain where you were at night and don’t really give too much thought when they’re not doing either.

The situation is not helped by the fact that logging sits on the intersection of two very important concerns of software businesses and developers – how to use the least amount of computing resources and how to take the least programming effort possible to achieve what you want to, respectively. And guess what, logging requires resources and you have to write actual lines of text by hand to ensure it’s done; clearly not a tenable position. And so, best case scenario, it gets treated as a necessary evil; worst case scenario, it gets cut down mercilessly.

Or, at least, that was the case until very recently.

Over the last few years, logging has come out of the shadows and it’s now actually fashionable to talk about ‘log management’, ‘log analytics’, etc. A number of factors have caused this shift and many of them are the same things that cause any such shift – abundance of cheap computing resources, more powerful hardware and a good availability of open source components. It has become as simple as setting up one tool-chain to get started very quickly on understanding what’s happening in the logs from your application. However, beyond being easy enough so that anyone with even a little software experience can do it, here are three reasons why you should care about it and get started as soon as possible.

There’s hidden data in your logs that you don’t know of

Logs are a geologic record of exactly what went on in your product over time. Logs not only give you access to what happened at a specific point in the past (the traditional use); analyzing them also exposes trends in the usage of your product and in the problems it runs into. These data points help product managers and support teams alike to understand users and give them a better experience. Traditionally the domain of time-consuming discussions and ineffective surveys, this need can now be served effectively by the insights hidden right inside data you already own.
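
As a small illustration of the trend analysis described above, the sketch below counts error entries per hour. It assumes a hypothetical log line layout of `YYYY-MM-DD HH:MM:SS LEVEL message`; real products log in many different formats, so the parsing would need to be adapted.

```python
from collections import Counter


def errors_per_hour(lines):
    """Count ERROR entries per hour, assuming lines shaped like
    'YYYY-MM-DD HH:MM:SS LEVEL message' (an assumed format)."""
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed layout
        date, time_of_day, level = parts[0], parts[1], parts[2]
        if level == "ERROR":
            counts[f"{date} {time_of_day[:2]}:00"] += 1
    return counts
```

Plotting these counts over days or weeks is often enough to see whether a release made a particular problem more or less frequent.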

With a low barrier to entry, anyone can get in on the fun

There are several log analysis products and services covering the spectrum from comprehensive log management to hosted log analysis and visualization, which offer a range of plans for their services. Irrespective of whether you are a startup or an established large company, you will be able to get a solution that exactly fits your needs. With such a solution, you will be able to keep dumping information in logs while deciding which of it is useful at a later stage. Unfortunately, so will your competitors.

You can actually use logs as a data sink

While it seems counter-intuitive, there’s a perfect case to be made for using logs as a way of storing and processing data. While you have to use resources and spend development effort when trying to store data about various events and objects in your system, with logs all you need to do is dump that data into the logs. Then, with an appropriate setup, you will be able to extract the results you want out of these either on the fly for immediate action or later for historical analysis. Definitely beats having to worry about creating the perfect database design to hold all the data that you would ever want for analysis.
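
One way to "dump data into the logs" so that it can be extracted later is to write each event as a single JSON line. The sketch below uses Python's standard `logging` and `json` modules; the helper names (`make_event_logger`, `log_event`, `read_events`) and the event fields are hypothetical, chosen only for illustration.

```python
import io
import json
import logging


def make_event_logger(stream):
    """Create a logger that writes one bare message per line to `stream`."""
    logger = logging.getLogger("events_sketch")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_event(logger, event, **fields):
    # Each event becomes one JSON object per line, easy to parse later.
    logger.info(json.dumps({"event": event, **fields}, sort_keys=True))


def read_events(text):
    """Parse the JSON lines back into dictionaries for analysis."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


stream = io.StringIO()  # stands in for a log file
logger = make_event_logger(stream)
log_event(logger, "checkout", user="u1", amount=42)
log_event(logger, "login", user="u2")
events = read_events(stream.getvalue())
```

No schema has to be designed up front: new fields can simply be added to later events and picked up when the logs are analyzed.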

Given the state of the log management and analysis ecosystem, it’s not only easy but also hugely beneficial to look at your own logs to derive knowledge about your users and your product. This is sure to result in stronger products and better user experiences. Happy logging!


These first few weeks at GS have given me some food for thought. While I have been bombarded with TLAs and FLAs, (and some SLAs as well), most of which are new to me (I have just crawled out from under the kernel / device driver / firmware rock, you see), the constant running theme in all this crossfire has been cloud, of any and all kinds, OpenSource stacks, and at the root of all of these, hypervisors and virtualization.

I came across the blog post "Docker and Containers – my IDF14 favorite", and one particular point got me thinking. I reproduce the statement from the blog verbatim here - "Twenty years ago, the problem of exploiting more performance than one app could use was to put more than one app on the same system." The blog goes on to point out that virtualization provided a convenient and relatively secure way of packaging these apps in their own OS container - a VM, thus creating the foundations of today's cloud.

Rewind to the statement I pointed out above - Twenty years ago, the problem of exploiting more performance than one app could use was to put more than one app on the same system. Let's start with a loose analogy between a computer system and an automobile, particularly a car. The processor in the computer system can be likened to the engine in the car, and the car's passengers to the apps that run on the system. It's immediately obvious that a car carrying only one passenger is inefficient, especially if its capacity is 5 passengers. Similarly, a computer system with a processor capable of supporting multiple apps is inherently inefficient when only one app runs on it.

In the transportation world, carpooling evolved as a technique to better utilize the resources available in a car, just as in the IT world, this was achieved by running multiple apps on the same system. As we all know, carpooling has numerous limitations, chief among them being the unique time and location constraints of each passenger. The corresponding limitation in the IT world is the library and OS dependency of apps the author talks about in the blog to which I referred. But the manner in which these limitations have been overcome is fundamentally different in the automobile world and the IT world, and I will further argue that the automobile industry has historically evolved in a more efficient manner than the IT industry.

In the automobile world, long before carpooling even existed, we had a thriving community of different automobile makers, each designing and building their own engine. Even if we limit the discussion to cars, manufacturers historically have each designed and produced a range of cars, with differing engines, for their customer base. Each car manufacturer has usually 2 or even 3 base engine models with variants for different market segments. This availability of choice in the automobile sector has allowed the market to evolve in a more sensible manner - witness the rise of 2 wheelers in India vis-a-vis the absolute domination of cars in the US and parts of Europe.

If the IT industry had evolved in a similar manner, one would have expected to see a range of general purpose processors, available from different vendors, catering to different market segments and needs. It is possible that market dynamics would have allowed for a processor manufacturer to solve the "exploiting more performance than one app could use" problem, by designing a low cost, minimal featured processor, that was optimized for running a single app or a suite of closely related apps.

Instead, what we got was virtualization, first of the server - its CPU, its NIC, and now of the network - its switches and routers.

The carpooling analogy is imperfect in that carpooling is a relatively late phenomenon and appeared after the automobile industry had already matured and stabilized with numerous key players. The underlying point, I believe, remains valid.

Now virtualization has clear benefits when analyzed from the standpoint of overcoming the limitations of poor semantics of tightly coupled data - I'm talking, for example, about the longstanding association of the MAC address of a NIC with a server's identity. But was it really necessary to virtualize the CPU as a way of using its underutilized capacity? Or would it have been better for a processor vendor to design and deliver a processor that addressed this need?

Your thoughts?


For a long time, Yoga for a balanced lifestyle and Ayurveda, a formidable contender in the alternative medicine space, have been the leading spiritual products manufactured out of India. They were well supported by educational and professional institutions and a large number of voluntary participatory groups, which gave them early momentum and a form of industry ecosystem.


While India moved straight from the spiritual revolution to the agricultural and on to the information revolution (Toffler), it had a stop-start run with the industrial (manufacturing) revolution. Even though India is a formidable manufacturer, leading in many sectors that are traditionally huge job creators, including the automobile, petroleum, steel, textile, pharmaceutical, chemical and defense industries, it has not been able to sustain this development across its distinct geographies. Pockets of India benefited from this economic development and local job creation, but the vast majority of the country was left wanting on both fronts. Perhaps as a result, an estimated 50% of north India is expected to continue migrating down to central India and further down to South India, where most of the economically developed pockets happen to be.


The information revolution that started in India from the early to mid-nineties changed this paradigm. India was well equipped for this, thanks to:

  • being in a convenient time-zone,
  • timely government interventions with partial Rupee convertibility,
  • liberalization, and the resultant freeing up of investments in the telecom sector, which led to opening of new communication channels
  • abundant availability of English-speaking workforce

All these helped catapult India to the center of the round-the-clock software development services model, which gave a time-to-market advantage to overseas clients who needed to deliver products fast. This model, software services out of India, is today considered the norm. For anything that tech outfits globally could not, or did not want to, do in their home countries, India has become a chosen hub to outsource to or to locate their offices in. That gave rise to excellent career opportunities for Indian knowledge workers who aspired to do well in life. So talent was not a problem, and jobs were pouring in and still are. Salaries have been rising in line with new demands for skills, and so have the standards of living of the workforce across the country, including the interiors. All this has helped to create a sense of confidence and has led to a sophisticated exposure to the western and eastern world.


And yet, product management hasn’t had the share of this pie that you would expect. This is largely because the product market does not focus on the local economy. When product development targets a market in close proximity, it is fed by product management, which keeps a close watch on the market, collecting intelligence and feedback. Access to the local market to validate acceptance of the product idea, together with sharp product management acumen, is central to the success of a product. Yet when outsourcing evolved, engineering in the home country (where the product was conceived) became the eyes, ears and brains that decided the core engineering, and offshore engineers often had no chance to contribute to critical product decisions. A side effect is that the international market often questions India’s ability to make products. Once engineers are embedded in a sub-tier of the customer's engineering team, knowingly or, many times, unknowingly, they receive filtered information, which restricts their view, and their contribution remains narrowly defined and task-based. This limits their exposure to what can help the product succeed and reach its target demographic, and to which choices of technology, features or market forces can make it go belly up. Exposure to failure is a key contributor to creativity and subsequently to productivity.


NASSCOM’s 10000 startups initiative is reigniting the heat in emerging India and tapping its hunger to be creative and productive. The new government’s ‘Make in India’ initiative is counting on producing world-class products out of India. And initiatives such as a collaboration between Microsoft, Facebook and other technology vendors are expected to fuel the talent of rural India like never before.


Making products in India is possible, and is very real.


Posted by on in Technology

Passwords are a necessary evil, and are everywhere. Many organizations still rely completely on passwords for authentication purposes. While most of us are well aware of the limitations of passwords, we rarely move beyond them. How many of us use Two Factor Authentication (2FA) provided by cloud service providers like Google for all the services we use? Very few if any. In this blog post, I hope to steer you towards stronger authentication measures which are cost effective and reduce your reliance on passwords.

What options are available to strengthen security and reduce the dependence on passwords? There are several and before we dive into these, let us understand how different factors are used. There are essentially three factors that can be used in the authentication process:

  • What the user knows: Passwords, PINs, etc.
  • What the user has: Something that the user possesses like a hardware or software token.
  • What the user is: Something intrinsic to the user, such as a biometric (fingerprint, retina scan, etc.).

Additional factors can be brought in to strengthen the authentication process. For example, a banking application uses a login and password for authentication, but when a high-value transaction is to be carried out, the user needs to enter a One Time Password (OTP). The OTP can be generated on the user's mobile device or sent to it. Essentially, the mobile device is something that the user has in her possession.

OTP has several advantages:

  • Standards-based support (HOTP - RFC 4226, TOTP - RFC 6238, OATH).
  • As the name suggests, each value is valid for one use only.
  • It is very difficult to predict the next OTP value.
  • The OTP is generated or delivered out of band.

Most organizations that use this method require the OTP to be delivered by SMS or text messaging. However, this means that the user (receiver) may have to bear the charges associated with text messaging and needs to be in a mobile coverage area. SMS-based delivery is also not secure, since text messages can be intercepted or redirected on their way to the user.

A good alternative is to use standards-based OTP generation on the mobile device instead of a delivery-based approach. OTP standards like HOTP (HMAC-based One-Time Password Algorithm) and TOTP (Time-based One-Time Password Algorithm) enable an OTP generation model on mobile devices. The two standards are quite similar, in that both require a secret shared between the mobile device and the application using OTP services. The main difference is that HOTP uses a counter-based synchronization mechanism whereas TOTP is time-based. Both are published as RFCs by the Internet Engineering Task Force (IETF) and offer dependability.
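
To show how closely related the two standards are, here is a minimal sketch of HOTP (RFC 4226) and TOTP (RFC 6238) using only the Python standard library. It is an illustration of the published algorithms with HMAC-SHA-1, not the GS Lab library; the function names and default parameters (6 digits, 30-second time step) are choices made for this example.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP: HMAC-SHA-1 over an 8-byte big-endian counter, then
    dynamic truncation to a short decimal code (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # low nibble picks 4 bytes
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """TOTP: HOTP where the counter is the number of `step`-second
    intervals since the Unix epoch (RFC 6238)."""
    if for_time is None:
        for_time = time.time()
    return hotp(secret, int(for_time) // step, digits)
```

With the test secret from RFC 4226, `hotp(b"12345678901234567890", 0)` yields `755224`, the first value in that RFC's test table. On the server side, the submitted code would typically be compared with `hmac.compare_digest` and checked against a small window of adjacent counters or time steps to tolerate drift.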

There are free mobile applications available which take care of generating the OTP locally. Google Authenticator is a good example and supports most mobile platforms, including iOS, Android and BlackBerry. I would recommend a free application like Google Authenticator, as it is already used by industry leaders like Google and Amazon Web Services (AWS). The setup process for Google Authenticator is simple and user-friendly.
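
Setup with authenticator apps usually works by scanning a QR code that encodes an `otpauth://` key URI containing the Base32-encoded shared secret. The sketch below builds such a URI; the `provisioning_uri` helper, the account name and the issuer are hypothetical, and rendering the URI as a QR code is left out.

```python
import base64
from urllib.parse import quote, urlencode


def provisioning_uri(secret: bytes, account: str, issuer: str) -> str:
    """Build an otpauth:// key URI for a TOTP secret (illustrative helper)."""
    # Authenticator apps expect the secret in Base32; padding is dropped.
    b32 = base64.b32encode(secret).decode("ascii").rstrip("=")
    label = quote(f"{issuer}:{account}")
    query = urlencode({"secret": b32, "issuer": issuer})
    return f"otpauth://totp/{label}?{query}"


# Hypothetical account and issuer, for illustration only.
uri = provisioning_uri(b"12345678901234567890", "alice@example.com", "ExampleApp")
```

The shared secret should be generated randomly per user and shown only once during enrollment, since anyone who obtains it can generate valid codes.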

The server application leveraging OTP will need to be enhanced to add the required support. GS Lab has an OTP library which is standards based and supports both HOTP and TOTP standards. This library is currently geared for Java/ J2EE applications and provides a means to quickly enable your application to support strong two factor authentication. The OTP library works with the free, off-the-shelf Google Authenticator mobile application, which simplifies the deployment process considerably for your users. If you are interested to know more about our OTP library, do drop us a line at

In future blog posts, we will do a deep dive on OTP standards and more interesting stuff around context-aware authentication and other concepts.
