Recent blog posts

Posted in Technology

A software requirement is a detailed description of the system under implementation. It outlines the practical usage of a product or service, that is, a condition or capability to which the system must conform. Requirements can range from high-level abstract statements of services or system constraints to detailed mathematical functional specifications. Here we will discuss requirement analysis and its considerations from a QA perspective.

Software development life cycle (SDLC) models describe the different phases of the software life cycle and the order in which those phases are executed: requirements gathering and analysis, design, implementation or coding, testing, deployment, and maintenance.

What is requirement analysis? It is the process of determining user expectations for a system under consideration. These expectations should be quantifiable and detailed.

Requirement Analysis:

  • Serves as a foundation for test plans and the project plan
  • Serves as an agreement between the developer and the customer
  • Is the process of making stated and unstated requirements clear
  • Is the process of validating requirements for completeness, unambiguity and feasibility.

 

The picture below depicts the consequences of poor requirement analysis and their impact on the software development life cycle.

Requirements Analysis

Here we can clearly see that if requirement analysis is not done in the early phase of the SDLC, the cost of fixing the resulting problems in later phases is huge. A few consequences of poor requirement analysis are incorrect feature delivery, poor product quality, a large number of change requests to fix system flaws, extended project deadlines, and so on. The longer we delay analysing the requirements, the more it costs, which impacts project delivery and quality.

 

Challenges in the requirement analysis phase for QA:

  • In the early stages of the SDLC, the scope is not clearly defined.
  • There is often an ambiguous understanding of processes.
  • Communication between the project team and stakeholders plays an important role.
  • Insufficient input from the customer leads to assumptions, and those are often not accepted in UAT.
  • Inconsistency within a single process across multiple users.
  • Conflicting customer views.
  • Frequent new requirements.

Tools and techniques used for analyzing requirements include:

  1. Use Cases: a methodology used in requirement analysis to identify, clarify and organize the requirements. A use case is a set of possible sequences of interactions between the system and its users in a particular environment, related to a particular goal.
  2. Requirement Understanding Document (RUD): a document that covers the details of the requirement understanding, pertaining to the points below: 
    • Assumptions
    • System Details
    • Logical System Requirements
    • System Entity
    • Hardware
    • Acceptance Criteria
  3. Prioritize each requirement.
  4. Discuss with the team and identify the testing scope.
  5. Break down requirements into tasks and user stories.

How to Analyse Requirements?

  • Find out what the software has to do.
  • Identify requirements by asking questions such as why, what, who and how.
  • Find out how complex the application will be and its impact on testing.
  • Determine which things need to be tested.

Requirement validation: validate requirements against the points below so that all required information is available at the end of the requirement analysis phase.

  1. Correctness: find incorrect statements/requirements.
  2. Completeness: find missing requirements.
  3. Feasibility: find which features are possible to test and which are beyond scope.
  4. Testability: determine which types of testing are applicable.
  5. Ambiguity: ensure a single interpretation of each requirement (a statement is unclear when it has multiple meanings).
  6. Consistency: check that requirements are consistent and do not contradict one another.

After validating all the requirements, categorize them into three types: functional, non-functional and special requirements. This categorization will help in creating detailed test cases for the different testing types.

 

QA Role:

QA is involved in the requirement analysis activity to ensure that the requirements identified by the BA and accepted by the customer are measurable. This activity also provides inputs to various stages of the SDLC for identifying resource availability, scheduling and test preparation. QA needs to perform the activities below.

  • Analyze each and every requirement from the specification document and use cases.
  • List high-level scenarios.
  • Clarify queries and functionality with stakeholders.
  • Offer suggestions on implementing features and flag any logical issues.
  • Raise defects or clarifications against the specification document.
  • Track the defects or clarifications raised against the specification document.
  • Create high-level test scenarios.
  • Create a traceability matrix (a simple illustration follows below).
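
As a simple illustration of what a traceability matrix captures, here is a minimal sketch; the requirement and test case IDs are hypothetical, and in practice the mapping would live in a test management tool rather than in code.

import java.util.List;
import java.util.Map;

public class TraceabilityMatrixExample {
    public static void main(String[] args) {
        // Hypothetical requirement IDs mapped to the test cases that cover them.
        Map<String, List<String>> traceabilityMatrix = Map.of(
                "REQ-001 Login with valid credentials", List.of("TC-101", "TC-102"),
                "REQ-002 Password reset via email", List.of("TC-110"),
                "REQ-003 Lockout after three failed attempts", List.of("TC-120", "TC-121"));

        // A requirement with no mapped test cases indicates a coverage gap.
        traceabilityMatrix.forEach((requirement, testCases) ->
                System.out.println(requirement + " -> " + testCases));
    }
}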

Outcome of the Requirement Analysis Phase: 

  • Requirement Understanding Document.
  • High level scenarios.
  • High level test strategy and testing applicability. 

  

With all the above techniques and the requirement analysis checklist, the tester is ready to sign off on the requirement analysis phase.


Jarvis: [while Tony is wearing the Mark II Armor] Test complete. Preparing to power down and begin diagnostics... 

Tony Stark: Uh, yeah, tell you what. Do a weather and ATC check, start listening in on ground control. 

Jarvis: Sir, there are still terabytes of calculations required before an actual flight is... 

Tony Stark: Jarvis... sometimes you gotta run before you can walk. 

If you are an ardent fan of Mr. Tony Stark (like I am), then you would have guessed that this is from Iron Man (2008).

Well, this might not be fiction or fantasy anymore; it is already happening! Bots like Jarvis are taking over the world. Siri, Cortana and Google Assistant are all living (are they really living beings? Anyway, moving on!) examples of one form of such bots. Now various organizations are deploying bots as their first point of interaction with consumers/customers. Does this really make sense, or is it just another technology fad? Let's find out.

A chatbot, or simply a bot, is a way of using (or reusing) an existing channel to interact with users. We already use conversation-based interfaces in many ways in our day-to-day lives, be it discussing requirements with a customer over Skype or resolving a query about a phone bill through chat or IVR. So bot-ifying your business or app does make a whole lot of sense. Some clear and visible benefits are:

1) Always-connected customer experience: Bots can ensure that your customers are always attended to and always responded to, and hence provide a certain level of assured customer experience.

2) Really reaching out to customers: Given that bots can operate on various channels like Skype, Slack, Facebook Messenger, SMS, email, you name it, you can also cover a wide variety of customer types. This is particularly relevant for B2C businesses.

3) Better use of human capital: For a business that needs a good support infrastructure to handle customer queries and complaints, a bot-based solution creates the opportunity to move personnel to more meaningful tasks. For instance, the initial conversation with the customer can be handled by the bot, and once the need to talk to an expert is qualified, a human expert gets involved. This also saves costs and may improve efficiency.

4) 24x7 availability at low cost: Bots don't need holidays or breaks :) Once you build them, train them and have the right technology set up to keep them available at all times, you are all set. You will never lose a customer or a lead.

Following are some business scenarios where bots are already being deployed/used or can be used:

1) You log onto your bank's app on your smartphone and are greeted by a bot that can answer all your basic queries, like your account balance, your last three transactions, the credit card amount that is due, and so on.

2) You visit an online shopping site and start talking to a bot. It suggests the right products for you to buy based on your chat history or shopping history. What's more, it completes the transaction end-to-end, right through payment.

3) [This is a developer special] You want to check the latest changes that went into the build that was just deployed. You can ask a Slack bot and get all these details.

4) [This could be too futuristic] You can tell your home to get itself ready to welcome you with the AC temperature set right, the correct lighting and maybe hot food in the oven too (thanks to IoT).

5) And many more..

Ok; so you get the point. You've got to build bots. Here is how.

A typical bot-based architecture looks like the following:

1) Channels: These are essentially the apps that users are already familiar with or are using. They could be Skype, Slack, Facebook Messenger or WhatsApp; you can even have email or a voice recognition system as a channel. The key aspect to keep in mind here is the discoverability of the bot through channels: you need to publish your bot through these channels so that users can add it to their IM clients.

2) Bot Framework: This is the key component that makes a bot a bot! It is basically the brain of the bot. You have plenty of options here, including Microsoft Bot Framework, api.ai, wit.ai and so on. Most of these are based on NLP (natural language processing), and you will have to train them so that they "understand" your business well. This is a crucial yet tricky part (a minimal sketch of how a channel hands a message to the bot follows after this overview).

3) Peripheral Services: You would typically need peripheral services and integrations to take care of things like authentication (using Active Directory or Google/Facebook sign-on services), data management (databases), analytics (usage insights), scheduling (calendar/email integration), etc.

Based on your business requirements, time to market, cost of operation and the maturity of the bot frameworks, you can finalize the right bot framework. Most frameworks support most of the channels.
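
To make the channel-to-bot hand-off concrete, here is a minimal sketch of a webhook that a channel could post user messages to. It is only an illustration, not any particular framework's API: the endpoint path, port and echo logic are assumptions, and a real bot would pass the text to the framework's NLP layer instead of echoing it (requires Java 9+ for InputStream.readAllBytes).

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class EchoBotWebhook {
    public static void main(String[] args) throws Exception {
        // Channels (Skype, Slack, FB Messenger, ...) would forward user messages to this webhook.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/webhook", exchange -> {
            String userMessage = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            // A real bot framework would map this text to an intent via NLP; here we simply echo it back.
            byte[] reply = ("You said: " + userMessage).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start();
    }
}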

Too good to be true? Well, there are a few challenges with bot-based solutions:

1) Data privacy and security: This could be critical for bots that deal with financial data, healthcare information, etc. You don't want your bots to be hacked or compromised, so you need to ensure that they have the right security mechanisms in place.

2) Maturity of the underlying technologies: I am referring mainly to the NLP aspect of bot frameworks here. While there is a lot of euphoria around bots, the bot frameworks still need to be tested well in terms of scalability, reliability and the robustness of their NLP.

3) The "human" angle: Well, a machine can never be human. So, while we have bots conversing with humans, there will be instances where human intervention is needed to pacify a really angry customer or to handle a complex scenario. While designing bot-based solutions, we need to be cognizant of this.

Needless to say, the idea of having a bot do the talking (literally) has merit. It's up to us humans to make the right use of them :)


While NoSQL and NewSQL systems are maturing as high-performance data store options and are being adopted increasingly, relational databases are based on a proven and solid model. Many scalable products still use them and, to protect that investment, need sharding, caching and routing when their databases grow into large clusters, so that they can operate 24×7 without significant re-engineering.

 

Lack of adequate caching is one of the most common problems that performance engineers come across when investigating performance issues. Many performance problems can be solved by the effective application of caching: reducing the frequency of expensive operations like database accesses or web page fetches, or reducing the execution count of expensive functions through memoization. Caching can be leveraged at all layers, from processors and disks, to CDNs for web applications, to web servers, databases, filesystems, etc. Caches can be as simple as the dictionaries/hash tables provided by programming languages as a data structure, or as complex as distributed hash tables (DHT) or enterprise grids. However, we use caching when evidence of a bottleneck demands it, not as a golden hammer or a band-aid. Among the various caching solutions, relational caches or caching database middleware are not too uncommon (see the 'transparent sharding middleware' section in this paper on NewSQL systems).
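
As a simple illustration of memoization, here is a minimal sketch (not tied to any of the products discussed here) that wraps an expensive function with an in-memory cache so repeated calls with the same argument are served without recomputation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class MemoizationExample {
    // Wraps an expensive function with a simple in-memory cache keyed by its argument.
    static <K, V> Function<K, V> memoize(Function<K, V> expensive) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, expensive);
    }

    public static void main(String[] args) {
        Function<Integer, Long> slowSquare = n -> {
            try { Thread.sleep(1000); } catch (InterruptedException ignored) { }  // simulate expensive work
            return (long) n * n;
        };
        Function<Integer, Long> fastSquare = memoize(slowSquare);
        System.out.println(fastSquare.apply(12));  // computed once, slowly
        System.out.println(fastSquare.apply(12));  // served from the cache
    }
}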

 

ScaleArc is database load balancing middleware with a long roster of customers and impressive features, including zero downtime and real-time monitoring. Of all these features, GS Lab's performance engineering team thought ScaleArc's transparent caching could be particularly helpful in improving the performance of products that use relational databases and, combined with the other features, could help them meet high scalability goals. GS Lab engineering came up with a reproducible benchmark, as the proof of the pudding, to assess its promise.

 

Tools in the SysBench benchmark suite are widely used to measure the performance of various subsystems. In a recent case investigating suboptimal I/O performance, GS Lab used the SysBench I/O benchmark to find that an improper RAID level was being used for a relational database running on expensive hardware. Switching to the correct RAID level led to a big speedup without the need for end-to-end performance testing. SysBench is widely available and suitable as an independently reproducible benchmark. Therefore, we used the SysBench OLTP benchmark in this study to measure the performance of a MySQL NDB cluster with ScaleArc's ACID-compliant cache.

Test_Environment.jpg

ScaleArc has published a similar benchmarking exercise, performed by Percona on Percona's variant of MySQL. We used NDB cluster since it is used by one of our customers and, as far as we know, no such study exists for NDB cluster. Also, we only evaluated the results of caching for the subset of the SysBench OLTP workload consisting of read-only queries (and skipped the read-write workload), to find an upper limit on the performance gains achievable through caching.

Avg_ResponseTime_ms.jpg

The results show a big improvement (up to 9x) in throughput of cached read-only queries and a great reduction in response times.

RequestperSec.jpg

Though the speedup will not be as spectacular for typical OLTP workloads consisting of a mix of reads and writes (compared to analytical workloads with a high percentage of reads), the results are highly promising, given that systems can get a big performance boost with zero changes to the application code or database.
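
To illustrate what "zero changes to the application code" means in practice for a transparent caching middleware, here is a minimal JDBC sketch: the application keeps issuing ordinary queries, and only the connection endpoint points at the middleware instead of the database nodes. The host name, credentials and sysbench-style table are placeholders, and MySQL Connector/J is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadThroughMiddlewareExample {
    public static void main(String[] args) throws Exception {
        // The only change is the JDBC URL: it points at the caching middleware (placeholder host)
        // rather than directly at the MySQL/NDB nodes. Queries and schema stay exactly the same.
        String url = "jdbc:mysql://caching-middleware-host:3306/sbtest";
        try (Connection conn = DriverManager.getConnection(url, "sbtest", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT c FROM sbtest1 WHERE id = 42")) {
            while (rs.next()) {
                System.out.println(rs.getString("c"));  // read-only query, eligible for caching
            }
        }
    }
}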

 

We are publishing the results of this study as a white paper. All the artifacts required to reproduce the exercise, including the test environment configuration, load generator code, supporting scripts, raw results and summary data, are available in this GitHub repository.


Posted in Technology

Technology evolution encompasses advances in sensor technologies, connectivity, analytics and cloud environments that will expand the impact of data on enterprise performance management and pose system integration challenges for most companies.

As industries transition from analog to digitalized PLCs and SCADA, they will have to leverage sensor-based data to optimize the control and design of their assets and processes, both in real time and over time, for faster decision making, as well as embed software in traditional industrial equipment.

Developing and deploying these systems securely and reliably represents one of the biggest challenges.

Going far beyond the current definition of networks, the most complicated and powerful network yet is now being built. In it, devices embedded in power lines, waterlines, assembly-lines, household appliances, industrial equipment, and vehicles will increasingly communicate with one another without the need for any human involvement.

The reach of these integration capabilities will go far beyond infrastructure and manufacturing. Today, for example, clinicians diagnose health conditions through a lengthy assessment. But matching historical pathological and lifestyle patterns against live diagnostic collection systems provides a more accurate approach to diagnosing serious ailments, or an early-warning signal. To make the most of such opportunities, health-care companies must figure out how to integrate systems far beyond the hospital. Much like in-memory big data analysis, this presents a problem of collecting data closer to its source.

You may wonder: collecting and transmitting data from industrial machines and devices is not a new concept. Since the early 80s, data from industrial assets has been captured, stored, monitored and analysed to help improve key business outcomes. In this era of digitization, as industrial sensors and devices create hybrid data environments, systems integration will propagate more data from more locations, in more formats and from more systems than ever before. Data management and governance challenges that have pervaded operations for decades will now become a pressing reality. Strategies to manage the volume and variety of data need to be put in place now to harness the opportunity that IoT and Big Data promise.

Despite the challenges stated above, some strategies incorporated into core operations can help increase the odds of success:

  • Multiple Protocols

As the number of sensors and devices grows, the increase in the number of data acquisition ‘protocols’ creates a greater need for new ‘interfaces’ for device networking and integration within existing data ecosystems (see the sketch after this list).

  • Data Variety

As devices and sensors are deployed to fill existing information gaps and operationalize assets outside traditional enterprise boundaries, centralized data management systems must be able to integrate disparate data types in order to create a unified view of operations and align it with the business objectives.

  • New Data Silos

Systems built for a single purpose produce data silos that create barriers to using data for multiple purposes, by multiple stakeholders. Without foresight, connected device solutions become the new silo, undermining the intent to construct architectures that incorporate connected devices into broader, interactive data ecosystems.
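
One common way to deal with multiple acquisition protocols and data variety is an adapter layer that normalizes every protocol's native payload into a single internal record. The sketch below is only illustrative: the class names, field names and fake payload parsing are assumptions, not any specific product's design.

import java.time.Instant;

public class ProtocolAdapterSketch {
    // Common internal record produced regardless of the acquisition protocol (hypothetical fields).
    static class SensorReading {
        final String deviceId, metric;
        final double value;
        final Instant timestamp;
        SensorReading(String deviceId, String metric, double value, Instant timestamp) {
            this.deviceId = deviceId; this.metric = metric; this.value = value; this.timestamp = timestamp;
        }
        public String toString() { return deviceId + " " + metric + "=" + value + " @ " + timestamp; }
    }

    // One adapter per protocol turns its native payload into the common record.
    interface ProtocolAdapter {
        SensorReading toReading(byte[] rawPayload);
    }

    static class MqttAdapter implements ProtocolAdapter {
        public SensorReading toReading(byte[] rawPayload) {
            // Real parsing is protocol-specific; a fixed reading is returned here for illustration.
            return new SensorReading("pump-17", "temperature_c", 71.5, Instant.now());
        }
    }

    static class ModbusAdapter implements ProtocolAdapter {
        public SensorReading toReading(byte[] rawPayload) {
            return new SensorReading("plc-03", "pressure_bar", 4.2, Instant.now());
        }
    }

    public static void main(String[] args) {
        ProtocolAdapter[] adapters = { new MqttAdapter(), new ModbusAdapter() };
        for (ProtocolAdapter adapter : adapters) {
            System.out.println(adapter.toReading(new byte[0]));  // unified view, independent of protocol
        }
    }
}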

As discussed above, for more than 30 years industries across the globe have been leveraging sensor-based data to gain visibility into operations, support continuous improvement and optimize overall enterprise performance. As advances in technology make it cost-effective to deploy connected solutions, industries will need to develop a strategic approach to integrating sensor data with pre-existing data environments. These advancements will move towards creating a seamless, extensible data ecosystem, which will require cooperation between multiple vendors, partners and system integrators.



In testing, the test summary report is an important deliverable. It represents the quality of a product. As automation testing is mostly carried out in the absence of humans, I recommend that test results be presented well.

An automation test report should be useful to people at all levels: automation experts, manual testers who are not familiar with the code, and high-level management.

 

In an ideal case, a test automation report should comprise the following:

  • Statistical data like number of test cases passed, failed, skipped
  • Cause of test failure
  • Evidence (like screenshots indicating success/failure conditions)

In addition to the above, if our test report includes the following, it will be even more impressive and useful:

  • Pass and fail percentage of tests
  • Test execution time for individual test case and a test suite
  • Test environment details
  • Representation of statistical data in the form of charts
  • Grouping of test cases as per the type like Functional, Regression etc.

TestNG and JUnit do not provide good reporting capabilities out of the box, and the default TestNG reports are not attractive, so we have to develop customized reports.

I suggest using ExtentReport for automation test reporting; it is more effective, and this library allows us to accomplish all of the above.

About ExtentReport:

It is an open-source test automation reporting API for Java and .NET  developers. The report is generated in HTML form.

Following are some features of ExtentReport:

  • Easy to use
  • Results are displayed in the form of pie charts
  • Provides passed test case percentage
  • Displays test execution time
  • Environment details can be added in an easy way
  • Screenshots can be attached to the report
  • Test reports can be filtered out based on the test results (Pass/Fail/Skip etc.)
  • Filtering stepwise results like info/pass/fail etc.
  • Categorized reports for regression/functional testing, etc.
  • Test step logs can be added
  • Can be used with JUnit/TestNG
  • It can be used as a listener for TestNG
  • Parallel runs are supported, so a single report can be created for parallel runs
  • Configuration can be added to the report
  • Results from multiple runs can be combined into a single report

Downloading and installation:

Download the ExtentReport jar from http://extentreports.relevantcodes.com/index.html and add it as a dependency to your Java project.

 

ExtentX:

ExtentX is a report server and project-wise test analysis dashboard for ExtentReports.

 

How ExtentReport works:

To see exactly how ExtentReport works, here is a simple example: one test case will pass and another will fail.

 

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.testng.Assert;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;
import com.relevantcodes.extentreports.ExtentReports;
import com.relevantcodes.extentreports.ExtentTest;
import com.relevantcodes.extentreports.LogStatus;

public class ExtentReportTest {

    private WebDriver driver;
    ExtentReports extent;
    ExtentTest test;
    StringBuffer verificationErrors = new StringBuffer();

    @BeforeTest
    public void testSetUp() {
        driver = new FirefoxDriver();
        extent = new ExtentReports(".\\TestAutomationReport.html", true);   // Report initialization
        extent.addSystemInfo("Product Version", "3.0.0")                    // System or environment info
              .addSystemInfo("Author", "Sachin Kadam");
    }

    @Test
    public void TC1() {
        test = extent
                .startTest("Test case 1", "Check the google home page title")   // Start test case
                .assignAuthor("Sachin Kadam")
                .assignCategory("Regression", "Functional");
        String appURL = "http://google.com";
        driver.get(appURL);
        test.log(LogStatus.INFO, "Navigating to URL : " + appURL);   // Log info
        customVerify(driver.getTitle(), "Google");
        extent.endTest(test);   // End test case
        checkForErrors();
    }

    @Test
    public void TC2() {
        test = extent
                .startTest("Test case 2", "Check the wikipedia home page title")   // Start test case
                .assignCategory("Functional")
                .assignAuthor("Sachin Kadam");
        String appURL = "https://www.wikipedia.org";
        driver.get(appURL);
        test.log(LogStatus.INFO, "Navigating to URL : " + appURL);   // Log info
        customVerify(driver.getTitle(), "Google");   // Incorrect expected title to fail the test case
        extent.endTest(test);   // End test case
        checkForErrors();
    }

    // Custom assertion method for string comparison
    public void customVerify(String actual, String expected) {
        try {
            Assert.assertEquals(actual, expected);
            // Log pass result
            test.log(LogStatus.PASS, "Expected title:" + expected + " :: Current title:" + actual);
        } catch (Error e) {
            // Log fail result along with the error
            test.log(LogStatus.FAIL, "Expected title:" + expected + " :: Current title:" + actual + " :: " + e.toString());
            verificationErrors.append(e);
        }
    }

    @AfterTest
    public void tearDown() {
        driver.quit();
        extent.flush();
    }

    // Method for reporting accumulated failures to the TestNG report
    public void checkForErrors() {
        if (!"".equals(verificationErrors.toString())) {
            String errors = verificationErrors.toString();
            verificationErrors = new StringBuffer();   // Reset before failing, since Assert.fail() throws
            Assert.fail(errors);
        }
    }
}

 

The generated HTML report looks like this:

 ExtentReport01

 

ExtentReport02

 

ExtentReport03

 

I hope you will find ExtentReport very useful, easy to use, impressive and productive.

For more reference: http://extentreports.relevantcodes.com/index.html

 

- Sachin Kadam

 


Posted in Thoughts

Gartner's 10 strategic predictions for 2017 and beyond make me delve, almost unwillingly, into imagining what the future holds.

As John leaves work and heads to the building lobby, his car is already waiting for him; self-driving cars are almost mainstream. He just tells his car, "Drive me home". After arriving home, which is already cooled or heated to his preference, he picks up the freshly brewed pot of coffee to pour himself a cup. As he walks into the living room, he says "Play HBO" and the TV turns on with the HBO channel playing. Deeply engrossed in the movie, John is suddenly reminded by his virtual assistant (an Amazon Echo) about a dinner party scheduled for later in the evening. He tells his virtual assistant to buy some flowers and a good bottle of wine. Using virtual reality, he is immediately present in the virtual mall and able to hand-pick these items. As he does a virtual checkout, the selected items are dispatched by drone to his home, to arrive in another half an hour, and John is all set for the party.

In time, technology will make all of this a reality; some of it already is. Let us now look at the technology underlying all of this. At the fundamental level we have the Internet of Everything: all devices are connected to the grid all the time. This allows John's car to estimate and share his arrival time with the devices at home, which in turn allows his air conditioner to set the appropriate temperature and his coffee maker to brew his preferred coffee beforehand. Almost all the interactions are voice based rather than clicks on a screen. Devices with audio input will be trained to activate only on a specific person's voice (biometric, audio-based authentication is implicit). Even the act of purchasing something no longer happens in a mobile application; most shopping will use the virtual reality channel, and the experience will be most gratifying. No more running to the local store for last-minute errands. Deliveries happen by drone in the most efficient manner possible.

Virtual stores of the future will have neither physical stores nor warehouses; instead they will rely on JIT inventory from the suppliers directly. Goods will be shipped from the supplier directly to consumers based on orders received by the virtual store. The virtual store will completely change the shopping experience for its consumers using virtual reality, allowing them to touch and feel objects prior to purchasing them. Credit transactions will happen transparently in the background based on biometric approval from the consumer: the virtual reality goggles will perform an iris scan to authenticate the consumer and digitally sign and approve the transaction. Blockchain will be used by merchants to maintain these financial transactions in an authentic, non-repudiable fashion.

All devices in the home will be connected and will share analytics metrics with their manufacturers. For example, the air-conditioning/heating unit will share detailed metrics on compressor performance, power consumption trends, etc. with its manufacturer. This allows the manufacturer to leverage the data for analytics that predict outages and faults well in advance, which in turn ensures that a service technician (possibly a robot) makes a home visit before the device breaks down. Preventive maintenance will help continuity and prevent outages. Consumers, alongside businesses, will benefit tremendously from this.

Overall lifestyle and experience will change dramatically. People will leverage fitness bands/trackers and share data with their healthcare provider as well as their health insurance company. This will enable the healthcare provider to proactively track the health of an individual (again through analytics) to detect issues before they arise. Also, insurance companies will base premiums on the healthiness level of an individual alongside lifestyle patterns; the latter include diet/food habits (from virtual store grocery shopping), exercise regime (fitness tracker), etc.

With everything integrated, security is key. With IoT devices, it is imperative that security is baked in at multiple levels.

 

 

b2ap3_thumbnail_IOT-Security.jpg

 

Let us look at these in more detail below:

 

Device security – The device needs to protect itself from attackers and hackers. This includes (but is not limited to) the following: hardening the device at the OS level, securing confidential information on the device (data at rest on the device), firewalling the device, etc.

 

Authentication – Each entity (device, cloud service, edge node/gateway, etc.) needs to authenticate itself to the corresponding entity. If there are default usernames/passwords on the device, then it needs to enforce a password reset on initial power-on (along with a factory reset option). Ideally, the device should not use a static password for authentication. In our earlier post on OTP-based device authentication for improved security, we discussed a novel approach which helps address the challenges faced by IoT device manufacturers today.

 

You can read more about OTP – based device authentication for improved security by clicking here.

 

Network communication channel security – Today there are various communication channels at play, for example devices communicating with their respective cloud service providers, devices communicating with fog/edge computing services/devices, devices interacting with other devices, etc. It is important that each communication channel is secured and that trust exists between the communicating endpoints. The channel can be secured using TLS as appropriate.
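
As a small, generic illustration of a TLS-protected channel (the endpoint is a placeholder, and a real device would typically also validate or pin the server certificate against its own trust store), the sketch below opens an HTTPS connection and prints the negotiated cipher suite:

import javax.net.ssl.HttpsURLConnection;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class SecureChannelCheck {
    public static void main(String[] args) throws Exception {
        // Open a TLS-protected channel to a placeholder endpoint and inspect the negotiated parameters.
        HttpsURLConnection conn = (HttpsURLConnection) new URL("https://example.com/").openConnection();
        conn.connect();
        System.out.println("Negotiated cipher suite: " + conn.getCipherSuite());
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            System.out.println("First line of response: " + in.readLine());
        }
    }
}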

 

Cloud service security – The cloud service provides the backbone for the services offered. The attack surface needs to be minimal, and the service hardened and firewalled against DDoS attacks. Data from the devices is collected at the cloud service end and needs to be secured (data at rest). Depending on the nature of the data and the service provided, this data need not be visible even to the cloud service provider. The provider needs to ensure that appropriate backup and disaster recovery plans are in place, and also needs to present its business continuity plan to its subscribers. The Cloud Security Alliance (CSA) provides good guidance for cloud service providers.

 

Privacy – This relates more to data sharing across disparate service providers. With IoT, devices will end up communicating with devices/services from other providers. How much information can be shared across service providers, with user consent, needs to be carved out explicitly. Service providers will need to incentivize users to allow sharing information with other providers; the user needs to benefit from the sharing eventually to allow it.

 

To summarize, security is a key aspect of the success of IoT.

 

Tagged in: IoT security

The recent massive distributed denial of service (DDoS) attack on 21st October 2016 affected numerous cloud service providers (Amazon, Twitter, GitHub, Netflix, etc.). It is interesting to note that this attack leveraged hundreds of thousands of internet-connected consumer devices (aka IoT devices) which were infected with malware called Mirai. Who would have suspected that the attackers involved were essentially consumer devices such as cameras and DVRs?

A Chinese electronics component manufacturer (Hangzhou Xiongmai Technology) admitted that its hacked products were behind the attack (reference: ComputerWorld). Our observation is that security vulnerabilities involving weak default passwords in the vendor's products were partly to blame. These vulnerable devices were first infected with the Mirai botnet, and subsequently the infected devices launched an assault to disrupt access to popular websites by flooding Dyn, a DNS service provider, with an overwhelming amount of internet traffic. The Mirai botnet is capable of launching multiple types of DDoS attacks, including TCP SYN flooding, UDP flooding, DNS attacks, etc. Dyn mentioned in a statement, "we observed 10s of millions of discrete IP addresses associated with the Mirai botnet that were part of the attack" – such is the sheer volume of an attack that leverages the millions of existing IoT devices out there.

Subsequently, Xiongmai shared that it had already patched the flaws in its products in September 2015, ensuring that customers have to change the default username and password on first use. However, products running older versions of the firmware are still vulnerable.

This attack reveals several fundamental problems with IoT devices as things stand today:

  • Default username and passwords
  • Easily hackable customer-chosen easy-to-remember (read as “weak”) passwords
  • Challenges with over-the-air (OTA) updates etc.

The first two problems are age-old issues, and it is surprising to see them come up with newer technologies involving IoT devices as well. Vendors have still not moved away from the traditional technique of default usernames and passwords, nor have customers adopted strong passwords. It is probably time we simply accept that the latter will not happen and remove the onus on the customer to set strong passwords (it is just not going to happen!).

One-time passwords (OTPs) can be quite helpful here. A one-time password, as the name suggests, is a password that is valid for only one login session. It is a system-generated password and is essentially not vulnerable to replay attacks. There are two relevant standards for OTP – HOTP (HMAC-based One-Time Password) and TOTP (Time-based One-Time Password). Both standards require a shared secret between the device and the authentication system, along with a moving factor, which is either counter-based (HOTP) or time-based (TOTP).
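
To show what the HOTP computation looks like, here is a generic sketch of the RFC 4226 algorithm (not the code of the authentication system described below): it derives a 6-digit OTP from a shared secret and a counter, and TOTP is the same computation with the counter replaced by the current 30-second time step.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;

public class HotpSketch {
    // Computes an HOTP value (RFC 4226 style) from a shared secret and a moving counter.
    static int hotp(byte[] sharedSecret, long counter, int digits) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA1"));
        byte[] hash = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array());

        // Dynamic truncation: take 4 bytes starting at the offset given by the low nibble of the last byte.
        int offset = hash[hash.length - 1] & 0x0F;
        int binary = ((hash[offset] & 0x7F) << 24)
                   | ((hash[offset + 1] & 0xFF) << 16)
                   | ((hash[offset + 2] & 0xFF) << 8)
                   |  (hash[offset + 3] & 0xFF);
        return binary % (int) Math.pow(10, digits);
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "12345678901234567890".getBytes();      // RFC 4226 test secret
        System.out.println(hotp(secret, 0, 6));                  // 755224, per the RFC test vectors
        long timeStep = System.currentTimeMillis() / 1000 / 30;  // TOTP uses this time step as the counter
        System.out.println(hotp(secret, timeStep, 6));
    }
}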

GS Lab's OTP-based device authentication system presents a novel approach which helps address the challenges faced by IoT device manufacturers today. It provides an unstructured device registry, flexible enough to hold information on various types of devices, and an authentication sub-system which authenticates the IoT devices tracked in the device registry via OTP. The authentication sub-system is built on top of the existing OTP standards (HOTP and TOTP) and helps alleviate the need for static (presumably weak) passwords in IoT devices. It provides support for the MQTT and REST protocols, which are quite prevalent in the IoT space. Support for additional protocols (like CoAP, etc.) is already planned and in the works. The OTP-based device authentication system is built on top of our open source OTP Manager library.

Here are some of the advantages of using GS Lab’s OTP-based device authentication system:

  • Strong passwords – system generated based on shared secret key
  • Not vulnerable to replay attacks – passwords are for one-time use only
  • Freedom from static user-defined passwords
  • Standards based solution – HOTP and TOTP standards
  • Relevant for resource constrained devices – crypto algorithms used by HOTP and TOTP standards work with devices with limited CPU, memory capabilities.
  • Ability to identify malicious devices – rogue devices can be identified using HOTP counter value
  • Provides device registry for simplified management

 


Customer

The customer provides a complete suite of event and video management solutions using a cloud server. This server enables client devices (mobile, web) to configure, control and view media from the enabled cloud camera. The server hosts a web application which functions as the intermediary for communication and authentication between the client and the camera.

Engagement

The GS Lab engagement involved feature development, QA, DevOps and test automation development. The test automation team developed functional and performance test suites to test the product.

Field Requirement

  • The customer wanted a test framework that could simulate the event/video surveillance scenarios of different end customers.

  • The test framework should test the timely delivery of audio/video events to the event-tracking web portal/mobile app.

  • The test framework should benchmark the various internal cloud servers in the audio/video surveillance solution/product.

Solution Provided by GS Lab

b2ap3_thumbnail_Automation-Framework.png

 

 

GS Lab developed an audio/video surveillance test framework (tool) using Python and Selenium. The following are the major features provided by this framework:

  • The test framework can test 500 live video and audio streams across 500 cameras (one audio/video stream per camera); this is a limitation of the customer's product for live streaming.

  • The test framework can test the video surveillance controlling app (Android as well as iOS) and web portal (across Chrome, Firefox, Internet Explorer and Safari).

  • The test framework can start and stop the live camera video/audio stream on the fly.

  • The framework can verify the operation-specific notifications and logs across the different servers in the surveillance solution, certifying successful completion of the operation.

  • The test framework supports:

    1. Complete functional, regression & performance testing of all the event/video management scenarios

    2. On the fly addition/deletion of audio/video stream in the surveillance solution

    3. Testing of 24/7 recording of the live streaming to be stored on Amazon S3 cloud storage

    4. Testing the notifications for any (audio / video) camera event

    5. Checking the timely delivery of the events to the portal or mobile apps (Android & iOS)

 

Value Addition

Following are the major benefits of the test framework for an audio video surveillance product:

  • The real world scenarios of the end customer can be simulated using this framework.

  • It supports Continuous Integration (CI) with all available open source tools.

  • The framework can save 60% of the QA team's bandwidth in every production release.

  • The framework can readily be used for the performance testing after the completion of regression testing with minimal changes.

  • It can benchmark different servers involved in the video surveillance solution.

  • The framework helped the development team identify performance issues caused by crucial parameters (CPU, memory, etc.) of the backend servers of the surveillance solution.


Posted in Technology

On a mundane February afternoon, as I headed for lunch, I remember getting a phone call from within my company, and with it an opportunity to participate in an IoT training program! Little did I know that the training sessions were going to be online, live and interactive, but early in the morning. I'm not a morning person and was a little hesitant, but somehow 'curious me' prevailed over 'hesitant me' and I subscribed. Having heard quite a bit about the Internet of Things (IoT), I wanted to get a taste of it, and this training program presented that opportunity. It was not only about learning, but also about getting our hands dirty to build something!

Right after the introductory session, it was clear that we could reap the benefits in a much better way if we participated as a team. So we formed a team of developers with experience in different areas such as UI, server side, native applications, hardware devices, etc. From then on, we embarked on a journey in a quest to learn what it means, and takes, to build an IoT project using an IoT platform. What follows here is an account of our experiences.


Learning an IoT platform
This was as good as it could get. We got to learn an IoT platform, an Atomic domain language (TQL, that is), and ways to integrate with hardware devices, sensors and actuators. There was a well-organized set of sessions which took us on a tour of the platform and how to use it. The course covered advanced features like clustering and macros, which made it even more 'pragmatic'.

Hands-on is the key, and you get to do plenty of it
One of the best parts of this program is that you get to do hands-on work; in fact, you are kind of forced to get your hands dirty. I think it's not without reason that the philosophy of 'learning by doing' exists! We played a lot with Raspberry Pi, Arduino Uno, sensors, actuators and, of course, the TQL system itself. This rendezvous did present us with its fair share of issues, but it was all worth it.

Technically enriching discussions
One of the reasons for me to subscribe to this training program was to hear about the IoT platform directly from its creators. That is a big deal! This was evident from the interactions which we, or the community, used to have during as well as after the sessions, e.g. why a particular feature is implemented in a certain way, why certain things are restricted on the platform, etc. This helped participants, especially those who were developers or architects, learn what goes into the making of an IoT platform.

Vibrant support forum
When you open the Slack web app for the TQL team, you are greeted with a random but nice message. One of the Slack messages that struck a chord with me instantly was: 'We're all in this together.' It sums up the kind of support the Atomiton folks are committed to providing. Questions are answered in depth and in minute detail, with the reasoning explained as well as the available alternatives or workarounds.

Mutually rewarding community
As the participants are required to build projects, they naturally get to showcase them to the community. This helps everyone understand how the platform can be put to use to solve real-life problems, how others in the community are using it in innovative and creative ways, and, in a much larger context, what IoT is all about.

Motivation
When you are doing something over and above your regular work, you need high levels of commitment, and you also need a great deal of motivation! There was enough of it, at the right times, to keep us going, and it rightly came with tips and suggestions for improvement.

Improvement areas: what could be done to make this even better?

Developer is king!
The developer is king, and he needs to be pampered. ;) The more developer-friendly features there are in the TQL studio, the better. Hover-for-help messages, auto-completion and templates at your fingertips (for queries, macros, usage of JavaScript, in-line comments) are some of our suggestions to enhance the TQL studio experience.

Auto-generation of basic queries from models
This will save some work for the developer and also serve as a guide for writing custom or complex queries. I would go a step further and suggest auto-generation of code for the UI: to access data over WebSockets as well as over HTTP.

Highlight security aspects
Make this a must in the training program. Let this be a differentiator.
The following aspects are worth giving a thought:

    • Can hardware devices be given fingerprints (unique identities)?
    • If a web app is being served using the app-attachment feature, then how do we expose it over HTTPS?
    • How do we invoke an external service over HTTPS?
    • Security in the built-in protocol handlers


Hardware bottlenecks

One of the observations our team made after completing the final project was: working with 'things' is not the same as working with pure software! We then asked ourselves what would make working with 'things' easier. We realized that knowledge of setting the hardware up, and of integrating with it, is what makes working with it easier. Our suggestion here is to make this child's play; crowd-sourcing could well be utilized for it. Making this easy and simple would let participants focus more on the project and on utilizing the TQL system's features in full glory. Items to focus on here: Raspberry Pi network connectivity (mainly a list of FAQs about network connectivity, especially the many different ways to set it up), and basic sensors and their connections with Arduino Uno and/or Raspberry Pi.

Going a step further, it would be great to share notes comparing off-the-shelf hardware vs. specialized high-end hardware, e.g. Raspberry Pi vs. Libelium. Can a Raspberry Pi be used in a production environment?

Session prerequisites
It would help if the prerequisites were mentioned for each of the sessions, and if the content for these prerequisites were also made available. For example, right from the first session, the participants need to have an understanding of Raspberry Pi and Arduino Uno. If they have already gone through it, then the first session becomes a hello-world purely to the TQL system, rather than a hello-world to all of the hardware devices and then the TQL system.

 

Tagged in: IoT TQL

Posted in Technology

Trove Overview

OpenStack Trove is a DBaaS (Database as a Service) solution. It offers IT organizations the ability to operate a complete DBaaS platform within the enterprise. IT organizations can offer a rich variety of databases to their internal customers with the same ease of use that Amazon offers with its AWS cloud and the RDS product. OpenStack Trove supports both RDBMS and NoSQL databases.

DBaaS

Database as a service on the cloud intends to reduce the complex and repetitive administrative tasks related to database management and maintenance. These tasks involve operations such as database instance deployment, database creation, configuration, periodic backups of database instances and patching. It also involves continuous health monitoring of database instances.

Trove Architecture

trove-arch1

APIs for Trove

APIs are exposed to manage the following service constructs:

  • Database instances
  • Database instance actions
  • User management
  • Databases
  • Flavors
  • Data stores
  • Configuration groups
  • Clusters
  • Backups

Openstack4j Popularity

OpenStack4j is an open source OpenStack client which allows provisioning and control of an OpenStack system. This library has gained quite some popularity in the open source/Java community for the simple reason that it has the most fluent APIs available for interacting with OpenStack.

It is also listed in the official OpenStack wiki as the Java library for interacting with OpenStack:

https://wiki.openstack.org/wiki/SDKs

Support for Trove in Openstack4j

Openstack4j, being a popular and much-preferred Java library, had an immense requirement for Trove API support. With its simple, fluent API and intelligent error handling, it makes the experience of interacting with OpenStack easy.

Example code snippets to interact with trove:

Create Database Instance:

createDbinst

Create Database:

createDb

Create Database User:

createDbUser

For more documentation, visit http://www.openstack4j.com/learn/trove

 

Contributors: Shital Patil - shital.patil@gslab.com & Sumit Gandhi - sumit.gandhi@gslab.com

 


Posted in Technology

 

Pre-Computer Era

This can be termed the 'pen and paper' era; it witnessed the building of the foundation. The concept of numbers became concrete, the zero was invented by Brahmagupta or Aryabhata (depending on which way you look at it), and number systems evolved. The earliest known tool used in computation was the abacus, thought to have been invented around 2400 BC.

 1

A number of devices based on mechanical principles were invented to help with computing, leading even to analog computers. Computational theory also evolved, with the advent of logarithms and so on.

Computer Era

The concept of using digital electronics for computing, leading to modern computers, is recorded around 1931. Alan Turing modelled computation, leading to the well-known Turing machine. ENIAC, the first electronic general-purpose computer, was announced to the public in 1946.

 2

Since then, computers have come a long way. There are supercomputers.

 3

There is a variety of devices like mainframes, servers, desktops, laptops and mobiles, and there is specialized hardware like gateways, routers and switches for networking.

 4

These enabled the culmination in the internet and the World Wide Web as we know them. There are storage arrays for all the storage-related capabilities, including snapshots, backups, archival, etc. There are Application Specific Integrated Circuits (ASICs)

 5

and so on and so forth.

Software Defined Era

Soon enough, this hardware started being driven by software, and the software grew more and more sophisticated. It evolved through paradigms like multi-tier architectures, loosely coupled systems, off-host processing, etc. Then came the advances in the area of virtualization.

 6

A lot of concepts in computing could now be abstracted easily at various levels, which enabled many use cases. For example, routing logic moved into software, so networks could be reconfigured on the fly, enabling migration of servers and devices in response to user or application requirements. Tiered storage can be exposed as a single block store and a file system store at the same time, giving the capability to lay out data efficiently in the backend without compromising the ease of managing it effectively from a variety of applications.

The cloud started making everything available everywhere, for everyone. Concepts like Software Defined Networking (SDN)

 7

and Software Defined Storage (SDS)

 8

are leading to Software Defined Everything (yes, some people have started coining such a term, and you will start seeing it widely soon enough). Hardware is getting commoditized, and specialized software addressing these needs is on the rise.

Beyond Software

It is still not clear what will replace software. However, some trends and key players have already started to emerge in this direction. There are a number of components, like open source projects, readily available as building blocks; one might just have to put them together to solve a variety of problems without writing much code. Computing has moved away from "computing devices" into general-purpose, common devices like watches, clothing, cars, speakers, even toasters. Every device is becoming intelligent. The hardware ecosystem is more or less commoditized already, and software is going down the same path. Witness the proliferation of OpenStack

 9

or IoT platforms, for example. One might have to simply configure them to address the needs. For example, OpenStack Cinder can be configured to clone volumes for creating test/dev environments efficiently, and IoT can make a production plant efficient in real time through continuous monitoring, reconfiguration and management of its resources. It could be Docker containers that one only has to deploy, plug and play, to have complete solutions in place. Handwriting recognition and voice-commanded devices can lead to complete working solutions at the speed of thought! Machine learning can provide already fully functional machines like smart cars.

Who knows, a day might come when, without doing anything, everything will be achieved, out of thin air so to speak! At this time it might sound like a wild stretch of the imagination, but just quickly reflect on the evolution of computing so far. It might take a really long time to get there. In fact, by then no one might be writing such posts; a few Google searches, a look around with open eyes and taking it all in with the senses might be enough for everyone to have already grasped the gist of the message!


Posted in Technology

The authors of this blog are Abdul Waheed and Paresh Borkar.

Many organizations today still struggle to provide strong authentication for their web-based applications. Most organizations continue to rely solely on passwords for user authentication, which tend to be weak (so that they are easy to memorize), shared across systems, etc. Though there have been strides towards strong authentication mechanisms like 2FA, adoption has been low.

It gives me immense pleasure to announce that GS Lab is open sourcing its OTP Library asset. Abdul Waheed from GS Lab was instrumental in developing this asset, a standards-based library that enables organizations to adopt One-Time Password (OTP) based Two-Factor Authentication (2FA) for business-critical Java/J2EE applications, leading to an improved security posture. It supports the HMAC-based One-Time Password (HOTP) and Time-based One-Time Password (TOTP) standards and works with the free, off-the-shelf Google Authenticator mobile app to provide a friendly user experience.

 

Features

  • Java/J2EE based library - used on server side
  • Standards based support (HOTP and TOTP)
  • Supported client - Google Authenticator
  • Ability to generate QRCode (to be scanned by Google Authenticator)
  • Integration with the server is simple and straightforward, requiring minimal effort
  • Support for security features like throttling, look-ahead, encryption, etc.

OTP Library

Key Benefits

  • Add 2FA to existing Java/J2EE server applications
  • Standards compliant (HOTP and TOTP standards support)
  • Minimum integration overhead
  • Small footprint
  • Leverage the existing, free, off-the-shelf Google Authenticator mobile app
  • Already adopted by market leaders like AWS for 2FA needs
  • User-friendly experience using QR codes (see the sketch below)
  • No costs associated with SMS/Text messaging and no related software requirements.
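
To illustrate the Google Authenticator enrolment step in general terms (this is not the library's own API; the issuer, account and secret below are hypothetical), a server generates a Base32-encoded shared secret and renders the standard otpauth URI as a QR code for the user to scan:

public class OtpauthUriSketch {
    public static void main(String[] args) {
        // Hypothetical values; a real integration would generate and store the secret securely.
        String issuer = "MyApp";
        String account = "alice@example.com";
        String base32Secret = "JBSWY3DPEHPK3PXP";  // shared secret, Base32-encoded

        // Standard otpauth URI understood by Google Authenticator; it is typically rendered as a QR code.
        String uri = String.format("otpauth://totp/%s:%s?secret=%s&issuer=%s",
                issuer, account, base32Secret, issuer);
        System.out.println(uri);
    }
}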

It is open source and can be easily downloaded from GitHub. Thank you Abdul for your contributions in making this happen!


Posted in Technology

This project was started with the thought of having an easy automation tool to interact with OpenStack. Considering the challenges one faces with the existing OpenStack CLI, this tool offers a very good starting point for overcoming them.

Setup

Unlike the existing OpenStack CLI, this tool does not require any prerequisite software to be installed. Openstack4j CLI is written entirely in Java and consumes the API of the openstack4j library; to run, it just needs JRE 6+ installed, which is available by default on most operating systems. It consists of a single executable jar that is portable to any Java or OS platform.

All-in-one

It's an all-in-one solution: a single client for all the OpenStack services. Openstack4j CLI bundles all the primary OpenStack service clients into one, mainly glance, nova, neutron, cinder, etc.

Easy to use

Fluent, easy-to-use and easy-to-understand CLI commands do precisely what is needed. The tool takes care of dependent resource creation in the cases where resources from other OpenStack services (neutron, cinder) are needed, which encourages automation and abstracts unnecessary complexity away from the user so that they can focus on the intent of the operation.

Smart

An inbuilt memory feature remembers the output of each command: Openstack4j CLI saves all the resource IDs generated by previously executed commands and automatically substitutes those values into subsequent commands as and when needed.

More Info: http://vinodborole.github.io/openstack4j-shell/


Group-Based Policy Overview

OpenStack has grown into a major community-based initiative, with thousands of contributors spread across more than a hundred countries. It is now time to focus on the real challenges of deploying and delivering applications and services with flexibility, security, speed and scale, rather than just orchestrating infrastructure components. Achieving this requires a declarative policy engine, and Group-Based Policy is one such project.

The advantage of using the Group-Based Policy (GBP) framework is its abstraction, which reduces the complexity for any developer configuring networking and security for their infrastructure. Moreover, these abstractions are general enough to apply to compute and storage resources as well.

The different sets of components that form GBP are elaborated in the figure below.

 

[Figure: Group-Based Policy components]

Openstack4j Popularity

OpenStack4j is an open source OpenStack client library which allows provisioning and control of an OpenStack system. It has gained quite some popularity in the open source/Java community, for the simple reason that it has the most fluent APIs available for interacting with OpenStack.

It is also listed in the official OpenStack wiki as the Java library for interacting with OpenStack:

https://wiki.openstack.org/wiki/SDKs

Support for GBP in Openstack4j

As openstack4j was already the most widely used library among the developer community, it was a good idea to add support for GBP there as well. With its simple, fluent API and intelligent error handling, it makes interacting with OpenStack easy.

Example code to interact with GBP:

Policy Actions

PolicyAction policyAction = Builders.policyAction()
        .name("all-action")
        .actionType(PolicyActionProtocol.ALLOW)
        .description("all-action")
        .build();

policyAction = osClientv2.gbp().policyAction().create(policyAction);

Reference:

  1. http://www.openstack4j.com/
  2. https://github.com/ContainX/openstack4j

Posted by on in Technology

Overview

There has been much discussion around various authentication methods, ranging from username-password to OTPs, hardware tokens, biometrics and client certificates. Each of these methods provides a varying level of confidence in the overall authentication process, which makes one wonder which authentication method is best for a particular organization's needs. The fundamental question is: is there any one ‘silver bullet’ authentication method? The answer is ‘no’. You need to decide which one to use depending on the environment and context.

Understanding the need

As an example – let’s compare an employee who is logged on to your corporate intranet (probably using AD domain authentication) and requesting access to an intranet application, with someone requesting the same access from outside. In the latter case, you would want to request stronger authentication to ascertain the identity of the person. Here you may choose to ask for an OTP as an additional factor in the authentication process. This is a good example of leveraging context to determine the type of authentication required.

Let us consider another scenario where someone is trying to access a privileged application outside of business hours or from an unknown IP address. In such a case, again you would want to request stronger authentication depending on the nature of the privileged application.

Understanding the authentication context

Context is essentially the surrounding detail about the environment, which can be determined passively (i.e. without need for user intervention). Some typical examples of context include:

  • Location context - Using geo-location to determine where the user is logging in from.
  • Known machine - Has the user logged in using this machine before? This is typically done by computing something known as a device fingerprint and tracking it.
  • Time of the day - Is the user logging in at an odd time of the day or night, which does not match with the users' typical login patterns?
  • IP address – Has the user logged in from the same IP address before?

If we look at the above pieces of information which form the context, we realize that leveraging context-aware authentication essentially means ‘comparing the current context with what is considered normal for that user’. Thus, we first have to establish what can be considered normal behaviour for any given user. This is where analytics comes into play. Using intelligent analytics, we can identify typical patterns for users, and the system keeps learning newer patterns and registering outliers. Based on these learnings, it can request step-up authentication whenever required.

How does this work?

The solution closely follows and tracks user activity to determine normal patterns (using analytics). For every new authentication attempt, the system compares the authentication context with what is considered normal for the given user, identifies the variance from normal, and translates that variance into a risk score. Depending on the risk score identified, it determines whether step-up authentication is needed and, if so, which type of step-up to use.
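As a rough illustration only (the signal names, weights and thresholds below are hypothetical and not taken from any specific product), a risk score can be built by weighting the contextual signals that deviate from the user's normal behaviour and mapping the total to a required authentication level:

import java.util.EnumMap;
import java.util.Map;

public class RiskScorer {
    enum Signal { UNKNOWN_DEVICE, UNUSUAL_LOCATION, ODD_HOUR, NEW_IP }

    // Hypothetical weights per contextual signal
    private static final Map<Signal, Integer> WEIGHTS = new EnumMap<>(Signal.class);
    static {
        WEIGHTS.put(Signal.UNKNOWN_DEVICE, 40);
        WEIGHTS.put(Signal.UNUSUAL_LOCATION, 30);
        WEIGHTS.put(Signal.ODD_HOUR, 15);
        WEIGHTS.put(Signal.NEW_IP, 15);
    }

    // Sum the weights of the signals that deviate from the user's normal context
    public static int score(Iterable<Signal> deviations) {
        int total = 0;
        for (Signal s : deviations) {
            total += WEIGHTS.getOrDefault(s, 0);
        }
        return total;
    }

    // Map the risk score to the step-up required (thresholds are illustrative)
    public static String requiredAuth(int score) {
        if (score < 30) return "PASSWORD_ONLY";
        if (score < 60) return "PASSWORD_PLUS_OTP";
        return "PASSWORD_PLUS_BIOMETRIC";
    }
}

With this kind of mapping, the example in the next paragraph falls out naturally: an unusual location alone crosses the OTP threshold, while an unusual location plus an unknown device crosses the biometric threshold.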

For example – a user’s typical pattern is to log in from North America during business hours. If this user now tries to log in from the Asia Pacific region from a known machine, she/he will be prompted for an OTP as well. If the same user tried to log in from the Asia Pacific region from an unknown machine, she/he could be prompted for biometric authentication in addition.

How does this help?

The end user is not prompted for strong authentication unless there is an explicit need for it. This helps provide a better user experience while doing the delicate balancing act of providing strong authentication whenever required. Best of both worlds!


In my previous blog post, I explained the general concept of Streaming Analytics and the kind of problems it can solve. In this post, I would like to discuss how traditional Big Data analytics and Streaming Analytics differ, and why Streaming Analytics is becoming a crucial component of modern applications (even before the application data reaches a stage where it qualifies for conventional Big Data analysis).

Near real-time or real-time? This aspect, in conjunction with your business needs, will influence your decision of whether to go for Streaming Analytics, Big Data (Hadoop-based) analytics, or a combination of both.

As a classic example of Big Data, let’s consider retail giant Amazon (the most common and easiest example to understand). You may have noticed that when you shop at Amazon.com, based on your shopping history and the items you are currently searching for, the portal offers suggestions such as ‘products you may be interested in’, ‘customers who bought this also bought’ and ‘frequently bought products’.

With hundreds of thousands of people shopping throughout the day on Amazon.com, have you ever wondered how much data the system could be processing? It is mind-blowingly large and qualifies as a ‘Big Data’ problem, which is typically solved using Hadoop like systems. The Big Data analysis is continuously churning out or modifying existing analytical models and algorithms. These are then applied to the data to come up with contextual suggestions, or to re-target frequent customers with discounted products, or offer discounts based on user history, or the holiday season that may be on. These models incorporate feedback on forums and social media (Facebook, Twitter), impact on sales of types of products (due to factors such as season, holidays, geography, age etc.) and set new prices for next shoppers etc. Using such continuous analysis, Amazon manages massive inventory and supply chain ensuring optimum distribution of their inventory.

Although consumers are provided with immediate suggestions and context-based results, is this achieved by Streaming Analytics? Not really. The example above involves ‘Variety’ and ‘Volume’, but the ‘Velocity’ aspect is not really significant here. Even if data was pouring in at high volume, it is first settled in HDFS and then MapReduce-based computation is applied to carry out a very ‘structured’ analysis (meaning we know exactly how the data is to be analyzed) and, more importantly, a ‘batch’ analysis. The results are useful only after the entire batch is processed, and the time that takes is long enough for a Streaming Analytics system to call it ‘too late’ for making any real-time decisions.

Analytics of Data in Motion or ‘Streaming Analytics’ really deals with:

  • Very high velocity and high volume of data - at a minimum of a few hundred thousand events per second
  • Immediate analysis or prediction of favourable or unfavourable events that you want to detect early - before the data settles onto disk or into HDFS for a more structured and known analysis
  • Analysis that may be ad hoc, as opposed to a structured MapReduce-based computation
  • No dependence on batches finishing processing - you want to act as soon as the first indications of a possible problem are visible. True "real time"

E.g. a sensor detecting a voltage spike in a manufacturing plant may raise an event that triggers shutting down the affected production line to prevent possible damage. Another example: the thousands of devices and sensors (switches, routers, VMs and their associated ports, cards, etc.) in an infrastructure setup may collectively generate hundreds of thousands of signals each second. You want to weed out false positives and detect a sequence of events that can lead to a failure (e.g. if the fan of a server goes down, you can predict the chain of events that may follow - the CPU heating up, diminishing CPU utilization, longer response times and eventually the VMs going down). So as soon as the fan goes down, you want your setup to pay special attention to the performance of the VMs running on that machine, or redirect requests to other VMs until the fan comes back up.
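A deliberately simplified sketch of this "act on the first indication" idea, in plain Java rather than any particular streaming framework (the event names and the remediation hook are hypothetical):

import java.util.List;
import java.util.function.Consumer;

public class FirstIndicationDemo {
    record Event(String host, String type) {}   // e.g. type = "FAN_DOWN", "CPU_HOT"

    // React to the very first warning sign as events flow through,
    // without waiting for a batch or writing anything to disk first.
    static void process(Iterable<Event> stream, Consumer<Event> remediate) {
        for (Event e : stream) {
            if ("FAN_DOWN".equals(e.type())) {
                remediate.accept(e);            // e.g. drain VMs off this host
            }
        }
    }

    public static void main(String[] args) {
        List<Event> incoming = List.of(
                new Event("host-17", "HEARTBEAT"),
                new Event("host-17", "FAN_DOWN"),
                new Event("host-17", "CPU_HOT"));
        process(incoming, e -> System.out.println("Redirecting load away from " + e.host()));
    }
}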

But didn’t we say earlier that for Streaming Analytics both time and size are crucial? The very fact that hundreds of thousands, or even millions, of events are to be processed per second gives the problem its ‘Big Data’ character. But the purpose is different: to make decisions while the data is flowing, without having to store it in HDFS and without waiting for a batch to complete processing. Depending on the problem you are solving, you may not need to persist the events in HDFS or a DB/NoSQL store at all. You may make a decision, do a course correction in a matter of a few milliseconds, and forget about the very events that enabled you to make that decision.

Talking of scale, LinkedIn is a great example. Look at these statistics related to LinkedIn user activity over the last 4 years:

  • 2011: 1 billion messages per day
  • 2012: 20 billion messages per day
  • 2013: 200 billion messages per day
  • 2015: 1.1 trillion messages per day

1.1 trillion messages per day works out to an average of roughly 13 million messages per second. Processing at such a volume is done to ensure ‘real-time’ insights into the operational efficiency of LinkedIn's IT infrastructure. It is an excellent example of why conventional Big Data (Hadoop-like) systems won’t work here: the data has to be processed the ‘Streaming Analytics’ way - on hundreds of nodes in parallel, blending, filtering, enriching and analyzing data, rejecting unwanted data and passing on the filtered/enriched data for further analysis or to a data store as an ‘analysed result’.

On the other hand, if you look at LinkedIn's "people you may know" feature, that is a result of Hadoop-based Big Data analysis: the analysis, built on existing data across millions of users, their connections, history and so on, is already in place, is continuously updated, and is simply presented to the user when he logs in.

Today's applications are handling data which is many orders of magnitude larger than the data they used to handle a few years ago - look at the LinkedIn example above. It’s time to make applications ready for Streaming Analytics - the new way of data processing. Streaming, processing and analyzing of extremely high volume and high velocity data in parallel across multiple nodes to derive actionable insights that will control the behaviour of the application is going to be a key architectural consideration in applications of today and tomorrow. Application owners and architects need to gear up to embrace Streaming Analytics as a key building block in their application ecosystem.

At GS Lab we have developed a Streaming Analytics platform that provides a complete solution (data ingestion to processing to analysis to visualization), and is designed to seamlessly integrate into the applications ecosystem as a data processing and analyzing engine that is highly scalable and customizable for applications in any domain. Converting insights from high volume and high speed data into high operational efficiency, conversion to business opportunities and increased ROI should no longer be the privilege of a few. GS Lab's Streaming Analytics platform enables enterprises - small, medium and large - to take advantage of their data to achieve high efficiency, optimization, high ROI, and savings.


Streaming Analytics is generating plenty of buzz these days. We’ve already discussed the concept in this previous blog post by Mandar Garge, Stream It On. Streaming Analytics can be broadly defined as the analysis of data as it is generated or moving in your application ecosystem.

Until about 5 years ago, either the compelling need for Streaming Analytics was not felt or the technology that makes it possible was not viable and affordable. Today, ‘out-of-the-box’ Streaming Analytics solutions are no longer in favour with the industry; rather, there is a preference for customized solutions, because each organization has a uniquely different problem when it comes to handling the deluge of data. Additionally, open source tools and technologies that have been proven and tested over time for solving high-volume data problems are now available.

Let’s discuss what it takes to create customizable Streaming Analytics solutions. I will share the typical considerations and the component architecture involved in building a data pipeline to perform Streaming Analytics at the required scale.

The What and Why

Analysing streaming data is necessary to make real-time decisions based on the insights from it. Let’s try breaking this up.

Real-time

While the definition of the time window for real-time decisions varies per problem, in most cases Streaming Analytics aims at analyzing data as it is being generated. Depending on the business impact and the acceptable latency between data generation and decision making, ‘real-time’ could mean anything from a few minutes to a few milliseconds.

Decisions

In most cases the decisions as a result of Streaming Analytics are predictive in nature. Some quick examples of these decisions are:

  • Fraud detection: Detecting a fraudulent or malicious financial transaction within a few milliseconds of it occurring. Preventive actions such as blocking the concerned transaction and alerting the parties concerned also become possible.
  • Anomaly detection: Sending out warnings or alerts to the IT staff when critical components in the enterprise IT infrastructure start to show signs of malfunction.

Depending on the use case, the preventive or remedial actions resulting from Streaming Analytics could be either manual or automated.

Considerations 

Conceptual model

Streaming Analytics is typically needed when it is a non-trivial effort to analyse data within a meaningful time-window using traditional tools (like an ETL or even Excel). Streaming Analytics usually applies to situations where:

  • Data that is being generated is high-volume, typically starting at orders of a few billion data points a day
  • Decisions are taken based on the data analysis in real time

The combination of software systems used to analyze streaming data is typically referred to as a Data-Pipe (or Pipeline).

Designing Data Pipelines

Some of the important points to consider when designing data-pipes are:

  • Speed/ Throughput

    • How quickly can the system process data coming to it?
    • How many messages/second or bytes/second can the data-pipe process?

When large volumes of data are to be processed for decision making, it becomes critical to process them quickly. Delayed analysis has less value, and sometimes no value at all. A well-designed data pipeline has high throughput.

  • Latency

Analysis is number crunching and will invariably consume CPU cycles and time. The pipeline should not throttle the flow of data significantly, as this again leads to delayed analysis. It is also important to ensure that the components of the system do not introduce bottlenecks that snowball latency over time. Latency should always be minimal and constant.

  • Flexibility

A data pipe, as we will see later, is a heterogeneous structure with each component having a specific function. For example there would be:

    • A data aggregator,
    • A queue,
    • An engine to perform streaming functions on the data,
    • An index or database where it would be stored,
    • A view component to visualize data

Given the pace at which Streaming Analytics as a field is gaining momentum, it is important to keep components loosely coupled. This is so that there is freedom to choose the best component for a particular function.

  • Scalability

Since the volume of processing in such data pipes is very high, these systems often reach the limits of the hardware they are deployed on. It should be easy to scale such systems horizontally by adding more nodes to the cluster.

  • Fault Tolerance

Fault tolerance is closely related to scalability. Nodes in a cluster should be able to auto-recover if one of them goes down, and the data pipeline should not lose any data.

Components

Let's look at the components that constitute a typical data pipeline.

Data Collection

[Figure: Data sources]

Data sources are diverse and their nature depends on the domain of the problem. Multiple sources of data may feed a single analysis pipeline, so the pipeline needs connectors to the various systems it feeds from - for example SNMP, log files, RSS feeds or other custom connectors.

 

Input Queue

[Figure: Queuing]

An input queue unifies the data collected from the various sources. This makes it possible to perform real-time analysis on data being generated by different systems and to correlate it for valuable insights.
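As one concrete, hedged example, Apache Kafka (which appears in the tool-chains listed later in this post) is often used as this input queue. A minimal Java producer that publishes events from a source onto a unified topic might look like this; the topic name, broker address and event payload are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Every data source writes to the same unified topic so that the
            // stream processing engine sees one correlated stream of events.
            producer.send(new ProducerRecord<>("raw-events", "host-17",
                    "{\"type\":\"FAN_DOWN\",\"ts\":1700000000}"));
        }
    }
}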

 

 

Stream Processing Engine

[Figure: Stream processing]

The stream processing engine is the heart of any data pipeline. This component allows analysis of data both as individual snapshots and within time windows or moving windows. Coupled with a rule configuration mechanism, it allows the system to be set up to analyze data based on custom rules. This component would most likely use an in-memory or distributed cache to maintain state across messages.
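To make the time-window idea concrete, here is a deliberately framework-free Java sketch; a real pipeline would use an engine such as Storm, Flink or Spark from the tool-chains listed further below, and the window size and threshold here are made up. It keeps a one-minute moving window of readings entirely in memory and flags when a simple rule fires:

import java.util.ArrayDeque;
import java.util.Deque;

public class MovingWindowRule {
    private static final long WINDOW_MILLIS = 60_000;   // one-minute moving window (illustrative)
    private static final double THRESHOLD = 240.0;      // made-up rule: average value limit

    // timestamped reading kept inside the window
    private record Reading(long ts, double value) {}

    private final Deque<Reading> window = new ArrayDeque<>();
    private double sum = 0;

    // Called for every event as it streams in; state lives in memory only.
    public boolean onReading(long ts, double value) {
        window.addLast(new Reading(ts, value));
        sum += value;

        // Evict readings that have fallen out of the moving window
        while (!window.isEmpty() && window.peekFirst().ts() < ts - WINDOW_MILLIS) {
            sum -= window.removeFirst().value();
        }

        double average = sum / window.size();
        return average > THRESHOLD;   // true => the configured rule fired
    }
}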

 

 

Data Store

[Figure: Databases]

The analyzed data should be routed to a data store so that it can be read for charts and other representations. The system could decide to store only a curated subset of all the events (messages) that occurred. In order to serve downstream visualization, the data store should support high-speed writes and fast queries. The choice of data store is driven by the complexity or variation of the data, the indexing capabilities required, and possibly the need for features such as aggregations.

 

Visualization

[Figure: Visualization]

Finally, there needs to be a visualization component that lets you visualize the data in predefined ways and also allows easy customization.

Overall Component Architecture

[Figure: Overall architecture]

As we have seen, the overall component architecture involves stitching these components together so that the required functionality, performance and reliability are achieved.

There are various other aspects like DevOps, runtime configurability, etc. that need to be considered while building such systems that are beyond the scope of this article.

Watch this space

The space of Streaming Analytics is still nascent and evolving. The problems are different and unique, so the solution needs to be flexible enough to address any problem that deals with analyzing data in real time. Businesses are discovering that problems and business cases can be better solved with a customizable Streaming Analytics solution based on open source tools rather than with rigid and expensive analytics products.

At GS Lab we have developed a Streaming Analytics platform that allows businesses to configure and customize the platform to suit their business domain. The platform can be customized by building data pipelines with the tools and technologies that best suit the problem to be solved. Additionally, customizations can be built on top of this data pipeline within days, not weeks or months.

At GS Lab we have also been leveraging tool-chains like the following to build custom solutions for customers:

  • Apache Kafka, Apache Storm, Elasticsearch, Kibana/custom Dashboards
  • Apache Kafka, Apache Flink, Cassandra
  • Apache Kafka, Apache Spark, Elasticsearch, Custom Dashboards
  • Logstash, Elasticsearch, Kibana

Watch this space for more on the solutions GS Lab has to offer and a deeper look at the specifics of Streaming Analytics solutions.


I have always been fascinated by the way in which real-time streaming technology has evolved. Today this technology can be used to deliver multimedia content simultaneously to participants of a network-based communication. Multimedia content may include audio, video, graphics, animation, images, text, etc. To be effective, streaming multimedia is presented in a continuous fashion, and excessive delays or missing content can be detected by participants. Often, buffering techniques are used to enable a consistent presentation of content, given an inconsistent transmission and receipt of content.

This transmission of multimedia content, which includes audio and video, in real-time to multiple recipients may be referred to as audio-video conferencing. Audio-video conferencing offers a number of advantages such as real-time communication capability between multiple participants, without the delay, cost, scheduling, and travel time of face-to-face meetings. Audio-video conferencing may make use of the Internet and associated Internet protocols to deliver content to the various participants. This greatly extends the connection capability of audio-video conferencing to a worldwide range.

One challenge I have personally witnessed is that the quality of service when transmitting real-time streaming data over the Internet cannot be guaranteed, and disruptions may be experienced frequently. Disruptions really play spoilsport in an important meeting, where people simply get dropped from the video conference.

In some cases the disruption may be of a short duration, but many participants of audio-video conferencing have had frustrating experiences in which the real-time streaming of data failed and the conference was abruptly terminated.

As we all agree that ‘necessity is the mother of invention’, I started researching, along with colleagues, to understand what existing solutions are available to overcome this annoying disruption.

The existing solutions for audio-video conference failover were designed around having a secondary MCU (Multipoint Control Unit) take over in case of a network failure of the primary MCU. The MCU is a server component, usually costly hardware requiring a lot of configuration and bandwidth allocation. Here are some typical examples:
https://patents.google.com/patent/WO2002067120A1/en?q=audio&q=video&q=failover&q=mechanism
https://patents.google.com/patent/US20030142670A1/en?q=audio&q=video&q=failover&q=mechanism
https://patents.google.com/patent/US6839865B2/en?q=audio&q=video&q=failover&q=mechanism

We felt that there was a need for a solution that would be simple and built with something that we already have. There was a need to design an innovative system and process to join the dots.

The solution we proposed is to utilize the existing resources within the conference, which are generally the client endpoints, instead of high-end MCUs.
In this new client-based MCU selection (a simple selection sketch follows the list below):

  • The client endpoints are always available, so the system can proactively nominate one of the client endpoints that has conference hosting capability.
  • Conference hosting capability can be judged based on the hardware capability of the endpoint and the network in which it is located.
  • It is preferable to nominate the endpoint of the conference moderator, since the moderator usually stays in the call for its entire duration.
  • If the moderator leaves the call, a new client endpoint in the conference is nominated as the secondary MCU.
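A minimal sketch of how such a nomination could be scored; the capability fields and the ordering are hypothetical illustrations, not taken from the patent:

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class SecondaryMcuSelector {
    // Hypothetical view of a client endpoint's hosting capability
    record Endpoint(String id, boolean isModerator, int cpuCores, int uplinkMbps) {}

    // Prefer the moderator's endpoint outright; otherwise pick the endpoint
    // with the best hardware and network uplink.
    static Optional<Endpoint> nominate(List<Endpoint> endpoints) {
        return endpoints.stream()
                .max(Comparator
                        .comparing(Endpoint::isModerator)
                        .thenComparingInt(Endpoint::cpuCores)
                        .thenComparingInt(Endpoint::uplinkMbps));
    }

    public static void main(String[] args) {
        List<Endpoint> participants = List.of(
                new Endpoint("alice", true, 4, 20),
                new Endpoint("bob", false, 8, 100));
        nominate(participants).ifPresent(e ->
                System.out.println("Secondary MCU candidate: " + e.id()));
    }
}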

The solution devised by us offers a number of powerful benefits:

  • Low cost
  • High efficiency
  • Ease of implementation

[Editor's Note: This blog post describes Sagar's contribution to the patent 'Maintaining Audio Video Continuity' while he was working with his previous employer.]


Posted by on in Technology

If you ever thought that the data in your application's ecosystem had valuable insights, you were right and chances are that you are already leveraging those. But if you thought the value was only in the archived data ('data at rest'), think again. The 'moving' data has equal significance, and often, more business value and insights hidden in it.
 
Streaming analysis, as the term suggests, refers to analyzing the data right upfront - when data is 'streaming' or 'moving' in your application ecosystem. It’s not a rear-view-mirror analysis, but rather a front-view one, allowing you to steer your business, based on what you see happening in real time (often keeping in mind the rear-view). This needs a shift in mentality - from ‘batch processing’ to ‘stream processing’.
 
Insights into moving (real-time) data are very often known as 'perishable insights' - those that are emerging from urgent business situations and events that can be detected and acted on at a moment's notice. They are perishable - if you do not act on them immediately, they lose their value, and you potentially lose a business opportunity (which you may not have realized existed). These simple or complex events can uncover hidden risks, as well as untapped business opportunities only if you act immediately.
 

Streaming analytics, most of the time, has to do with streaming massive volumes of data. The system has to be capable of handling ‘big + fast’ data rather than ‘big’ data alone. It has to be:

  • A high performance system, processing data at extremely high velocity and volume. It should adjust, transform and normalize data extracted from numerous sources with a variety of formats. It should be capable of processing tens of thousands of events per second.
  • Able to offer rapid integration with different data sources (system data, application data, market data, social networks, transactions, mobile devices, IoT devices, sensors, images and files).
  • Easily scalable and fault-tolerant.
  • Capable of analyzing the data for warnings, alerts, signals and patterns - all in real time.
  • Offer itself as a platform with application development capabilities and development tools - from proprietary SDKs to those based on open-source frameworks (Spark, Storm, etc.), technologies (Java, Python, Scala) and tools (NoSQL databases, Relational databases etc.).

 
The peculiarity of such a platform is heavy ‘in-memory’ analytics, since analyzing streamed data after it has been temporarily written to disk is a ‘too late’ analysis, and the data stream may lose steam.
 
A streaming analytics system has to have the following capabilities:

  • Rich visualization and monitoring
  • Intelligence to detect urgent, problematic or opportune events
  • Automatic course correction/ real-time responsiveness - as simple as alerting or messaging the stake holders or launching a complex business workflow

Furthermore, intelligence can be built in to predict what might happen in the future, based on what is contained in the moving data and correlating it with what was found in the previous month's or year's, or maybe even the previous 20 years' data - all in real time.


Let’s have a look at some examples of how real-time analysis can make a difference:

  1. Fraud Detection: Periodic processing of a credit card’s transactions would reveal a fraud much after it has happened. Real-time analytics enables detection of a fraud while the transaction is in process, allowing the system to automatically stop the suspicious transaction before it’s too late.
  2. Supply Chain: Manufacturers who analyze weekly or monthly reports of production orders, to make adjustments to their production, can greatly benefit from streaming data analysis (of how their product is being sold off the counter) to make immediate adjustments in the processes to yield optimum output or avoid creating an overstock.
  3. Online Retail: Retailers can process disparate data streams (ERP, CRM etc.) together with a customer's historical purchasing data and shopping patterns to make more contextual offers, which may lead to customers buying more of their products than they originally intended. Have you ever seen a “people who bought this also bought these items” message? Real-time analysis! Retailers have identified this huge business potential simply by making use of this very short “right-time” window to make contextual offers.
  4. Ad Exchange and Ad Serving: Have you ever noticed that when you visit your favorite news website, you see an advertisement for a product you searched for on the internet a couple of days back? Chances are that this ad was served only to you and some others, based on search history. To earn this spot on the webpage, thousands of ads have undergone a bidding process in the blink of an eye – thanks to real-time analysis!
  5. Real time stock price prediction: Analysis is done by integrating huge data from firms like NYSE, twitter messages (from relevant financial community) about the stock, public sentiment about the stock, and correlating it with the results of complex statistical modeling algorithms on historical data of the stock.
  6. Global sports betting market: 80% of betting takes place after the event has started. Prediction of pricing movements using only batch processing on historical data before the start of the sports event is hardly of any use.


The point is that all of the examples above have been around for a while, performing real-time analysis, albeit using home-grown algorithms or existing commercial systems. Streaming Analytics aims to provide a platform on which many more organizations can do the same by building only a thin layer on top of it.

Digital data in tech as well as non-tech organizations is almost doubling every year. The IoT’ization will only increase this volume multifold. Many do not know what exactly is to be done with this enormous amount of digital data, or are still gearing up to use conventional analytics effectively. But those who take the 'streaming analytics' path will leap ahead and gain tremendously from the descriptive, predictive and prescriptive analysis – all in real time.
 
Customized solutions created by building on top of, or integrating open-source components such as Spark or Storm, Kafka or RabbitMQ, Elasticsearch or Solr, Cassandra or MongoDb or Aerospike, customized ETL processes, BI or visualization tools – will be the most sought after solutions and will give the commercial streaming analytics biggies a run for their money in time to come.

Streaming analytics is not a replacement for conventional analytics. It complements and supplements existing techniques to make analytics more intuitive and valuable to organizations. If conventional analytics on massive volumes of data lets your organization make intelligent business decisions quicker, then streaming analytics is the equivalent of firing the booster engine of a rocket and flinging it into a higher orbit. Why wait till the end of the week or the month to derive hidden business value from your data when you could get it within minutes, seconds or even milliseconds? Just stream it on and zoom ahead.


Logging – for a large part of the history of computer software, creators of software products have had a love-hate (but mostly just indifferent) relationship with logging. They love logs when products run into problems – which is more often than most people would imagine; they hate logs because software doesn’t do anything automatically and they have to tell it to log its every whim and fancy; and when software runs the way it’s expected to, well, then nobody really cares what’s in the logs. It’s like what parents are to a teen, love ‘em when you need something, hate ‘em when they make you explain where you were at night and don’t really give too much thought when they’re not doing either.

The situation is not helped by the fact that logging sits on the intersection of two very important concerns of software businesses and developers – how to use the least amount of computing resources and how to take the least programming effort possible to achieve what you want to, respectively. And guess what, logging requires resources and you have to write actual lines of text by hand to ensure it’s done; clearly not a tenable position. And so, best case scenario, it gets treated as a necessary evil; worst case scenario, it gets cut down mercilessly.

Or, at least, that was the case until very recently.

Over the last few years, logging has come out of the shadows and it’s now actually fashionable to talk about ‘log management’, ‘log analytics’, etc. A number of factors have caused this shift and many of them are the same things that cause any such shift – abundance of cheap computing resources, more powerful hardware and a good availability of open source components. It has become as simple as setting up one tool-chain to get started very quickly on understanding what’s happening in the logs from your application. However, beyond being easy enough so that anyone with even a little software experience can do it, here are three reasons why you should care about it and get started as soon as possible.

There’s hidden data in your logs that you don’t know of

Logs are a geologic record of exactly what went on in your product over time. Not only do logs give you access to what happened at a specific point of time in the past - the traditional use - analyzing them also exposes trends in the usage of your product and in the problems that your product runs into. These data points are useful to product managers and support teams alike for understanding users and providing them a better experience. Traditionally the domain of time-consuming discussions and ineffective surveys, this need can now be served effectively by looking at the insights hidden right inside data you already own.

With a low barrier to entry, anyone can get in on the fun

There are several log analysis products and services covering the spectrum from comprehensive log management to hosted log analysis and visualization, which offer a range of plans for their services. Irrespective of whether you are a startup or an established large company, you will be able to get a solution that exactly fits your needs. With such a solution, you will be able to keep dumping information in logs while deciding which of it is useful at a later stage. Unfortunately, so will your competitors.

You can actually use logs as a data sink

While it seems counter-intuitive, there’s a perfect case to be made for using logs as a way of storing and processing data. While you have to use resources and spend development effort when trying to store data about various events and objects in your system, with logs all you need to do is dump that data into the logs. Then, with an appropriate setup, you will be able to extract the results you want out of these either on the fly for immediate action or later for historical analysis. Definitely beats having to worry about creating the perfect database design to hold all the data that you would ever want for analysis.
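As a small, hedged illustration of the "logs as a data sink" idea (using SLF4J only because it is a common Java logging facade; the class name and event fields are invented), you can log events as structured key-value lines and let your log analysis tool-chain parse them later:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CheckoutService {
    private static final Logger log = LoggerFactory.getLogger(CheckoutService.class);

    public void recordPurchase(String userId, String sku, double amount) {
        // Instead of designing a schema up front, dump the event into the log
        // in a consistent, parseable form; the log pipeline decides later
        // which fields are worth extracting and analyzing.
        log.info("event=purchase user={} sku={} amount={}", userId, sku, amount);
    }
}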

Given the state of log management and analysis ecosystem, it’s not only easy but also hugely beneficial to look at your own logs to derive knowledge about your users and your product. This is sure to result in stronger products and better user experiences. Happy logging!
