StreamInsight in SQL Server is a complex event processing engine that can manage and mine data in real time. A practical application has been developed to stream Twitter data and analyze it with Microsoft Azure and Hadoop. The application can be downloaded from here. Power Pivot, Power View and mobile devices can be used as front-end tools to analyze this data. To get an understanding of Big Data you can read this post, and below is Microsoft's picture of Big Data. This post outlines Microsoft's view on Big Data and explains the steps in building the Twitter and Big Data application.
On the left-hand side of the picture above you can see a number of devices that feed data into the Microsoft ecosystem, which in turn flows into StreamInsight. You can also leverage tools like Hadoop on Azure and Hadoop on Windows Server in Microsoft's Big Data stack. SQL Server Parallel Data Warehouse is the destination for data from different enterprise systems. Once the data is fed into SQL Server Parallel Data Warehouse, you have a rich reporting infrastructure to report on it. You can then use tools like Reporting Services, Power Pivot, PerformancePoint and third-party tools to visualise this data.
Big Data is really compelling from Microsoft's point of view: each tool has its own role and responsibility, and together they build a rich ecosystem, as shown above.
The steps to build the Big Data Twitter application are:
1. The application uses StreamInsight in SQL Server to collect and process Twitter data.
2. The tweets are sent from Twitter to a SQL Azure instance, which holds the high-level tweet information.
3. The actual body of the data is stored in Azure Blob storage.
4. The processed data is then sent to Hadoop on Azure to produce the maps.
5. Finally, you can consume those feeds through Power Pivot and Power View.
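Steps 2 and 3 split every tweet into a small header row and a full body. Here is a minimal Python sketch of that split, not the code from the actual application. The field names, the `store_tweet` helper and the in-memory list and dict that stand in for the SQL Azure table and the blob container are all assumptions made for illustration.

```python
import json


def split_tweet(tweet):
    """Split a tweet dict into a small header row and a full JSON body.

    The field names used here are hypothetical, not the real schema.
    """
    blob_name = f"{tweet['id']}.json"
    header = {
        "tweet_id": tweet["id"],
        "created_at": tweet["created_at"],
        "topic": tweet["topic"],
        "sentiment": tweet["sentiment"],
        "blob_name": blob_name,  # pointer from the header row to the body
    }
    body = json.dumps(tweet, sort_keys=True)
    return header, blob_name, body


def store_tweet(tweet, header_rows, blob_container):
    """Store the header in a list and the body in a dict.

    The list stands in for the SQL Azure table and the dict stands in
    for the blob container; a real application would call the database
    and storage services instead.
    """
    header, blob_name, body = split_tweet(tweet)
    header_rows.append(header)
    blob_container[blob_name] = body
    return header
```

The header stays small enough to query quickly, while the `blob_name` field lets you find the full tweet body later.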
The architecture diagram for the above steps, from CodePlex, is shown below.
After the tweets are sent to the StreamInsight engine, the input adapter returns the sentiment (positive, negative or neutral) of each tweet. The tweets are then passed to the analytic engine, and the output adapter produces the high-level data that feeds into SQL Azure, while the body of the data moves to Azure Blob storage. You can do real-time aggregation and streaming with HTML5 dashboards.
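To make the positive, negative or neutral labelling concrete, here is a rough Python sketch of a word-list sentiment scorer. It is only an illustration under my own assumptions: the word lists and the `classify_sentiment` function are made up for this example, and the real application may score sentiment in a completely different way.

```python
# Hypothetical word lists, chosen only for this illustration.
POSITIVE_WORDS = {"good", "great", "love", "awesome", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "awful", "sad"}


def classify_sentiment(text):
    """Label a tweet as 'positive', 'negative' or 'neutral'.

    Counts positive and negative words and compares the two totals.
    """
    # Lower-case each word and strip common punctuation around it.
    words = [w.strip(".,!?#@:;\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words)
    score -= sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `classify_sentiment("I love this, great job!")` gives `"positive"`, and a tweet with no listed words comes back as `"neutral"`.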
The alternative path is to map the SQL Azure data to HadoopOnAzure and then run Map/Reduce jobs on that data.
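If you have not seen Map/Reduce before, the idea can be sketched in a few lines of plain Python. This is a toy, single-machine version that counts tweets per sentiment. The function names and the `sentiment` field are assumptions for illustration, and a real HadoopOnAzure job would be written against Hadoop's own APIs rather than like this.

```python
from collections import defaultdict


def map_tweet(tweet):
    """Map phase: emit one (sentiment, 1) pair per tweet."""
    yield tweet["sentiment"], 1


def reduce_counts(key, values):
    """Reduce phase: add up all the counts emitted for one key."""
    return key, sum(values)


def run_job(tweets):
    """Run map, group by key (the shuffle), then reduce, all in memory."""
    groups = defaultdict(list)
    for tweet in tweets:
        for key, value in map_tweet(tweet):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

On a cluster the map and reduce calls run in parallel across many machines, which is what makes the same pattern work on very large tweet archives.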
The StreamInsight engine is a nice Enterprise edition feature that lets you do complex event processing on hundreds of thousands of records per second, which typically suits manufacturing and financial applications. In other words, the engine can sustain a very high volume of processed data per second.
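A typical complex event processing query groups events into fixed, non-overlapping time windows and aggregates each window. The Python sketch below shows that idea for sentiment counts. It is not StreamInsight code (StreamInsight queries are written in .NET), and the `tumbling_window_counts` function and the event shape are assumptions made for this example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count sentiments per fixed, non-overlapping time window.

    `events` is an iterable of (timestamp_in_seconds, sentiment) pairs.
    Returns a dict that maps each window start time to a Counter.
    """
    counts = {}
    for timestamp, sentiment in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts.setdefault(window_start, Counter())[sentiment] += 1
    return counts
```

A real engine does this incrementally as events arrive instead of looping over a finished list, but the grouping logic is the same.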
Below is a screenshot of the application, which follows the steps discussed above.
As tweets are sent from the StreamInsight engine to the real-time dashboard, you can see the total number of tweets, the sentiment results and the actual tweets as they come through.
The same information is simultaneously sent to the Azure instance, which looks as below.
This table contains the header information, which gets updated in real time. You can now navigate to the Azure portal to see the data in Blob storage. The storage account view is shown below.
The containers hold the detailed information of the tweets.
The same can be viewed using Cloud Explorer.
Now let's see how it looks on HadoopOnAzure, which takes the unstructured data and produces information that you can display in presentation tools. With the latest versions of Power View and Power Pivot you can run the reports from within Excel. You can analyse the data in Power View as shown below.
The key step in this application is mapping Azure Blob storage to HadoopOnAzure.