Become a Data Driven Business With Chartio and Amazon Redshift: Part I
Becoming a Data Driven Business starts with a solid BI strategy. Initial findings from new MIT research show that companies with data-driven decision-making environments had 4% higher productivity and 6% higher profits than other businesses, according to Andrew McAfee, a principal research scientist at the MIT Center for Digital Business.
We have recently completed several projects implementing new BI strategies for clients that selected Chartio as their visualization platform. We built the data infrastructure and data pipelines, configured the initial dashboards, and developed the SQL queries that drive the charts on those dashboards. The end result is business-friendly metrics in an easily consumable format for executives and technical users alike.
Businesses can now drive these projects from end-to-end by utilizing the cloud and remote managed services, realizing the dream of a data driven business without impacting existing applications or adding new technical staff.
This article talks about the process and underlying considerations to enable a smooth project and get the most out of your data.
BI as the Source of Truth
Deciding on a business intelligence tool can be one of the most important decisions you make. Major business decisions are at risk when they are not backed by data, and data visualizations today are the source of truth for the products and services that your customers, investors and executives rely on.
“Analytics is all about understanding the relationships between various events. But as you record more events, relationships between events increase exponentially. Understanding them all becomes even harder, and this is where biases or assumptions are likely to creep in. When no single relationship stands out as especially relevant, it’s easy for people to pick out the one or two that make the most sense so they can say their decision is supported by the numbers, even though it may not be.” – Nate Silver
This is especially true for Big Data sized data stores. As investor and writer Peter Pham writes, “Data is no longer a static disposable resource that loses usefulness once it has served its singular purpose. Its life may be extended through multi-use, multi-purpose data processing. As a renewable resource, its value should be assessed not by the bottom line, but as an asset that not only grows in value but one which further provides value creation opportunities.” As a result, unexpected insights can be gleaned from data when it is presented in a human-friendly, visual manner.
New Stories Emerge from your Data
Data driven organizations can seamlessly view all of their data, whether cloud, on-premise, IoT, social or big data, and bring it together to see new stories emerge.
The path to estimating the scope, complexity and total cost of ownership of your BI project starts with understanding the following:
- Data Sources – How many, locations, format, structure all matter.
- Data Integration – Data from different sources needs to be conformed.
- Data Transformation – Conforming data requires constant processing of new data.
- Real time requirements of the business – Businesses often need data to be real time.
- Volume and relevance of the data at rest and in motion – How fast the data changes at high volumes relative to its relevance over time.
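To make the "Data Integration" point concrete, here is a minimal sketch of conforming records from two sources into one shared schema. The source names (CRM, billing), the field names and the date formats are all hypothetical, and the sketch assumes the source timestamps are already in UTC.

```python
from datetime import datetime, timezone

# Hypothetical field mappings: source column name -> conformed column name.
CRM_FIELDS = {"CustomerID": "customer_id", "SignupDate": "signed_up_at", "Plan": "plan"}
BILLING_FIELDS = {"cust": "customer_id", "created": "signed_up_at", "tier": "plan"}


def conform(record, field_map, date_format):
    """Rename fields to the shared schema and normalize values."""
    row = {target: record[source] for source, target in field_map.items()}
    # Assumption: source timestamps are already UTC, so we only attach the zone.
    parsed = datetime.strptime(row["signed_up_at"], date_format)
    row["signed_up_at"] = parsed.replace(tzinfo=timezone.utc).isoformat()
    row["plan"] = row["plan"].strip().lower()
    row["customer_id"] = str(row["customer_id"])
    return row


# The same customer as seen by two differently shaped systems.
crm_row = conform(
    {"CustomerID": 42, "SignupDate": "03/15/2017", "Plan": "Pro "},
    CRM_FIELDS,
    "%m/%d/%Y",
)
billing_row = conform(
    {"cust": "42", "created": "2017-03-15 00:00:00", "tier": "PRO"},
    BILLING_FIELDS,
    "%Y-%m-%d %H:%M:%S",
)
```

After conforming, both rows carry the same keys and the same value formats, which is what lets a single chart query treat them as one data set. Every additional source adds another mapping to maintain, which is one reason the number and shape of sources drive project cost.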
Selecting the Right BI Tool
There are many tools out there to choose from, each with their own strengths and weaknesses relative to the requirements. Use cases for selecting a tool tend to be driven by the following categories:
- Operational BI – Charting business processes end to end.
- Analytics – Deep dive analytics to discover new insights from the data.
- Self Service – Typically client facing reporting on usage, stats, etc. often embedded.
- Reporting – Summary data on events, trends, usage etc.
- Dashboards – Real time stats from multiple data sources combined to tell a comprehensive story.
Most BI projects involve two or more of the above use cases. One of the reasons our customers choose Chartio is that they can do all of the above with it. We recommend starting with an Agile approach and building out each use case one at a time. Chartio allows you to stand up your data quickly and has one of the shortest times to value:
- Chartio is a hosted service which allows for conveniently fast implementations.
- You save on hosting, servers, install time, ongoing management and technical team costs, all of which are significant.
- Intuitive usability, purpose-built for the business user, with advanced SQL, filtering and post-processing features for the analyst.
- Chartio supports a myriad of data sources, with VPN linkups that speed up access to the data wherever it lives, shortening time to value.
- Chartio’s proactive caching of report and dashboard queries makes for fast interactive conversations with the data for users.
- No metadata layers or proprietary languages to learn or hire staff for.
- Easily share and access data. Ideal for distributed teams, executives and customers alike.
- Supports embedding dashboards into any web app for a seamless experience.
- Post processing allows for easy manipulation of your underlying data and does not limit you by a data model so you can do complex calculations on the fly.
Chartio has one of the most user-friendly and intuitive interfaces available, which allows non-technical analysts to quickly navigate the tool and generate visualizations. Adding data sources is another strength. Dozens of data source types are supported, including Amazon Redshift, MS SQL Server, MongoDB, Heroku, MySQL, Postgres, and many others. Most sources can be added simply by entering credentials or a connection string. SSH tunneling is also supported for data that sits behind a secured network.
Knowing your own data is the most important driver of requirements. When you are ready to get started with your BI strategy, we suggest a few steps to help ensure the success of your project.
- The first step is to develop the dashboard requirements. This will drive the structure of your data, the network infrastructure needed and the tools selected for normalization, visualization and storage.
- How close to real time do your visualizations need to be?
- How customizable do you need your graphs and charts to be? And how frequently will they change?
- What underlying trends in your data will you be observing, and how closely does that fit your current data model?
- And perhaps most importantly, do you have the right technical staff to handle this project and run the system on an ongoing basis?
The answers to these questions will help determine the scope, complexity and cost of the project.
Operational Data Stores (ODS) with Amazon Redshift
The next step is to plan your data strategy. In some cases you may get lucky and your current data structures will be suitable for running analytics directly. But that’s not common for mid-size and larger firms. That’s where an Operational Data Store (ODS) comes in.
An Operational Data Store is just another database that collects data from all the sources and presents it to the BI tool. It allows collection and transformation of the data for the sole purpose of visualization without impacting the source databases or applications. Because everything lands in one place, you can run queries against all of it at once and be sure the same unification is applied to the whole data set. It also makes it easier to switch analytics tools, or to have multiple tools reference the same data for different views or use cases.
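The idea can be sketched in a few lines. In the example below, an in-memory SQLite database stands in for the ODS purely so the code runs anywhere; it is not Redshift. The table, the source names and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite is only a stand-in for the ODS so the sketch runs anywhere.
ods = sqlite3.connect(":memory:")
ods.execute(
    "CREATE TABLE orders (source TEXT, order_id TEXT, customer_id TEXT, amount_usd REAL)"
)

# Hypothetical rows already extracted from two separate source systems.
web_orders = [("w-1", "c-1", 20.0), ("w-2", "c-2", 35.5)]
store_orders = [("s-1", "c-1", 12.5)]


def load(source_name, rows):
    """Copy rows into the ODS, tagging each with the system it came from."""
    ods.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(source_name, *row) for row in rows],
    )
    ods.commit()


load("web", web_orders)
load("store", store_orders)

# One query now spans both sources; the source systems are never touched.
revenue_by_customer = ods.execute(
    "SELECT customer_id, SUM(amount_usd) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
```

The BI tool only ever sees the `orders` table in the ODS, so a chart of revenue per customer needs one connection and one query regardless of how many systems feed it.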
Amazon Redshift is a great option and here’s why:
- The trend toward massive data lakes is not for every business and Redshift allows for a more agile approach.
- Redshift allows you to start small then grow with manageable costs.
- Many services are built around Redshift to get data in and process it along the way, such as Alooma, Xplenty, Fivetran, Segment and Stitch, among others.
- A Proof of Concept can be stood up quickly and cheaply with your data, then operationalized for production, all in one continuous, efficient project.
- The result is a new trend that offers faster time to benefit and lower TCO while still supporting complex data migration, transformation and real time analytics, provided you know what you’re doing.
AgilData builds and manages ongoing Operational Data Stores using Amazon Redshift and Chartio.
In most situations, data needs to be moved to the Operational Data Store, unified across sources, and restructured so that analytics queries are both accurate across time and category and performant.
Data Pipeline Orchestration
Data pipeline orchestration is the process of moving data from different sources (relational databases, other data stores, Google Analytics, CSV uploads, etc.) and inserting it into the Operational Data Store, which is Redshift in this example. The data also often needs to be normalized and transformed to a standard, efficient model, with unneeded or sensitive data removed. This often requires custom algorithms designed and written to fit your data and analytics needs. The process also runs continuously, adding new data and changes either as a stream or in batches.
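A transformation step of this kind might look like the following minimal sketch. The list of sensitive fields, the event shape and the normalization rules are hypothetical, and the sketch assumes every event carries an `amount` value.

```python
# Hypothetical set of fields that must never reach the analytics store.
SENSITIVE_FIELDS = {"email", "ssn", "card_number"}


def transform(event):
    """Drop sensitive fields and normalize the remaining values."""
    clean = {k: v for k, v in event.items() if k not in SENSITIVE_FIELDS}
    # Normalize country codes; events without one are labeled explicitly.
    clean["country"] = clean.get("country", "unknown").strip().upper()
    # Store money as integer cents (assumes every event has an "amount").
    clean["amount_cents"] = round(float(clean.pop("amount")) * 100)
    return clean


def run_batch(events):
    """Apply the transform to every event in a batch."""
    return [transform(e) for e in events]


batch = run_batch([
    {"user_id": 7, "email": "a@example.com", "country": " us", "amount": "19.99"},
    {"user_id": 8, "ssn": "000-00-0000", "amount": "5"},
])
```

In a real pipeline the same function would sit between extraction and the load into the data store, whether it is called once per streamed event or once per batch.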
The complexity and cost of the data pipeline are largely determined by whether it needs to be real-time. For data that can be hours or even days old, many ETL tools will work, running large batch jobs to update your whole datastore as often as needed. However, your choices become much more limited when you need close to real-time. One tool that allows for continuously streaming data from multiple sources, through transformations, into a final data store is Alooma. Check out our post on Alooma for more information.
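Between full batch reloads and true streaming there is a commonly used middle ground: incremental batches that copy only the rows changed since the last run, tracked with a "high-water mark" timestamp. This technique is our illustration rather than a feature of any tool named above. The sketch below assumes each source row carries a reliable `updated_at` value, and the row data is hypothetical.

```python
from datetime import datetime


def extract_since(source_rows, watermark):
    """Return rows changed after the watermark, plus the new watermark."""
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2017, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2017, 1, 1, 12, 0)},
    {"id": 3, "updated_at": datetime(2017, 1, 2, 8, 0)},
]

# First run: everything after the last recorded watermark is picked up.
first, mark = extract_since(source, datetime(2017, 1, 1, 10, 0))

# Second run with no new changes: nothing is re-copied.
second, mark_after = extract_since(source, mark)
```

Running this more often shrinks data latency without reprocessing the whole data set, but it is still polling. When the business needs data within seconds, a streaming pipeline remains the better fit.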
Another big decision to be made is the final datastore which Chartio or other analytics tools will be reading from. While Chartio does allow for many different data sources, it is much simpler to have all of your data in the same place. This allows you to run your own queries against it all at once, makes sure that the same unification is happening to all data, and makes it easier to switch or have multiple analytics tools referencing the same data. Even if there is no need for complex unification or multiple data sources, most projects will require this separate data warehouse to prevent performance degradation to the main application. Analytics queries can be expensive and long running, and you don’t want those affecting your application.
Integrating with a cloud-based BI tool such as Chartio can be a fast, low-overhead way to provide analytics and business intelligence. However, there are hidden challenges, especially in the data pipeline, that must be thought out and addressed before implementation. With some preparation and an understanding of your own requirements, these challenges can be overcome, letting you focus on what you really want: designing great analytics and BI visuals to grow and improve your business.
Sign up for our Newsletter for more on Chartio and Redshift in Part II, which will discuss the value of star schemas, embedding with security entitlements and more.
AgilData has partnered with Chartio to provide full life-cycle professional services, including proof-of-concept packages, expert implementation services and ongoing remote managed services. These enable firms to adopt analytics faster and with a high level of success while leveraging the cloud and avoiding impact to existing staff or costs. AgilData’s Big Data experts solve problems, set strategy and develop solutions for BI, data pipeline orchestration, ETL, APIs and custom applications.