Can a Supply Chain be managed using all possible data? Part 2
While working on numerous Big Data projects, we realized that the majority of them are a combination of several commonly used Big Data application patterns:
- Event recognition: finding patterns in structured and unstructured datasets and classifying all the existing data
- Advanced Analytics: identifying correlations and trends in massive datasets and documenting the data analysis approaches
- Extract Transform Load (ETL): implementing ETL with parallel computing technologies to reduce processing time and implementation costs
- Data Enrichment with Public Data: adding non-transactional data from social networks, media, emails and other unstructured sources
- Data Reservoir: aggregating all enterprise data in a Big Data silo and providing cleansed data to multiple data warehouses and BI tools
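The ETL pattern in the list above can be illustrated with a minimal sketch. It assumes hypothetical comma-separated tracking lines and uses Python's `ThreadPoolExecutor` to transform partitions concurrently before loading them into an in-memory SQLite table. A production Big Data ETL job would distribute the same extract, transform and load steps across a cluster (for example with Hadoop or Spark) rather than across threads in one process.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw input: "parcel_id,status,timestamp" lines of varying quality
RAW_LINES = [
    "P1,delivered,2015-03-01T10:00",
    "p2, IN_TRANSIT ,2015-03-01T11:30",
    "bad line",
    "P3,delivered,2015-03-02T09:15",
]


def extract(lines, n_partitions):
    """Split the raw input into n_partitions interleaved partitions."""
    return [lines[i::n_partitions] for i in range(n_partitions)]


def transform(partition):
    """Parse and normalize one partition; malformed rows are dropped."""
    rows = []
    for line in partition:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3:
            continue  # skip rows without exactly three fields
        parcel_id, status, ts = fields
        rows.append((parcel_id.upper(), status.lower(), ts))
    return rows


def load(conn, rows):
    """Append transformed rows to the target table."""
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(lines, n_partitions=2):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (parcel_id TEXT, status TEXT, ts TEXT)")
    # Transform partitions concurrently; load sequentially in the calling thread,
    # because the SQLite connection belongs to this thread.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        for rows in pool.map(transform, extract(lines, n_partitions)):
            load(conn, rows)
    return conn


conn = run_etl(RAW_LINES)
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 3
```

Keeping the transform step free of shared state is what makes it safe to parallelize; only the load step touches the target store.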
Because of its complexity, the Data Reservoir pattern often relies on a combination of the other patterns, and it is highly relevant to logistics companies. Let’s look at a case study of how implementing a Data Reservoir helped a nationwide mail delivery company.
After an initial evaluation, the Luxoft team found that as the company’s business evolved and new services were offered to customers, the company needed a comprehensive view of all its available data, including shipment tracking events, customer visits, social media insights and financial transactions.
Since the daily volume of incoming data exceeded what traditional approaches and technologies could handle, the decision was made to build a Big Data Cloud that would provide analytical services to all BI and data analysis tools.
The technical implementation was based on a Hadoop-enabled Data Reservoir with data cleansing procedures, integration with data warehouses (DWHs) and on-demand provisioning of analytical services.
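The Data Reservoir flow described above (collect everything, cleanse it, then serve several warehouses) can be sketched in a few lines. This is an illustration under assumptions, not Luxoft’s implementation: the `Event` record, the source labels and the `SUBSCRIPTIONS` routing table are hypothetical, and plain Python lists stand in for Hadoop storage and the DWH loaders.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    source: str   # hypothetical labels: "tracking", "finance", "social"
    payload: dict


def cleanse(raw_events):
    """Drop duplicate events (same event_id) and events with an empty payload."""
    seen = set()
    clean = []
    for ev in raw_events:
        if ev.event_id in seen or not ev.payload:
            continue
        seen.add(ev.event_id)
        clean.append(ev)
    return clean


# Hypothetical routing table: which reservoir sources each warehouse receives
SUBSCRIPTIONS = {
    "operations_dwh": {"tracking"},
    "finance_dwh": {"finance"},
    "marketing_dwh": {"social", "tracking"},
}


def publish(clean_events, subscriptions=SUBSCRIPTIONS):
    """Fan cleansed events out to every warehouse subscribed to their source."""
    feeds = {target: [] for target in subscriptions}
    for ev in clean_events:
        for target, sources in subscriptions.items():
            if ev.source in sources:
                feeds[target].append(ev)
    return feeds


raw = [
    Event("e1", "tracking", {"status": "delivered"}),
    Event("e1", "tracking", {"status": "delivered"}),  # duplicate
    Event("e2", "finance", {"amount": 12.5}),
    Event("e3", "social", {}),                          # empty payload
]
feeds = publish(cleanse(raw))
print({target: len(events) for target, events in feeds.items()})
# {'operations_dwh': 1, 'finance_dwh': 1, 'marketing_dwh': 1}
```

Cleansing once in the reservoir and then routing the same cleansed records to each consumer avoids every warehouse reimplementing its own deduplication rules.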
Big Data projects always need to show business owners a return on investment, so Luxoft had to prove that implementing the Data Cloud would let all business processes work with the data without size limits, opening up the new opportunities listed in the following table. Each opportunity is classified by whether it can be addressed with a “traditional” or a “Big Data” approach:
At the early stages of a typical Big Data project, multiple R&D activities are carried out using different approaches to provide the customer with reports, dashboards and analytical capabilities, each with its own advantages and limitations. Once customers realize the analytical capabilities they now have, new requirements often emerge. At this point, the customer usually wants more than batch analytics that build reports from their data: they need “real-time” dashboards, alerting, notifications and ad-hoc analysis.
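Alerting of the kind customers ask for at this stage often boils down to evaluating a rule over a moving time window as events arrive. Here is a minimal sketch, assuming a hypothetical rule such as “more than `threshold` delivery exceptions at one depot within `window_s` seconds” and timestamps that arrive in non-decreasing order; `WindowAlert` is an illustrative name, not part of any product mentioned here.

```python
from collections import deque


class WindowAlert:
    """Fire when more than `threshold` events fall within the last `window_s` seconds."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self.times = deque()

    def observe(self, ts):
        """Record an event at time `ts` (seconds, non-decreasing); return True if the alert fires."""
        self.times.append(ts)
        # Evict events outside the window (ts - window_s, ts]
        while self.times and self.times[0] <= ts - self.window_s:
            self.times.popleft()
        return len(self.times) > self.threshold


alert = WindowAlert(threshold=2, window_s=60)
print([alert.observe(t) for t in [0, 10, 20, 100]])  # [False, False, True, False]
```

Each event is appended and evicted at most once, so updates are O(1) amortized; a streaming engine would typically keep one such window per key (for example per depot).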
The best solution in such cases is not to collect requirements for a predefined set of reports, but to build infrastructure for self-service analytics so that customers can analyze the data regardless of its size. The Luxoft team implemented a self-service analytical platform that combined Datameer (which lets customers ‘slice and dice’ the data) and Apache Spark (which provides aggregated views to the Oracle BI platform in volumes that Oracle can handle effectively). Together with customized and pre-built MapReduce reports, this provided a highly usable analytical ecosystem for inventory and demand planning.
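The role Spark plays here, shrinking raw events into aggregated views small enough for the BI layer, is essentially a group-by. The sketch below shows that idea in plain Python with hypothetical field names (`ts`, `depot`, `status`); it is not the project’s Spark job. In PySpark the equivalent step would be a `groupBy(...).count()` on a DataFrame.

```python
from collections import Counter


def aggregate(events):
    """Collapse raw events into one (day, depot, status) -> count row per group."""
    counts = Counter((e["ts"][:10], e["depot"], e["status"]) for e in events)
    return [
        {"day": day, "depot": depot, "status": status, "count": n}
        for (day, depot, status), n in sorted(counts.items())
    ]


events = [
    {"ts": "2015-03-01T10:00", "depot": "D1", "status": "delivered"},
    {"ts": "2015-03-01T12:00", "depot": "D1", "status": "delivered"},
    {"ts": "2015-03-02T09:00", "depot": "D2", "status": "returned"},
]
for row in aggregate(events):
    print(row)
# {'day': '2015-03-01', 'depot': 'D1', 'status': 'delivered', 'count': 2}
# {'day': '2015-03-02', 'depot': 'D2', 'status': 'returned', 'count': 1}
```

The BI tool then receives one row per group instead of one row per raw event, which is what keeps the volume within what it can handle.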
Working with these analytical tools requires specific skills. An important part of the project that frequently gets too little attention is provisioning the appropriate infrastructure at the right stages of the project. Without servers allocated for performance testing, validation of the Big Data product may be incomplete and lead to problems later. Cloud deployments for testing are very helpful here, even if the production deployment is not planned for a public cloud. In that case, the project team should apply anonymization techniques (for example, masking or hashing personal fields) to make sure that any data processed in the cloud is anonymized.
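One common way to prepare data before it goes to a cloud test environment is keyed pseudonymization: personal fields are replaced by an HMAC digest, so the same customer always maps to the same token and joins across datasets still work, while the key stays on-premises. This is a sketch under assumptions (the field names and key are hypothetical). Strictly speaking it is pseudonymization rather than full anonymization, since anyone holding the key could re-identify values by hashing candidate inputs.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it stays on-premises and is never shipped to the cloud.
SECRET_KEY = b"replace-with-a-key-kept-on-premises"

# Hypothetical set of fields treated as personal data
PII_FIELDS = {"customer_name", "email", "phone"}


def pseudonymize(record, key=SECRET_KEY, pii_fields=PII_FIELDS):
    """Replace personal fields with a keyed SHA-256 token; other fields pass through."""
    out = {}
    for field, value in record.items():
        if field in pii_fields and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]  # truncated, but consistent for equal inputs
        else:
            out[field] = value
    return out


a = pseudonymize({"email": "jane@example.com", "parcel_id": "P1"})
b = pseudonymize({"email": "jane@example.com", "parcel_id": "P7"})
print(a["email"] == b["email"], a["parcel_id"])  # True P1
```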
Most often, the result of data analysis is a barely digestible set of disconnected documents with no logical flow. That is not what Big Data should deliver, and turning such output into decisions still takes a lot of effort. Traditional BI tools are built for BI experts, not decision makers!
We differentiate Big Data Analysis and Big Data Decision Making as follows:
- Analysts use analytical tools
- Decision makers need visually rich decision support systems
This is why we introduced Horizon, Luxoft’s own next-generation data visualization framework for rapidly building decision support systems. The results are visually appealing, touch-friendly info applications that let users quickly dive into the problem.
Making the data available in real time for a rich UI requires a special approach, but the problem can be solved in several ways, as shown: