Customer data has started to explode in every major line of business. Whether you look at insurance companies or airlines, they are adding terabytes of data every day. The challenge has become maintaining that data rather than taking advantage of it. Typically, many third-party and non-traditional sources feed data into the client’s hub. Adding more data does not necessarily add dollars to your profit, though. In fact, it is the other way round! The more data you accumulate, the greater your responsibility to understand it and get the desired value out of it. Yes, even if it arrives in terabytes per day.
Imagine the customer data is spread across silos and you have no way to match it. At the end of each month or quarter, it becomes a nightmare to query each system and build a dashboard from the results. Needless to say, data quality would be very low and data redundancy high. Data accuracy takes a big hit, which weakens the customer’s decision making.
Big Data Relationship Management
One possible solution is to load the data onto HDFS clusters and match the customer records to form matching groups. Informatica Big Data Relationship Management (BDRM) undoubtedly provides one of the best solutions for matching on Hadoop. Here are a few things that BDRM has to offer:
- Creates clusters of related records on Hadoop, using multiple match criteria to identify groups
- Matches data from third-party and non-traditional sources
- Rapidly adds new data sources to augment existing data
- Provides real-time identity search
BDRM uses the powerful SSA Name 3 match engine, which runs distributed matching and linking in parallel across multiple Hadoop nodes.
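To make "clusters of related records using multiple match criteria" a bit more concrete, here is a minimal single-process sketch in Python. It is not BDRM's or SSA Name 3's algorithm, which is fuzzy and distributed. It only illustrates the general idea of deriving match keys per record and linking any records that share a key. The field names (`name`, `zip`, `email`) and the two match criteria are assumptions made up for this example.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_keys(record: dict) -> list[str]:
    """Hypothetical match criteria: (name + zip) and email. Not BDRM's actual rules."""
    keys = []
    if record.get("name") and record.get("zip"):
        keys.append("name_zip:" + normalize(record["name"]) + "|" + normalize(record["zip"]))
    if record.get("email"):
        keys.append("email:" + record["email"].strip().lower())
    return keys


def cluster(records: list[dict]) -> list[list[int]]:
    """Group record indexes so that records sharing any match key end up together."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen: dict[str, int] = {}
    for idx, record in enumerate(records):
        for key in match_keys(record):
            if key in first_seen:
                union(idx, first_seen[key])  # same key as an earlier record: link them
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


records = [
    {"name": "John Smith", "zip": "10001", "email": "john@example.com"},
    {"name": "JOHN  SMITH", "zip": "10001", "email": ""},
    {"name": "J. Smith", "zip": "94105", "email": "John@Example.com"},
    {"name": "Mary Jones", "zip": "60601", "email": "mary@example.com"},
]

print(cluster(records))  # [[0, 1, 2], [3]]
```

Record 1 joins record 0 through the name and zip key, and record 2 joins through the email key, so all three land in one group even though no single criterion covers them all. On Hadoop the same idea would be spread across nodes, which is the part BDRM handles for you.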
Use Cases
Any customer who needs to match large volumes of customer data can potentially use BDRM. In a cloud setup, it may take a few weeks (or a month) to match millions of records. BDRM, however, can match hundreds of millions of customer records in a few hours. In fact, during one of the POCs, BDRM matched more than 700 million records in around 18 hours. Now, that could be a lifesaver!
Indeed, there are scenarios where a customer asks the SI to onboard millions of records over the weekend, because the customer is not willing to sacrifice profits by bringing the system down on weekdays. Fair enough. And believe me, it is difficult for any matching tool to match records on HDFS clusters that fast. Then there are the MapReduce jobs: it is extremely difficult to find people who can write that code and keep maintaining it. Guess what, BDRM does that for you. Yes, it generates the MapReduce jobs and loads the indexed and linked data into a repository for further analysis and visualization.
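To put the POC figure quoted above in perspective, a quick back-of-the-envelope calculation helps. The snippet below takes "700+ million" as exactly 700 million and "around 18 hours" as exactly 18, so treat the result as a rough lower bound on the average throughput, not a benchmark.

```python
# Figures from the POC mentioned above, rounded down for a conservative estimate.
records_matched = 700_000_000  # "700+ million" records
elapsed_hours = 18             # "around 18 hours"

records_per_hour = records_matched / elapsed_hours
records_per_second = records_per_hour / 3600

print(f"{records_per_hour:,.0f} records/hour")      # 38,888,889 records/hour
print(f"{records_per_second:,.0f} records/second")  # 10,802 records/second
```

At that average rate, a weekend onboarding window of about 48 hours leaves comfortable headroom for a load of this size.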
Consider a use case: you work for a large insurance organization that wants to enrich its existing customer database with data from third-party data providers on a Hadoop environment. The organization wants to compare the existing data with the third-party data to identify potential business prospects. It also wants a 360-degree view of its customers to understand the relationships between them and come up with targeted marketing programs. This makes an ideal use case for BDRM: match the data, then link the matched records to identify potential business prospects.
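The comparison step in this use case can be sketched in a few lines of Python. This is only an illustration with exact-key matching on made-up fields (`name`, `zip`). It does not use BDRM, and a real match engine would tolerate spelling variations that this sketch would miss. The function name `split_third_party` is hypothetical.

```python
def match_key(record: dict) -> tuple[str, str]:
    """Hypothetical match key: normalized name plus zip code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name, record["zip"].strip())


def split_third_party(existing: list[dict], third_party: list[dict]):
    """Separate third-party records into known customers and potential prospects."""
    known = {match_key(r) for r in existing}
    linked, prospects = [], []
    for record in third_party:
        if match_key(record) in known:
            linked.append(record)      # already a customer: candidate for enrichment
        else:
            prospects.append(record)   # no match found: potential business prospect
    return linked, prospects


existing_customers = [
    {"name": "Ann Lee", "zip": "30301"},
    {"name": "Raj Patel", "zip": "75201"},
]
third_party_records = [
    {"name": "ANN LEE", "zip": "30301", "interest": "home insurance"},
    {"name": "Carlos Diaz", "zip": "33101", "interest": "auto insurance"},
]

linked, prospects = split_third_party(existing_customers, third_party_records)
print([r["name"] for r in linked])     # ['ANN LEE']
print([r["name"] for r in prospects])  # ['Carlos Diaz']
```

Linked records feed the 360-degree view of existing customers, while unmatched records become the list of prospects for targeted marketing.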
Key Features of BDRM
Single view of party, 360-degree view, social data appends, and real-time search.
Indisputably, BDRM is the right choice for improving big data analytics, inferring non-obvious relationships, viewing social relationships, and achieving rapid results.
Before concluding, I would like to share with you our Customer Success Stories for BDRM.
The post Matching Customer Data on Hadoop appeared first on The Informatica Blog - Perspectives for the Data Ready Enterprise.