What Is Data Integration?

by Kyle Swan

Data integration is the practice of bringing disparate data streams together to answer questions that no single dataset could answer on its own. For example, if you have transactional data on your customers, you might want to understand the motivations behind their purchase behavior. To do that, you would need to combine data from attitudinal surveys with the transactional data to see how motivations informed purchase behavior.

And while this seems simple enough, the real challenge lies in finding where—and how—one data type influences the other. For instance, you may not be able to discern how, exactly, your customers’ motivations are impacting their decision to purchase a given item. That’s where the science of building connections between different data types comes into play.

How Data Integration Works

From transactional data on customers, to data on business processes, to geodemographics (like census data), to quantitative survey data, a wide range of data types can be connected and integrated. The key is having a way to link them: “hooks,” or “bridge variables,” that create a common frame of reference between datasets.

Regardless of what data-integration tools you’re using (e.g., open-source tools like R and Python, a domain-specific language like SQL, or enterprise data-integration platforms), there are three primary approaches to establishing a “bridge” between datasets:

One to One – This method connects a single record in one dataset to a single record in another. It is often seen as the “ideal” method, but it assumes you have a way to link each specific record directly to its counterpart. One-to-one connections often rely on customer IDs, names, or addresses to join another type of data (such as transaction data). Whatever the mechanism, always handle personally identifiable information with care.

One to Many – In the one-to-many approach, you take information tied to a specific customer (a zip code, for instance) and attach aggregated information (such as census data for that zip code) to learn more about the individual. While this can add depth to your data, it assumes, in this case, that where a person lives is relevant to the question you are trying to answer.

Many to Many – In the many-to-many approach, you create common aggregations of data across datasets, sometimes referred to as “cohorts.” Cohorts can be grouped by demographic features, location, or other properties shared across datasets. But you must make sure you can create enough relevant cohorts to support the analysis you want to perform.
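To make the three kinds of bridge concrete, here is a minimal sketch in plain Python (standard library only), assuming small in-memory lists of records. The datasets, field names (customer_id, zip, age), values, and the cohort definition (a 10-year age band plus zip code) are all hypothetical. In practice the same joins are usually done in SQL, R, a Python data-frame library, or an integration platform; the point here is only the shape of each link.

```python
from collections import defaultdict

# --- Hypothetical toy data (all names and values are made up) ---
transactions = [  # one row per customer
    {"customer_id": "C1", "zip": "11111", "age": 34, "spend": 100.0},
    {"customer_id": "C2", "zip": "11111", "age": 37, "spend": 140.0},
    {"customer_id": "C3", "zip": "22222", "age": 52, "spend": 60.0},
]
survey = [{"customer_id": "C1", "motivation": "price"},
          {"customer_id": "C3", "motivation": "quality"}]
census_by_zip = {"11111": {"median_income": 52000}, "22222": {"median_income": 88000}}
panel = [  # a separate survey with no customer IDs, only shared traits
    {"age": 31, "zip": "11111", "satisfaction": 4.0},
    {"age": 39, "zip": "11111", "satisfaction": 5.0},
    {"age": 55, "zip": "22222", "satisfaction": 3.0},
]


def unique_index(rows, key):
    """Index rows by key, refusing duplicates so a link stays one-to-one."""
    index = {}
    for row in rows:
        if row[key] in index:
            raise ValueError(f"duplicate key: {row[key]!r}")
        index[row[key]] = row
    return index


def one_to_one(left, right, key):
    """Link each record in `left` to exactly one record in `right` (inner join)."""
    right_idx = unique_index(right, key)
    unique_index(left, key)  # also validate that the left side has unique keys
    return [{**row, **right_idx[row[key]]} for row in left if row[key] in right_idx]


def one_to_many(rows, area_lookup, key):
    """Attach one aggregate (e.g. zip-level data) to every record sharing that key."""
    return [{**row, **area_lookup.get(row[key], {})} for row in rows]


def cohort(row):
    """Cohort = (10-year age band, zip): a hypothetical choice of shared traits."""
    low = row["age"] // 10 * 10
    return (f"{low}-{low + 9}", row["zip"])


def cohort_means(rows, field):
    """Return {cohort: (mean of field, number of rows)}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[cohort(row)] += row[field]
        counts[cohort(row)] += 1
    return {c: (sums[c] / counts[c], counts[c]) for c in sums}


def many_to_many(rows_a, field_a, rows_b, field_b, min_size=1):
    """Aggregate both datasets to cohorts and join them on the cohort key."""
    a, b = cohort_means(rows_a, field_a), cohort_means(rows_b, field_b)
    return {
        c: {field_a: a[c][0], field_b: b[c][0]}
        for c in sorted(a.keys() & b.keys())
        if a[c][1] >= min_size and b[c][1] >= min_size  # drop cohorts too small to trust
    }


print(one_to_one(transactions, survey, "customer_id"))
print(one_to_many(transactions, census_by_zip, "zip"))
print(many_to_many(transactions, "spend", panel, "satisfaction", min_size=2))
```

The min_size parameter reflects the caution above about having enough relevant cohorts: a cohort with only one or two members on either side can be joined, but it says little.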

Benefits to Businesses

Ultimately, the real “point” of any data-integration effort is to provide strategic value for businesses by making insights more relevant, actionable, and informative. Here are a few key applications in which data integration can enhance the business-critical insights that organizations depend on for sound decision-making:

Identifying Market Segments – Marketing organizations often perform segmentation analyses of their customers to identify groups with common attitudes, beliefs, and motivations. A common question that follows a segmentation project is “How do I find these individuals in the market?” By integrating geodemographic or behavioral data with segmentation data, you can build an online targeting algorithm or lead-prospect scoring system that identifies people “in the wild” who are likely to belong to a given segment.

Expediting Physical Data Integration – Physical data integration can be an expensive, labor-intensive, time-consuming process. It entails locating the right data sources, finding common connectors, and manually connecting these records. However, a well-designed data-integration system can help you bring these data together automatically, greatly reducing the time to insight and action.

Identifying Discrepancies and Errors – Integrating different data streams can reveal flaws in one or more of your datasets. By cross-referencing multiple data sources, you can see where, or whether, certain data points vary or conflict. This can help surface errors, prevent rework, and more reliably establish the “truth” at the core of your insight. After all, nothing determines the success or failure of a research project more than data quality; “garbage in, garbage out” is as relevant as ever.
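Two of these applications can be sketched in a few lines of plain Python. This is illustrative only and assumes toy, in-memory data; the segment names, features, and field names are hypothetical. For finding segment members, it uses a nearest-centroid rule (average the bridge variables of surveyed customers in each segment, then assign a prospect to the closest average). That is just one simple stand-in for a targeting or lead-scoring model; a production system would more likely use a validated classifier on standardized features. For discrepancies, it compares shared fields across two sources and lists records that appear in only one of them.

```python
import math
from collections import defaultdict


# --- Segment assignment: nearest-centroid rule (one simple technique) ---
def train_centroids(labeled):
    """labeled: list of (feature tuple, segment) pairs from surveyed customers."""
    sums, counts = {}, defaultdict(int)
    for features, segment in labeled:
        acc = sums.setdefault(segment, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[segment] += 1
    return {s: [v / counts[s] for v in acc] for s, acc in sums.items()}


def assign_segment(features, centroids):
    """Return the segment whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda s: math.dist(features, centroids[s]))


# --- Discrepancy check: cross-reference two sources on a shared key ---
def find_discrepancies(source_a, source_b, key, fields, tol=0.0):
    a = {r[key]: r for r in source_a}
    b = {r[key]: r for r in source_b}
    mismatches = []
    for k in sorted(a.keys() & b.keys()):
        for f in fields:
            va, vb = a[k].get(f), b[k].get(f)
            if isinstance(va, (int, float)) and isinstance(vb, (int, float)):
                differs = abs(va - vb) > tol  # numeric: allow a small tolerance
            else:
                differs = va != vb
            if differs:
                mismatches.append((k, f, va, vb))
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "mismatches": mismatches,
    }


# Hypothetical features, pre-scaled to 0..1: (share of purchases online, area income index)
respondents = [((0.9, 0.8), "Digital Affluent"), ((0.8, 0.9), "Digital Affluent"),
               ((0.2, 0.3), "Traditional Value"), ((0.1, 0.2), "Traditional Value")]
centroids = train_centroids(respondents)
print(assign_segment((0.7, 0.75), centroids))  # a prospect who was never surveyed

crm = [{"id": 1, "state": "OH", "spend": 500.0}, {"id": 2, "state": "KY", "spend": 120.0},
       {"id": 3, "state": "IN", "spend": 75.0}]
billing = [{"id": 1, "state": "OH", "spend": 500.004}, {"id": 2, "state": "OH", "spend": 220.0},
           {"id": 4, "state": "TX", "spend": 10.0}]
print(find_discrepancies(crm, billing, "id", ["state", "spend"], tol=0.01))
```

The tolerance on numeric fields keeps rounding differences between systems from being reported as errors, while conflicting categorical values (such as a customer's state) are always flagged.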

Challenges

It’s important to be aware of potential challenges when moving forward with any data-integration project. Here are a few common hurdles you may encounter:

Bringing Down Barriers – Different organizational divisions often act as vaults, or silos, that keep valuable information from being widely shared. Sometimes data-security or regulatory concerns genuinely require such protection, but often the silos are more of a cultural phenomenon. It falls to leaders within your organization, and within client organizations, to find ways around these barriers (or ways to remove them) so that data can flow where it is needed to produce useful insights.

Prioritizing Personnel & Tools – Once the culture is open to sharing and other barriers have been overcome, the next step is putting the right staff, and the right data-management tools, in place. Because the people you need may already work inside your organization, a skills assessment of existing resources can let you promote from within rather than hire externally. Similarly, assessing the skills of partners who already work with your data can reduce the time to action.

Data-integration tools run the gamut from open-source tools to large, enterprise data-integration platforms. That said, don’t assume that complex systems are the only way forward. Be intentional about your goal in bringing data together and how you intend to use it to add value for your clients. It may be best to start small: develop a proof of concept and show how it will provide value before ramping up. Again, identifying partners who can help with these up-front efforts is a smart way to “right-size” the beginning of your data-integration journey.

Ensuring Data Quality – Finally, take the time to assess the quality of the data you are trying to integrate. End users of business-intelligence data sometimes have little confidence in what is being produced. They may not voice those doubts, so the organization keeps investing time and money collecting “bad data.” Quality issues may also become apparent only once different streams of technical and business data are brought together. Build internal stakeholder interviews (to surface those doubts) and data-quality checks into your data-integration journey. This helps ensure that the foundation of your efforts, namely good data, is firmly in place before you build out a fully realized data-integration system.
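The data-quality checks mentioned above can start very simply. Here is a minimal sketch, assuming records held as Python dictionaries with a hypothetical customer_id key: it lists duplicate keys, measures the share of missing or blank values in required fields, and reports what fraction of keys can be matched against a second dataset. A low match rate is often the first sign that a bridge variable is unreliable. What counts as an acceptable rate is a judgment call left to you.

```python
from collections import Counter


def quality_report(rows, key, required_fields, reference_keys=None):
    """Basic pre-integration checks on a list of dict records."""
    if not rows:
        raise ValueError("no rows to check")
    n = len(rows)
    key_counts = Counter(r.get(key) for r in rows)
    report = {
        "rows": n,
        # Keys that appear more than once break one-to-one links.
        "duplicate_keys": sorted(k for k, c in key_counts.items() if k is not None and c > 1),
        # Share of rows where each required field is missing or blank.
        "missing_rate": {
            f: sum(1 for r in rows if r.get(f) in (None, "")) / n for f in required_fields
        },
    }
    if reference_keys is not None:
        keys = {k for k in key_counts if k is not None}
        # Fraction of this dataset's distinct keys that also exist in the other dataset.
        report["match_rate"] = len(keys & set(reference_keys)) / len(keys) if keys else 0.0
    return report


customers = [  # hypothetical records
    {"customer_id": "C1", "zip": "11111", "email": "a@example.com"},
    {"customer_id": "C2", "zip": "", "email": "b@example.com"},
    {"customer_id": "C2", "zip": "22222", "email": None},
    {"customer_id": "C3", "zip": "33333", "email": "c@example.com"},
]
print(quality_report(customers, "customer_id", ["zip", "email"],
                     reference_keys={"C1", "C3", "C9"}))
```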

How Do I Find the Right Tool or Partner?

As we mentioned earlier, finding the right data-integration solution can be a challenge. So, to make things simpler, we will frame this challenge in the context of three options: Buy, Build, and Partner.

The “Buy” option involves hiring teams of data scientists and data engineers to build out processes, or directly buying data-integration tools that meet your needs. This requires leadership within your organization to take a focused, intentional approach to identifying its research goals. Without that focus, it is easy to make expensive mistakes, such as expecting untrained staff to use complex software in lieu of a service or vendor, or recruiting teams of experts who are never fully leveraged.

The “Build” option leverages existing resources within your company, using open-source tools like R and Python, or SQL, for extract, transform, and load (ETL) operations. Although open-source solutions avoid up-front costs, getting teams up to speed quickly can be difficult. If data-integration efforts take too long, the organization may question their value, and competitors may pull ahead.
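As a rough illustration of SQL-based extract, transform, and load, here is a minimal sketch that uses SQLite through Python’s built-in sqlite3 module as a stand-in for whatever database you actually use. The table names, columns, and data are hypothetical. It loads two raw tables, normalizes email addresses (trimming spaces and lower-casing) so they can serve as a bridge variable, and writes an aggregated customer_spend table.

```python
import sqlite3


def run_etl(customers, transactions):
    """Extract raw rows, transform (normalize emails, aggregate), load a summary table."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a real database
    try:
        cur = conn.cursor()
        # Extract/load: copy raw data as-is into staging tables.
        cur.execute("CREATE TABLE raw_customers (customer_id TEXT, email TEXT)")
        cur.execute("CREATE TABLE raw_transactions (email TEXT, amount REAL)")
        cur.executemany("INSERT INTO raw_customers VALUES (?, ?)", customers)
        cur.executemany("INSERT INTO raw_transactions VALUES (?, ?)", transactions)
        # Transform: trim and lower-case emails so they match across sources,
        # then aggregate spend per customer into a new table.
        cur.execute(
            """
            CREATE TABLE customer_spend AS
            SELECT c.customer_id,
                   SUM(t.amount) AS total_spend,
                   COUNT(*) AS n_transactions
            FROM raw_customers AS c
            JOIN raw_transactions AS t
              ON LOWER(TRIM(c.email)) = LOWER(TRIM(t.email))
            GROUP BY c.customer_id
            """
        )
        return cur.execute(
            "SELECT customer_id, total_spend, n_transactions "
            "FROM customer_spend ORDER BY customer_id"
        ).fetchall()
    finally:
        conn.close()


customers = [("C1", "Ann@Example.com"), ("C2", "bob@example.com ")]
transactions = [("ann@example.com", 20.0), (" ANN@example.com", 30.0),
                ("bob@example.com", 15.5), ("carol@example.com", 99.0)]
print(run_etl(customers, transactions))
```

Because the join is an inner join, the transaction for an email with no matching customer is silently dropped; counting such unmatched rows is a cheap data-quality check worth adding in a real pipeline.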

The “Partner” option may make the most sense early in your data-integration journey, or when you need to connect disparate data for a one-off application. Building your own data warehouse can be a lengthy and expensive process, so partnering can help you get off the ground quickly while controlling costs. Over time, you may move into “Buy” or “Build” phases, building on a strong foundation established with your partner.

No matter which path you choose, following these guidelines can help you begin your data-integration journey with the right conditions, tools, and approaches, and give you the best chance of reaching your destination.

As a senior consultant at Burke, Kyle works on a variety of market research and data integration opportunities. Kyle leads the Data Science team, a group of talented individuals bringing together advanced analytics and actionable insights for our clients.
