Understanding 'Big Data'
Preserving the Meaning is Industry's Greatest Challenge
State Street Corp. scientist David Saul says most financial institutions have plenty of information, but are not doing enough to exploit the business benefits of so-called "big data."
Making sense of data and using it for fraud prevention and risk-based analysis is a problem for banks and credit unions. Few have figured out how to efficiently and cost-effectively use all the data they have at their disposal.
"The challenge that we've always had, and it's going to continue, is business people who own this data have the real understanding of its meaning," says Saul, State Street's chief scientist, in an interview with Information Security Media Group's Tracy Kitten [transcript below].
"When we program a solution - a new product, a risk analysis report - the programmers and database people have to understand requirements from the business people, translate that into programming languages, into database terminology, and sometimes we lose some of that meaning, and it also takes time," he says.
If data is brought closer to the people who actually understand it, it becomes much more useful.
And Saul says the evolution of semantic database technology is helping institutions integrate and overlay the data they collect. "That's going to give us some of those tools that we need to integrate the data more quickly, more accurately, and also by retaining the meaning of that data, it's going to make it more useful to the people in the business who need this integration," he says.
During this interview, Saul discusses:
- Preserving the meaning of data;
- The regulatory impact;
- The role timing plays in data analysis.
At State Street, Saul proposes and assesses new advanced technologies, and evaluates technologies already in use at State Street. He previously served as State Street's chief information security officer, overseeing corporate information security, controls and technology. Prior to that position, Saul managed State Street's Office of Architecture, responsible for the overall enterprise technology, data and security architecture. Saul joined State Street in 1992 after 15 years with IBM's Cambridge Scientific Center, where he managed innovations in operating systems virtualization, multiprocessing, networking and personal computers.
TRACY KITTEN: As a chief scientist, what exactly do you do?
DAVID SAUL: The role of the chief scientist was created at State Street a little over a year ago and really builds on our experience with delivering technology to our customers and continuing to innovate in the space of products and services. When we created the role of chief scientist - and I'm the first person to hold that - it's really to focus on innovation that delivers benefits to our customers. Being the chief scientist is very much in line with that.
KITTEN: In recent months, the term "big data" has been one that's been heavily used and encompasses a wide range of topics. When it comes to the financial space, can you help us understand exactly what big data comprises?
SAUL: The term "big data" is getting a lot of play not just in financial services, and I think it's very important to define what we mean by it, specifically as it relates to State Street and our business. Data that we receive from our customers that we process here, that we receive from external sources, is really the most important asset. And the definition of "big data" in our terms is any of that data that we're either holding as custodian for our customers, we're processing or we're performing information delivery. It's quite a broad definition.
Silos: Gaps in Fraud Detection
KITTEN: You note that banking institutions have lots of data, but that data is typically siloed or is coming in from numerous sources. That lack of integration leaves gaps in fraud detection. Can you help us understand how those gaps can affect security?
SAUL: While the security of individual databases or data sources can be quite strong, it's where the interfaces occur and when that data's integrated with other information that exposures can occur. And of course, integration is something that not only we do all the time, but we do with our customers. When we're delivering information to our customers, almost the first thing that they do with it is integrate it with other data that they already have or that they're receiving from other sources. What we need to really do is have an end-to-end protection scheme for that data not only when it's within our walls but also when it's passed along to our customers.
KITTEN: Timely analysis of time-sensitive data is critical to a financial services company. What lessons can be learned from other regulated industries, such as healthcare?
SAUL: That's a very good point. While financial service data is our most critical asset, some of the technologies that we apply against it for analyzing large amounts of data we could look at other industries, like the pharmaceutical area where they have lots of data from test results that they need to analyze and integrate with other data. While financial services is very data-centric, there are very important carryovers from other technologies that we can take advantage of. We're not isolated in looking at data as being valuable, protecting it and controlling it.
The Regulatory Impact
KITTEN: You've also noted that better data management and the integration of data, especially in the financial services space, would benefit regulators as they work to establish guidelines and set mandates for consumer privacy and fraud prevention. How exactly would this benefit regulators?
SAUL: I believe this would have a very strong benefit to regulators similarly to what we do in terms of integrating data, and let's take as an example risk data that we extract from multiple different systems to create an aggregated view of risk against an entity or a geographic area. We then pass that data along to our customers. And as I said before, they integrated it with some of their data. Now think about the situation with the regulator who's receiving risk information from multiple different companies in the financial services space. What do they have to do? They have to integrate that data to answer the questions at an industry level, what's the aggregate risk that's being created by the entire industry?
What we do internally, what we do with our customers, is very much analogous to regulators. I put myself in the situation of a regulator. They have a very difficult job if they're going to get this data from different companies; and it's not standardized, it's not in a form in which they can easily look across an entire industry. I believe that regulators are going to benefit very strongly from these technologies as well.
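[Editor's illustration] The aggregation Saul describes, pulling risk data from several systems into one view per entity or per geographic area, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the system names, field names and exposure figures are invented for illustration and do not describe State Street's systems. It also assumes every feed already uses the same field names, which is exactly the standardization problem Saul says regulators face.

```python
from collections import defaultdict

# Hypothetical feeds from three separate systems. All names and
# figures are invented for illustration only.
SYSTEM_FEEDS = {
    "custody": [
        {"entity": "Fund A", "region": "EU", "exposure": 120.0},
        {"entity": "Fund B", "region": "US", "exposure": 80.0},
    ],
    "lending": [
        {"entity": "Fund A", "region": "US", "exposure": 30.0},
        {"entity": "Fund C", "region": "APAC", "exposure": 50.0},
    ],
    "fx": [
        {"entity": "Fund B", "region": "US", "exposure": 20.0},
        {"entity": "Fund C", "region": "APAC", "exposure": 10.0},
    ],
}


def aggregate_exposure(feeds, key):
    """Sum exposure across every feed, grouped by the given field."""
    totals = defaultdict(float)
    for records in feeds.values():
        for record in records:
            totals[record[key]] += record["exposure"]
    return dict(totals)


by_entity = aggregate_exposure(SYSTEM_FEEDS, "entity")
by_region = aggregate_exposure(SYSTEM_FEEDS, "region")
```

The same function answers both questions (risk per entity, risk per region) only because the three feeds agree on what "entity", "region" and "exposure" mean. When each source uses its own terms, that shared meaning has to be rebuilt before any sum is trustworthy.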
Programming: Database Challenges
KITTEN: What about programming and database challenges in the financial services space in particular? What challenges do financial institutions face when it comes to getting an enterprise-level view of big data?
SAUL: The challenge that we've always had and it's going to continue is business people who own this data have the real understanding of its meaning. And when we program a solution, whatever it happens to be - creating a new product, a service, a risk analysis report - the programmers and database people have to understand requirements from the business people, translate that into programming languages, into database terminology, and sometimes we lose some of that meaning and it also takes time. The more we can move that big data closer to the people who own it and understand it and give them tools that they can use directly to, for example, create an ad hoc risk report without having to require programming, that's always been one of our goals in information technology. And a lot of these technologies are going to bring us closer to doing that, where the business people have direct tools that they can use.
KITTEN: How can banks and credit unions get around the challenges they face when it comes to integrating their data?
SAUL: In my role as chief scientist and tracking current and future technologies, the evolution of semantic database technology to me is one of those great steps forward that's going to give us some of those tools that we need to integrate the data more quickly, more accurately, and also by retaining the meaning of that data - that's the whole point of a semantic database - it's going to make it more useful to the people in the business who need this integration.
Services and Solutions
KITTEN: What types of services or solutions should financial institutions consider? Is outsourcing to a third party, for instance, the best option?
SAUL: Let me answer the second part of that question first. I think that outsourcing is an independent decision. Whether you do it in-house or you outsource it, really the critical things are: Are you providing the right kind of services to quickly integrate that data, to protect the data, to make it directly accessible to the people who need it most? Coming back to my previous answer, the semantic technology, by keeping with the representation of the data, its meaning and its relationship to other data, that's really where the power is. Whether you run that in-house or you might have a service that provides that for you, say, integrating reference information from multiple sources, I think is less important than you have the right kind of services and the right protection on that data.
Data Management and Storage
KITTEN: You've also noted that over the last 5-10 years a number of companies have emerged that focus on data management and data storage, and this emergence has actually benefited all the players rather than creating more competition.
SAUL: I'm a very strong believer in standards and the value of standards as opposed to creating independent, proprietary and conflicting solutions. In the data space - and in financial services in particular - having standards for the representation of data, because of the amount of data exchange and data integration that we do across all of our companies, as well as with our customers, ends up being a benefit to all of us. And I'm very encouraged with the efforts that have been going on in financial services. To cite one, the Enterprise Data Management Council, of which all of the major players are members, for the past several years have been working on developing standards for the description of the financial information that we exchange all the time. And recently they have partnered up with the Object Management Group, probably the world's leading standards organization, to create a set of standards that we can all use for exchanging financial information.
And then harking back to an earlier question that you had, the regulators are taking very strong interest in this effort because if it's an industry standard and it comes out from all of us and then they adopt that as the standard that they use for writing their regulation, I believe we're all going to benefit from that. And the companies that we've been talking to really get the point about standardization, and that's what we're looking for, that's what our customers are looking for and that's what the regulators are looking for. I believe in the end we're all going to benefit.
KITTEN: Before we close, what final recommendations would you like to offer banks and credit unions that are listening to this podcast that are keen to do more on the data management front?
SAUL: These technologies are still evolving. Some of them are in the early stages, but they're built on some very long-term, well-established standards. For example, semantic database technology builds on the same standards that we use on the Internet to link between sites. So it's been around for a long time, it's well proven and we know that it scales. But the advice I would give everyone is to get started. It's a technology that you can start with a relatively small project. You can take two databases to get started. It has an advantage over some other technologies in that you don't have to go back to a clean piece of paper and re-architect everything you've done. My advice is to get started as soon as possible and do a small project and build on that. Also, if you're not already involved with the standards activity, you should do that and provide your input there.
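[Editor's illustration] Saul's suggestion to start small with two databases can be sketched as follows. Semantic databases commonly store facts as subject-predicate-object statements whose identifiers are web-style URIs, so facts about the same thing from different sources line up without redesigning either source. The Python below is a minimal sketch of that idea using only the standard library, not a real semantic database. The example.org base address, the record identifier "ID-001", and the table and field names are all hypothetical.

```python
# Hypothetical base address for identifiers (example.org is reserved
# for documentation use).
BASE = "http://example.org/"

# Two small, independent "databases". Contents are invented.
CLIENT_DB = [{"id": "ID-001", "name": "Acme Fund", "country": "US"}]
RISK_DB = [{"id": "ID-001", "rating": "BBB"}]


def rows_to_triples(rows, id_field="id", base=BASE):
    """Turn table rows into (subject, predicate, object) triples."""
    triples = set()
    for row in rows:
        subject = base + "entity/" + row[id_field]
        for field, value in row.items():
            if field != id_field:
                triples.add((subject, base + "prop/" + field, value))
    return triples


def describe(triples, subject):
    """Collect everything known about one subject, keyed by property name."""
    return {
        predicate.rsplit("/", 1)[-1]: obj
        for subj, predicate, obj in triples
        if subj == subject
    }


# Integration is a set union: neither source schema is changed.
merged = rows_to_triples(CLIENT_DB) | rows_to_triples(RISK_DB)
profile = describe(merged, BASE + "entity/ID-001")
```

Because each fact carries its own identifiers, adding a third source later means one more union rather than a new schema, which is one way to read Saul's point that you do not have to go back to a clean piece of paper. The sketch assumes both sources already share the same identifier for the same entity; in practice, matching identifiers across sources is a large part of the work.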