Daniel J. Weitzner
Principal Research Scientist
Decentralized Information Group
MIT Computer Science and Artificial Intelligence Laboratory
This document on the Web [http://dig.csail.mit.edu/2006/11/ic-identity-weitzner.html]
A later version of this column appears in IEEE Internet Computing, October/November 2006
This is the first in a series of columns exploring the unexpected interactions, hard to anticipate at design time, between Internet technologies and public-policy goals. Beginning with fundamental questions about privacy, intellectual property, and free expression, and stretching to equity of access worldwide and across cultures, the myriad ways in which the Internet reaches into our individual, community, and national lives suggest that I can’t predict where the list will end (or how long I’ll have the privilege of writing in this space). However, the aptest place to begin is with an essential aspect of our human existence: our personal identities, the ability to reliably assert our names or other identifiers across the barriers of communications networks. The basic identity problem the Internet poses is establishing one party’s identity to another party’s satisfaction through communication across the network. The problem isn’t so much that — as The New Yorker cartoon said in 1993 — “on the Internet no one knows you’re a dog,” but more that, whether you’re a dog or a person, no one knows your name.
Terrorists have boarded airplanes with forged IDs. Our email boxes are full of fraudulent messages seeking to “phish” our account numbers and passwords by directing us to spoofed Web sites. Identity systems’ perceived insecurity, online and off, seems to threaten our physical security as well as the confidentiality of valuable and private information. So, governments worldwide are deploying more secure, hopefully reliable identity documents: driver’s licenses, passports, and national identity cards. At the same time, technology vendors, online services, and healthcare and financial institutions are scrambling to develop, standardize, and deploy the holy grail of network security: the identity-management system.
Although no doubt exists about current identity mechanisms’ weaknesses, our efforts to design and successfully deploy network-based identity-management systems have been so frustratingly unsuccessful that a new approach seems necessary. Elements of the new approach come into view when we compare Internet identity protocol designs with systems used in financial services. Traditional computer security systems begin with a nearly metaphysical design goal of associating a single identifier with a single identity (whether a person’s name or pseudonym). Once the system verifies the identifier, all privileges associated with it become available to whoever possesses that identity. Rather than taking this unitary approach, however, credit-card authorization systems take a composite approach, in which the binding between an identifier (a credit-card number) and the associated privileges (access to credit) is established only after the system has completed statistically based antifraud checks. In other words, you aren’t actually recognized as the card holder simply for presenting the card or even after verification that the card token itself is genuine. You’re recognized as an authorized party only on the basis of traditional security checks combined with statistical verification that you’re likely to be who you say you are.
The modern era of cryptographic network security protocols began with the beguiling elegance of public-key encryption and signature algorithms, followed quickly by international standardization efforts aimed at worldwide deployment of a public-key infrastructure (PKI). The promise of being able to share secrets at a distance, across networks, according to widely accepted international standards that evolved in the 1980s and ’90s, suggested that we might have a globally accessible method for clearing up the identity uncertainty inherent in Internet-based transactions. Alongside this centralized PKI approach, skeptics developed decentralized alternatives such as Pretty Good Privacy (PGP) crypto systems, which allowed secure signing and encryption for email and other documents based on “webs of trust” among individuals, rather than a central authority.
Although the first round of Internet-based PKI standards wasn’t widely adopted, a new crop arose. First, Microsoft offered a system called Passport; later — partly in response to political worries about relying on a centralized system from a single company — the Liberty Alliance, led by Sun and America Online, offered an architecture for a more distributed system in which numerous “identity providers” could vouch for individual users’ online credentials to various commercial and government services. Both of these systems attempted to offer not only security but also single-sign-on capabilities to relieve us all of having to remember endlessly growing lists of user names and passwords.
Given the obvious need for more secure network communications, how is it that no such system has reached critical mass such that the general public can rely on it? Network security gurus will correctly point out that enterprise IT systems have deployed many different PKIs and that financial institutions, government agencies, and others with sensitive data do use a wide variety of identity-management technologies in local settings. However, none of these deployments constitutes the sort of global, general-purpose identity infrastructure that we seem to need in the Internet age. Why?
In sharp contrast to the Internet’s current identity-management difficulties, the worldwide consumer credit-card system is a very large-scale system that identifies hundreds of millions of individuals with minimal technical complication and notable ease of use. Why has the credit-card industry succeeded where the Internet industry has failed?
In some ways, this is an unfair comparison: the credit-card industry is highly centralized compared to the Internet market and has a very different business model (it makes a lot of money on the transactions it secures). Most importantly, the credit-card system is a special-purpose identity system, in that its only goal is to establish which individuals are authorized to enter into certain financial transactions. By contrast, the more ambitious identity-management systems are general purpose, seeking to establish individuals’ identities for an unbounded class of possible transactions — everything from securing access to sensitive health records to renting movies at the video store. For these reasons, meeting the credit-card system’s security requirements is easier because those requirements are driven by a better-controlled and better-scoped set of threat models.
The most dramatic difference between credit-card authorization and Internet-based identity management is that the credit-card system depends only partially on the security token (that is, the credit card) that we all possess. (Different countries use various hardware security techniques for credit cards. Many European countries rely on smart cards with personal identification numbers [PINs], whereas the US hasn’t found this additional security necessary.) Credit-card authorization also relies heavily on statistical antifraud methods that allow real-time assessment of whether to trust a specific transaction given what the system knows about the card holder’s usage patterns. Neural-network systems that pore over extensive transaction histories establish profiles of each individual account. When the system spots a deviation from that profile, it triggers an alarm and either blocks the statistically suspect transaction before it’s complete or flags it for subsequent investigation. On a recent Sunday morning, for example, I received a call from one of my credit-card companies asking whether I had intended to make a large cash advance at an Atlantic City casino the previous evening. Evidently, my card issuer is on to the fact that my life doesn’t generally involve that sort of excitement, so it denied the transaction, called to let me know that my card had been compromised, and told me a new one was already in the mail. Industry sources now report that although this analysis used to occur retrospectively, the risk-scoring systems are now so efficient that they can attach an initial risk assessment to every credit-card transaction in near real time. The system can automatically decide to honor the transaction within several hundred milliseconds — the time it takes to complete the data communications about the basic transaction.
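To make the profile-based scoring idea concrete, here is a deliberately simplified sketch. It is not any issuer’s actual algorithm — real systems use neural networks over many transaction features — but a single feature (purchase amount) and a standard-deviation test convey the mechanism: a transaction is honored only when it conforms to the account’s observed pattern. The function names, the sample profile, and the threshold are all illustrative assumptions.

```python
from statistics import mean, stdev

def risk_score(history, amount):
    """Return how many standard deviations `amount` sits above the
    account's historical mean spend (a stand-in for a real risk model)."""
    mu, sigma = mean(history), stdev(history)
    return (amount - mu) / sigma if sigma else 0.0

def authorize(history, amount, threshold=3.0):
    """Honor the transaction only if it conforms to the observed profile."""
    return risk_score(history, amount) < threshold

# A hypothetical card holder whose purchases cluster around $40-$60...
profile = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0]

print(authorize(profile, 49.0))    # routine purchase: honored (True)
print(authorize(profile, 2500.0))  # casino-sized cash advance: blocked (False)
```

The key property, as in the Atlantic City anecdote, is that the decision comes from the transaction stream itself, in the few hundred milliseconds the transaction already takes, rather than from any stronger check on the card.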
The contrast with traditional security approaches present in most Internet-based identity-management systems is stark. Identity-management systems follow the classic computer-security model, which conditions identity authentication on verified presentation of “something you know,” “something you have,” or “something you are.” Statistical antifraud systems, by comparison, have very low expectations of the thing you have (the card) or the thing you know (perhaps a PIN) and rarely use what you are (biometrics). Rather, the credit-card system determines authorization by dynamically measuring how statistically likely you are to be who you say, as determined by your conformance with the usage pattern the card system has observed in the past. When the system works well, those who are actually authorized to use a card behave closely enough to their profiles that it honors their requests. Criminals attempting fraud will find it relatively easy to either learn the card number and con the PIN from the card holder (something you know) or mint a fake card (something you have); however, they’ll find it especially challenging to use the card in a manner that conforms to the legitimate holder’s profile. Given that it’s relatively easy to spoof what you know and what you have, statistical analytics steps in to make up for gaps in the traditional security mechanisms.
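The composite approach described above can be sketched as a decision rule in which the traditional factors are necessary but deliberately not sufficient. This is a hypothetical illustration, not a real authorization protocol; the function and its inputs are assumptions made for the example.

```python
def composite_decision(card_genuine: bool, pin_correct: bool,
                       profile_z_score: float) -> bool:
    """Composite authorization: weak traditional checks (something you
    have, something you know) gate the decision, but the statistical
    conformance score is what actually decides it."""
    # A thief can spoof these factors, so passing them proves little...
    if not (card_genuine and pin_correct):
        return False
    # ...while behaving like the legitimate holder is hard to fake.
    return profile_z_score < 3.0

print(composite_decision(True, True, 0.4))    # conforms to profile: authorized
print(composite_decision(True, True, 12.0))   # stolen card, odd usage: denied
```

The unitary model treats a verified token as the end of the question; the composite model treats it as merely the beginning.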
Simply put, the key to being authenticated in these statistical systems isn’t something that can be stolen or faked. Although some effort certainly goes into establishing the card’s physical integrity as a security token, the identification determination depends far more on statistical analysis than on sophisticated cryptographic algorithms and network protocols. Recent trends suggest that the banking industry will continue to rely on statistics rather than cryptography. In response to concerns about phishing, the leading US bank regulator called for banks to implement so-called “two-factor” authentication to supplement the perceived weakness of username–password authentication systems now common in online banking. Many in the industry reacted with alarm because they thought the only way to add a second authentication “factor” was to distribute some (expensive) hardware token to all banking customers or deploy complex (cumbersome) security software in addition to standard Web browsers. Instead, several banks have now indicated that statistical assessment of transactions’ trustworthiness could well serve as the second authenticating factor.
A vital caveat is that, as good as these credit systems are, they don’t stop fraud altogether. Public reports from 2002 indicate a seemingly modest 0.17 percent fraud loss rate, but on a total volume of more than US$1 trillion, that amounts to well over US$1.7 billion, so the loss isn’t insignificant. Nevertheless, this rate appears to be low enough to keep the system functioning economically.
Can Internet identity-management system designers learn anything from the success of financial networks’ chosen security strategy? I think so. Rather than struggling to deploy general-purpose identity-management systems that rely on cryptography alone to reach the desired security level, perhaps they could succeed by implementing more lightweight, flexible systems, supplemented with statistical analysis where needed.
The statistically based composite view is likely to challenge the unitary view of identity inherent to traditional security protocols. The composite view includes important trade-offs and challenges. First, our only models of such systems have highly centralized designs, requiring all transactions to flow through central points for analysis. Such centralization might work in some applications, but it doesn’t necessarily scale to all types of Internet and Web services. Experimenting with Web-oriented identity protocols such as OpenID (http://openid.net/) and Sxip, which use URIs as basic identifiers, will help us understand how such systems might scale. Combining these inherently decentralized systems with statistical antifraud measures where appropriate could form the basis for a reliable identity system that could be deployed at Web scale with a range of security levels to meet applications’ varying requirements.
Second, the predictive power of existing systems clearly comes from collecting and analyzing very large volumes of personal information. In many cases, services such as Google or Yahoo have already collected this data, but developing user behavior profiles does carry a substantial privacy risk. At the same time, more reliable identity systems offer considerable privacy benefits: personal data will be more secure, and those who misuse private information can be held to a higher degree of accountability.
Keeping the risks in mind, we should embrace systems that help reduce the uncertainty associated with online transactions. Neither the composite nor unitary approach will yield a perfectly secure or reliable system, but Internet and Web system designers can learn a lot from the former. Not only has it proven itself in large-scale systems, but it’s also more true to the way we think of identity in human terms. Interpersonal interactions involve assessing identity assertions’ reliability according to numerous subtle factors, rather than applying a mechanical checklist. Adopting a more composite approach to identity will let us build systems that provide great reliability and flexibility in the inherently complex process of assessing whether we can trust the identities of those we deal with online.
I’ve benefited from conversations on this topic over the years with many security experts, including Ross Anderson, Bob Blakely, Whit Diffie, Carl Ellison, Barb Fox, Dan Geer, and Dan Schutzer, though they might all disagree with some or all of what’s written here. The views expressed here are entirely my own and haven’t been endorsed or approved by the W3C or any of its members.
Weitzner is Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory and co-founder of the MIT Decentralized Information Group. He is also Technology and Society Policy Director of the World Wide Web Consortium. The views expressed here are purely his own and do not reflect the views of the World Wide Web Consortium or any of its members.