Announcing Drillbit: Who’s on your Mailing List?

The genesis of this idea was a couple weeks ago when my cofounder said: “Would it be possible to see what percent of our email list was female or male based on their names alone?” Thus Drillbit was born.

In the last couple weeks I have been poring over data sets and trying different formulas to find the best way to break down a list of seemingly random name data into digestible information. The resulting app allows anyone to upload their mailing lists and see who’s in them, and in perhaps the coolest feature, they can segment their list as well.

The Project

Drillbit uses publicly available datasets to create a likely demographic profile of mailing lists based on first and last names. Upload your mailing, customer or user list with first and last names, and based on that information we will create an age, gender and demographic profile of your list.

The Datasets

Listed here are the foundational datasets of this project, including those for analysis tools that haven’t yet been released.


The essential principle behind Drillbit is that an individual’s first and last names betray a lot of information about his or her background, origins, language, gender, and even income and ideology. Names can be both varied in their originality and popularity as well as conservative in their staying power. A surname can be passed down for generations, whereas first names have a tendency to be cyclical.

As an example, take the name “Max.” It is a common name, or common enough it would seem, that one could find out very little information from the name alone. But as it turns out, “Max” may only seem common to us given its surge in popularity in the late 80’s and early 90’s–the birth years of the rapidly matriculating Generation Y. In 1974, only 400 Max’s were born nationwide!

Of course, baby name popularity is not a new idea. But the variance is astounding, and not just in terms of popularity. In 2012, the two most popular baby names for boys and girls were “Jacob” and “Sophia.” Unlike “Max,” both of these popular names seem to have spent the last 100 years on the up-and-coming list.

With this amount of unique variance in names–some names jump and others sink, some names are like fads and others never really take off–it isn’t surprising that, in the aggregate, it is possible to take a list of people and determine how old they are likely to be.

So that’s what I did. Using the above datasets on name popularity, I was able to come up with some pretty convincing initial results, benchmarking against existing lists I knew well.

The first step was to condense the data I had into a table that compared year of birth and gender with the percent likelihood that any random “Michael” born in the last century was actually born in that year. For example, if 10,000 Michaels were born between 1900 and 2000, and 1000 Michaels were born in 1950, then 1950-M-MICHAEL has a 10% likelihood; i.e., given a random Michael, there is a 10% chance he was born in 1950.
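As a rough illustration of that table, here is a minimal Python sketch. It is not Drillbit's actual code: the function name `birth_year_likelihood`, the `(name, gender, year)` key layout, and the sample counts are all assumptions made up for the example, chosen so the 1950 row matches the 10% figure above.

```python
from collections import defaultdict


def birth_year_likelihood(counts):
    """Turn raw birth counts into per-name birth-year shares.

    counts: {(name, gender, year): births}  (hypothetical layout)
    returns: {(name, gender): {year: share of all births with that name}}
    """
    # Total births for each (name, gender) pair across all years.
    totals = defaultdict(int)
    for (name, gender, _year), births in counts.items():
        totals[(name, gender)] += births

    # Share of each year within its (name, gender) total.
    table = defaultdict(dict)
    for (name, gender, year), births in counts.items():
        table[(name, gender)][year] = births / totals[(name, gender)]
    return dict(table)


# Made-up counts: 10,000 Michaels in total, 1,000 of them born in 1950.
counts = {
    ("MICHAEL", "M", 1950): 1000,
    ("MICHAEL", "M", 1975): 6000,
    ("MICHAEL", "M", 1990): 3000,
}
table = birth_year_likelihood(counts)
print(table[("MICHAEL", "M")][1950])  # 0.1
```

Keying on gender as well as name keeps the male and female distributions for the same name separate from the start.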

With the charts above, you can see how this would play out. If you were to use Drillbit to upload a list of 5000 Jacobs, you would see the age match pattern roughly cohere to the above chart. The more Jacobs there are, the higher confidence we would have in the result.

There are some obvious complications with this model. The first is that although 10% of all Michaels might have been born in 1950, they would be over 60 now, and their chance of being around is much smaller than that of a Michael born in 2000. That’s where actuarial data comes in. Using the above actuarial table divided by gender, I was able to normalize the distribution based on likelihood of survival in each age cohort. No matter how many Max’s were born in the 1910’s, there aren’t a lot left today.
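The survival adjustment can be sketched roughly as follows. This is one plausible reading of the normalization, not the confirmed method: each birth-year share is weighted by the chance that someone born that year is still alive, then the shares are rescaled to sum to 1. The function name `survival_adjust`, the `survival` lookup keyed by age, and the probabilities in the example are invented for illustration; a real run would use the actuarial table for the matching gender.

```python
def survival_adjust(year_shares, survival, current_year):
    """Reweight birth-year shares by the probability of still being alive.

    year_shares: {birth_year: share}, e.g. one name's row of the likelihood table
    survival:    {age: probability of surviving to that age}  (hypothetical layout)
    """
    weighted = {}
    for year, share in year_shares.items():
        age = current_year - year
        # Ages missing from the table are treated as not surviving.
        weighted[year] = share * survival.get(age, 0.0)

    total = sum(weighted.values())
    if total == 0:
        return {year: 0.0 for year in weighted}
    # Rescale so the adjusted shares sum to 1 again.
    return {year: w / total for year, w in weighted.items()}


# Made-up numbers: half of this name's births were in 1920, half in 1990.
shares = {1920: 0.5, 1990: 0.5}
survival = {93: 0.1, 23: 0.9}  # illustrative only, not real actuarial values
adjusted = survival_adjust(shares, survival, current_year=2013)
print({year: round(p, 3) for year, p in adjusted.items()})
# {1920: 0.1, 1990: 0.9}
```

Even though births were split evenly, the adjusted distribution says a living person with this name is far more likely to be from the 1990 cohort.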

The second problem is that names are not exclusive to one gender; in fact, most names in the database aren’t 100% male or 100% female, Michael included. It became clear that age data had to be computed on the basis of gender, and not on totals. Names that are popular with one gender are not necessarily popular with the other at the same time.

To compensate for this, age data was tabulated separately, all the way down to the actuarial normalization. Female names were rated and graded against each other, male names separately, and only at the end were they normalized against each other.

Compared to Age, Gender and Race were much easier. Gender analysis was a simpler form of the age analysis–likely names were divided by gender and then normalized by age. Race/ethnicity data was also quite simple based on surnames–the data was already organized by the Census, albeit 13 years ago, so getting it into a searchable database wasn’t tricky.
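A minimal sketch of the gender step, under stated assumptions: `name_stats` is a hypothetical lookup from a first name to survival-adjusted counts of living men and women with that name, and the list-level estimate is simply the average of the per-name probabilities. The counts are invented, and the averaging rule is an illustrative choice, not necessarily what Drillbit does.

```python
def female_probability(male_count, female_count):
    """Share of living people with a given name who are female, or None if unknown."""
    total = male_count + female_count
    if total == 0:
        return None
    return female_count / total


def estimate_female_share(first_names, name_stats):
    """Average per-name female probability over a list of first names.

    name_stats: {NAME: (male_count, female_count)}  (hypothetical layout)
    Names missing from name_stats are skipped rather than guessed.
    """
    probs = []
    for name in first_names:
        stats = name_stats.get(name.strip().upper())
        if stats is None:
            continue
        p = female_probability(*stats)
        if p is not None:
            probs.append(p)
    return sum(probs) / len(probs) if probs else None


# Made-up counts for illustration only.
name_stats = {
    "MICHAEL": (990, 10),
    "JORDAN": (700, 300),
    "SOPHIA": (0, 1000),
}
share = estimate_female_share(["Michael", "Jordan", "Sophia", "Zyx"], name_stats)
print(round(share, 3))  # 0.437
```

Averaging probabilities instead of assigning each name a hard label keeps mostly-male or mostly-female names from being counted as certainties.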


There are some obvious limitations to my method. The first is in the nature of large numbers, or small numbers as the case may be. If you were to put a list of 2 names into Drillbit, it would spit out a similar looking demographic profile running the gamut of all ages and perhaps some different races as well. There are few names that are reliably “Black” names or “Over 65” names (although there are a few names with a 100% incidence within one to five years–I challenge you to find them). As with any aggregate data project, the larger the list, the more reliable Drillbit will be.

The other limitation is in any sort of list that comes with existing biases. Say, a list of NBA players (heavily 25-35 and black) or a list of sitting US Congresspeople (heavily male, white, and 35-55). These inherent biases will be reflected in the analysis, but probably not to the extent that they could be. This is the House of Representatives, according to Drillbit:

Obviously 18-24 year old congresspeople would be impossible. And yet, even with a small list of 435 names, the real trends in age poke through.

In short, you shouldn’t use Drillbit to analyze a list whose composition is already known to you to skew heavily in favor of one or two demographics. However, it’s worth noting that Congress is 18.3% female, and Drillbit predicted 20.5% based on names alone. Not shabby.

The final inherent bias that’s worth mentioning is in skewing toward younger ages. Since younger people are overwhelmingly more likely to be alive, post-normalized numbers skew younger. In addition, in development, I had a category be “Under25” but it became apparent that although my database could detect age variability all the way to Age 0, babies aren’t going to be on mailing lists, and they were throwing off all the results. So to compensate for the younger skew, I made a judgment call to make a cutoff at 18, and not track any younger cohorts, even though some websites may have 13-18 year olds as users.
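To make the cutoff concrete, here is a hedged Python sketch of rolling per-person birth-year distributions up into an age profile for a whole list while dropping everyone under 18. The bucket boundaries, the function names, and the choice to renormalize each person's distribution after the cutoff are all assumptions for illustration; only the cutoff at 18 comes from the description above.

```python
def bucket_label(lo, hi):
    """Readable label such as '18-24' or '65+'."""
    return f"{lo}+" if hi is None else f"{lo}-{hi}"


def age_profile(year_dists, current_year, min_age=18,
                buckets=((18, 24), (25, 34), (35, 54), (55, 64), (65, None))):
    """Aggregate per-person birth-year distributions into age-bucket shares.

    year_dists: list of {birth_year: probability}, one dict per person on the list
    buckets:    hypothetical age ranges; hi=None means open-ended
    """
    totals = {bucket_label(lo, hi): 0.0 for lo, hi in buckets}
    for dist in year_dists:
        # Drop cohorts younger than the cutoff, then renormalize what is left.
        kept = {y: p for y, p in dist.items() if current_year - y >= min_age}
        mass = sum(kept.values())
        if mass == 0:
            continue
        for year, p in kept.items():
            age = current_year - year
            for lo, hi in buckets:
                if age >= lo and (hi is None or age <= hi):
                    totals[bucket_label(lo, hi)] += p / mass
                    break

    grand = sum(totals.values())
    return {label: (v / grand if grand else 0.0) for label, v in totals.items()}


# Two made-up people: one whose name splits between 2005 and 1990, one from 1940.
profile = age_profile([{2005: 0.5, 1990: 0.5}, {1940: 1.0}], current_year=2013)
print({label: round(v, 3) for label, v in profile.items()})
# {'18-24': 0.5, '25-34': 0.0, '35-54': 0.0, '55-64': 0.0, '65+': 0.5}
```

In this example the 2005 cohort (age 8) is discarded, so the first person's entire weight lands in the 18-24 bucket instead of being split with a group that would never appear on a mailing list.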

Now that you know more about how I did it, upload a list and try it out!

We are Obsessed with Race, Not Racism

Our obsession with race has surpassed and perhaps even magnified our problems with racism in America.

Let me explain what I mean. Since I’m white, I can’t speak to the personal experience of racism, and I wouldn’t try to do so. As an American, I am part of a society that has made identity politics a most incessant and obnoxious trope, and I have observed that the more opposed to this drivel people get, the more the boundaries of politically acceptable discourse solidify to exclude them (or should I say, us). There are things that just can’t be said anymore, things that we need people to say because without dissent, race politics becomes an orthodoxy, and orthodoxies are dangerous. That said, I have travelled to a very many places and interacted with a great deal of people of all backgrounds, ideas and identities. Almost every person I have met has been full of opinions about racism, despite the fact that few of them are people whom I would consider to be racist themselves. And I’m beginning to wonder if our obsession with race has reached a boiling point and we might need to rethink how we approach issues of race in this country before it boils over and causes some real problems.

For reference, I always look to South Africa, where I studied abroad, and to the particularly virulent, open racism that persists there 20 years after apartheid. In South Africa, everybody talks about race, all the time. It’s talked about with an openness and frankness that is surprising to an untrained American ear. I think we can learn a lot from South Africans in how they openly confront their racist past and spend every waking minute talking about it–as a result, there are no secrets, no closet racists, no sinister feeling of power behind a veil of magnanimity. In South Africa, racists white, black and coloured proudly declare their racism. It truly lays bare the shocking reality of racism; that it exists in droves, that it is self-perpetuating, that it results in bad justice, erosion of social cohesion, etc–these are things we know. But because South Africans talk about it so much, because they confront it and it is politically acceptable for public figures to say some of the most shockingly racist things, I found it oddly refreshing and somewhat hopeful. That maybe there is a post-racial future in South Africa after all.

But it is hard not to contrast the South African free discourse over race with our much more regimented, yet simultaneously boiling, discourse in America. We have confined ourselves to a very narrow and troubling politically correct discourse where the only thing it is permissible to talk about is how bad racism is and how racist white people are, and it has become completely impermissible to talk about the identity politics and tokenism which have resulted from this myopic obsession. As a result, the conversation about race and racism in America is troublingly one-sided. When I am engaged in a discussion about race, it is almost always about racism, the ism being the domain of racists and a racist society (depending on your worldview, this defines a relatively narrow or a very broad band of Americans). But in all this talk about racism, we are engaging in a more important discourse, a discourse on and around capital-R Race. The difference is that while “racism” can be easily used to segment the undesirables in our midst, race is considered not only an important preoccupation but a necessary one in order to combat racism, and thus race, not racism, is what enters the national consciousness and infects our discourse. In short, we no longer are obsessed with racists, we are obsessed with race.

What form does this obsession with race take in our society? We are racial compartmentalizers. We count minorities in positions of power and obsess over racial balance. We talk about racial “firsts” (first African-American so-and-so). We still can’t decide on a good definition of Hispanic. We try to “fix” racism with countless race-specific philanthropies and entitlements. When we encounter people or public figures that challenge our assumptions about race, we get cognitive dissonance and the discourse gets wrapped up in it. Black men like Herman Cain and Michael Steele were commonly derided as Uncle Toms during their pinnacles of influence. (This isn’t just a racial problem–we even blame women like Marissa Mayer and Sheryl Sandberg for not being feminist enough, which is eerily similar to the time when Sarah Palin was being attacked by the feminist movement who apparently wanted a woman in power but only a certain kind of woman.) This systemic compartmentalization is rampant. We castigate white people with success for ignoring and/or not admitting their privilege. We castigate “minorities” (I hate that word) with success for not doing more to help other minorities. In the latter case, it is very discomfiting to see the expectations of people when it comes to diversity unhinged on those who are providing solutions.

If there’s ever a better exemplar of the problem of race in America, it’s President Barack Obama. Obama is our first black president, but he’s actually half black. It’s interesting how his mixed racial heritage rarely gets as much attention as his blackness. It’s as if there’s an unspoken rule that being biracial is too confusing for a racial narrative. He must be black, or maybe conservatives wouldn’t hate him as much, and he wouldn’t be different than every president that came before. But he’s also a possessor of a litany of American privileges that we usually associate with whiteness. He was raised in a white household by his white grandparents. He went to white colleges. How do we as Americans square that circle? Do we dare create a definition that challenges our inborn assumptions of race, or do we call him black and leave it at that? And if we have decided that a half-black man is either all black or all white, what sort of example is that supposed to set to mixed race children growing up in America, that they have to choose one or the other in order to have a place? Of course, if we make too much of a deal of his white heritage, we have also failed black kids in telling them that you can be successful if you’re black, but only if you’re actually white.

Our race discourse is about constantly deconstructing and reconstructing our racial narratives in order to make the most sense of ourselves. We all think about these things, even if we don’t talk about them. We are conditioned from an early age to internalize notions of race and culture, to be aware of racism, to know our racist history, to understand it. We embrace “diversity” and engage in an uncomfortable amount of social engineering in order to achieve some utopian post-racial future. At the same time, we are conditioned to only speak about race in euphemisms, to avoid offending (which often means avoid discussing) and to tread lightly in the public sphere on the subject. We also are very happy to shut down discussion of race, especially by white people–an uncomfortable ad hominem lobbed at white people who dare to criticize identity politics in America.

A bigger challenge to egalitarianism is that we can’t be satisfied as Americans all seeking for our piece of the American Dream. We can only be satisfied if every person fits neatly into a box on a census form and into a race coalition with its own community spokespeople. We need to conflate race and class, because the alternative is too unsettling. This is a problem because using “white” as a synonym for privilege ignores a very important factor of what constitutes racial “normality” in a society. It is fair to say that white people have a privilege in a white society. It is more accurate to say that X people have a privilege in an X society. Whatever X is in America, it isn’t strictly white. There’s a combination of looks, language, culture and history involved in X. There are plenty of white people with southern drawls who couldn’t land a job on Wall Street even if they had straight A’s. Our culture doesn’t work like that. There are also plenty of black kids growing up in Fairfield County, CT who often act, talk, and subsequently succeed like any white kid growing up in the same circumstance. Incidentally, they are often accused of “acting white.” This is part of the problem: that we use such terminology speaks to a very sad conflation between race and class in contrast to America’s multiracial, diverse reality.

X isn’t necessarily the same thing as white, and indeed, if we want there to be any progress on the racial front, we have to insist that X shouldn’t be white and it is possible, and desirable, to deconstruct the “white privilege” paradigm. This isn’t unthinkable. The definition of “white” itself has changed in history. One of the more interesting books I read last year, Nell Irvin Painter’s The History of White People, tells a fascinating story of how “white” has come to express different ethnic makeups in America. In the last 200 years alone, white has excluded, and then included in turn, people of German, Scandinavian and Irish origin. Imagine that in the late 19th century there was an entire contingent of scientists who didn’t consider Nordic people to be white enough!

I have to mention Michel Foucault at this point because the parallels of racial discourse in today’s America to sexual discourse in yesterday’s England are too obvious not to bring up. Foucault observes that the people whom we regard to be the most uptight about sexuality were the most obsessed with it. These were people who spent every waking minute restricting new sexualities and perversities and in doing so opened up sexuality to a whole new universe of intrigue in science, the law, and medicine, in what he calls the Perverse Implantation. Rather than sexuality becoming more subdued, it became more accessible, with the prudish Victorian discourse on sex merely a catalyst for an unprecedented interest in sex, and indeed, it is often misunderstood to have been prudish in the first place.

We have a similar situation in America with race: we spend every waking minute thinking about it and in doing so create more obsession. We can’t get enough of race. Instead of pushing past racism, we are recycling racism into a new paradigm in which all facets of the racial puzzle are reconstructed, pushed into avenues of politics, art, science, the humanities, and thus continually re-examined, obsessed over. Call it the Racial Implantation. Instead of defeating racism, we are creating a new class of racists who, like the racists of old, believe their solutions to the race problem are progressive. They also tend to be inside an echo chamber where challenges to their outlook are deflected, often, ironically enough, with charges of racism.

Given these issues of race in our discourse, racism itself isn’t surprising. I would be surprised to find myself in any modern society today without racism. It either is an extremely natural human instinct in complex societies, or it is going to be a very bad habit to break. I think everyone will disagree on the best “solution” to racism, the discussion of which I think may be part of the problem, but c’est la vie. You can’t argue with the facts: America has racists, and whites sit at the top of the racial hierarchy. This makes a lot of people uncomfortable, including whites. White people, like myself, find it difficult to square their belief in an egalitarian society with the racial realities of our still predominantly white society. And that’s something that we can and should address, and there are plenty of ideas on how to do so. But the first step to solving a problem is recognizing a problem. And the problem, I believe, needs to include our obsession with race. We need to realize that our race discourse has added to, and perhaps even compounded, the racism problem. I would like to see racism become just one part of a larger discourse where we look at ourselves first and foremost as perpetrators of a perverse race logic. Only then can we really begin to address the dreams of a post-racial future.

Thanks to Danilo Campos and Frances Low for reading drafts of this.

