6 - See the data

Posted on Jan 7, 2018

Silicon Valley diversity

Mark Zuckerberg, Evan Spiegel, Tim Cook. What have they all got in common? Yeah - apart from being rich enough to use iPhones as toilet paper.

White dudes the lot of them. But then again I just picked these three names. What do I know? Let’s see what the data says.

Getting the data

I want to show you how easy it is to get hold of data like this to play with yourself. You can download a CSV file from Kaggle:

[Screenshot: Screen Shot 2018-01-06 at 4.49.52 PM]

And then load it into Python using a Pandas data frame:

[Screenshot: Screen Shot 2018-01-06 at 4.49.05 PM]
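In case the screenshots don't come through, here is a rough sketch of what the loading step might look like. The column names (`company`, `race`, `gender`, `job_category`, `count`) and the sample rows are my assumptions for illustration, not the real Kaggle file; with the real download you would pass its path to `pd.read_csv` instead of the inline sample.

```python
import io

import pandas as pd

# Invented sample rows standing in for the downloaded CSV.
# The column names are assumptions; check the header of the real file.
SAMPLE_CSV = """company,race,gender,job_category,count
ExampleCo,White,male,Professionals,40
ExampleCo,White,female,Professionals,25
ExampleCo,Asian,male,Technicians,10
OtherCo,Asian,female,Professionals,30
"""


def load_workforce(source):
    """Read the CSV into a data frame and force the count column to integers."""
    df = pd.read_csv(source)
    # Non-numeric entries (if any) become 0 so that later sums do not fail.
    df["count"] = pd.to_numeric(df["count"], errors="coerce").fillna(0).astype(int)
    return df


# For the real data: load_workforce("path/to/downloaded.csv") (hypothetical path).
df = load_workforce(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.head())
```

Each row is one company / race / gender / job category combination, with a head count attached.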

There are 3960 rows in this dataset, one row for each combination of our four hierarchical variables: Company, Race, Gender and Job Category.

It’s hard to figure out from these raw rows alone what the gender balance in Silicon Valley is.

Representing the data hierarchically

Rather than just having counts for each specific subset, we may want to find out how many men there are in total at 23andMe.
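One way to get that kind of total with Pandas is to group the rows and sum the counts. This is only a sketch: the column names and all the numbers are made up for illustration (`ExampleCo` and `OtherCo` are not in the dataset).

```python
import pandas as pd

# Invented rows in the assumed shape of the dataset: one row per
# company / race / gender / job category combination.
df = pd.DataFrame(
    {
        "company": ["ExampleCo"] * 4 + ["OtherCo"] * 2,
        "race": ["White", "White", "Asian", "Asian", "White", "White"],
        "gender": ["male", "female", "male", "female", "male", "female"],
        "job_category": ["Professionals"] * 6,
        "count": [40, 25, 10, 5, 70, 30],
    }
)

# Collapse race and job category: sum the counts per (company, gender) pair.
by_company_gender = df.groupby(["company", "gender"])["count"].sum()

# Total number of men at one company.
men_at_example = by_company_gender.loc[("ExampleCo", "male")]
print(by_company_gender)
print(men_at_example)
```

The result is indexed by company and then gender, which is already the two-level breakdown we are after.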

We can use a data type called a tree to represent how Silicon Valley workers break down into subcategories.

Let’s create a tree where the 354,964 workers are first divided into companies, and then into gender groups. Here’s the start of the tree:

[Screenshot: Screen Shot 2018-01-06 at 7.50.36 PM]

This begins to show us that 23andMe is unusual in having more female than male workers.
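If you want to build a tree like this yourself, a nested dictionary is probably the simplest way to do it. The sketch below is not the code behind the screenshot; the node layout (`total` plus `children`) and the sample rows are my own assumptions.

```python
def make_node():
    """A tree node: a running head count plus named child nodes."""
    return {"total": 0, "children": {}}


def add_path(root, path, count):
    """Add `count` to the root and to every node along `path`, creating nodes as needed."""
    root["total"] += count
    node = root
    for key in path:
        node = node["children"].setdefault(key, make_node())
        node["total"] += count


# Invented rows: (company, race, gender, job_category, count).
rows = [
    ("ExampleCo", "White", "male", "Professionals", 40),
    ("ExampleCo", "White", "female", "Professionals", 25),
    ("ExampleCo", "Asian", "male", "Technicians", 10),
    ("OtherCo", "Asian", "female", "Professionals", 30),
]

tree = make_node()
for company, _race, gender, _job, count in rows:
    # Level 1 is the company, level 2 is the gender; race and job are summed away.
    add_path(tree, [company, gender], count)

for company, node in tree["children"].items():
    print(company, node["total"])
    for gender, child in node["children"].items():
        print("   ", gender, child["total"])
```

Because every node stores its own total, you can read off the size of any branch without adding up its leaves again. Changing the path (say, to company then job category) gives a different hierarchy over the same rows.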

Representing the data visually

However, we can go one better. Humans aren’t great at comparing proportions from raw numbers. We can visually represent the numbers as areas using a treemap. Here is the treemap of the previous tree:

[Figure: div-companygender.png]

Because 23andMe’s 297 workers give them such a small area on the horizontal axis, they don’t even warrant a name on the diagram. But they are the unusual, roughly 50/50 blue-red split second from the left.
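To see why a small company ends up as a thin sliver, here is a minimal sketch of one simple treemap layout, often called slice-and-dice: companies share the horizontal axis in proportion to their head count, and each company's column is then split vertically by gender. I am assuming this layout because it matches the description above; the original figure may have been drawn with a different tool or algorithm, and the numbers below are invented.

```python
def slice_and_dice(tree, width=100.0, height=100.0):
    """Return (company, gender, x, y, w, h) rectangles whose areas are proportional to counts."""
    total = sum(sum(genders.values()) for genders in tree.values())
    rects = []
    x = 0.0
    for company, genders in tree.items():
        company_total = sum(genders.values())
        if company_total == 0:
            continue  # nothing to draw for an empty company
        # Column width is the company's share of all workers.
        w = width * company_total / total
        y = 0.0
        for gender, n in genders.items():
            # Box height is the gender's share within the company.
            h = height * n / company_total
            rects.append((company, gender, x, y, w, h))
            y += h
        x += w
    return rects


# Invented head counts: one tiny balanced company, one large unbalanced one.
tree = {
    "SmallCo": {"female": 150, "male": 150},
    "BigCo": {"female": 3000, "male": 6700},
}

rects = slice_and_dice(tree)
for company, gender, x, y, w, h in rects:
    print(f"{company:8s} {gender:6s} x={x:6.2f} y={y:6.2f} w={w:6.2f} h={h:6.2f}")
```

In this made-up example SmallCo gets only 3% of the width, which is why a label would not fit, but its two boxes are the same height, so the even split is still visible. The rectangles can be handed to any drawing library to colour and render.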

This diagram says “patriarchy” much better than a list of data could.

Now, off to find a Silicon Valley summer internship! Which company wants to expand their blue box?