Google’s Super Bowl ad proves that Google knows too much?

Today we were treated to Google’s first Super Bowl ad.

It showed the life story of a man, all through his Google searches.  He traveled to Paris (“study abroad Paris”), met a woman at a restaurant (“cafes near the Louvre”), fell in love (“how to impress French girls”), seduced her (“chocolate shop near Paris France”), moved to France (flight tracking), got married (“churches in Paris”), and eventually had a child (“how to assemble a crib”).

What’s creepy about this?  It’s proof that your Google searches can be enough to reconstruct your life history.  All of it.  And this was just the highlights.  Imagine if our mystery searcher had searched for more personal information — how to deal with grief, the loss of a child, sexual confusion, a disease, a long-lost love, or more?  Google knows all of it. Google knows how you feel, what you think, and what you do.

And even if Google itself doesn’t read your search history, it’s still out there and vulnerable to eavesdropping or hacking.  (Remember Google’s allegations that someone in China hacked its servers and stole personal data?)

I think Google meant the ad to show that it is powerful.

Instead, it gave me the heebie-jeebies.  It’s an everyday reminder that Google knows pretty much everything there is to know about you.  And that this data is stored on servers around the world, where it could be misused by nearly anyone. (TechCrunch says that it’s proof that hell froze over.)

Remind me to clear my search history more often.


Facebook fights subpoena by criminal defendant looking for evidence of bias

News comes today that Facebook is fighting against a subpoena issued in a Missouri murder case.

In short, former police officer Bryan Pour was accused of murder after a shooting at a bar in St. Louis.  He is currently facing trial.  He claims that he is innocent, and that he can prove that the investigating officers were biased because they were friends with the witnesses to the shooting.  His theory, presumably, is that the investigating officers wanted to protect their friends (who were present at the bar at the time of the shooting) and so they shifted the blame to him.

To find out, Bryan Pour’s lawyer subpoenaed Facebook to find out if any of the investigating officers were “Facebook friends” with any of the witnesses who were present at the bar, and if they had any other interactions on the site.  Of course, all of the investigating officers have since made their profiles private or deleted them, so a subpoena is the only way Pour can find out if his accusers are biased.

Facebook has announced that it will fight the subpoena and refuse to turn over the information.

There is a deep conflict brewing:  Facebook claims that it has a duty to protect the privacy of its users; Pour claims that he has a Due Process right to find out if his accusers are biased.

This isn’t the first privacy issue involving Facebook and certainly won’t be the last.  This one just adds the additional layer of criminal justice, and Facebook’s attempts to protect its users from the government; one wonders whether Facebook would take the same stance if advertisers wanted the same information.

If you have questions about this story, please contact ReputationDefender.


California court vindicates Nicole Catsouras and her family against California Highway Patrol

February 1, 2010
Orange County, California

Today, the California Court of Appeal released a sweeping opinion vindicating the Catsouras family in their lawsuit against the California Highway Patrol.

The facts are brief, but tragic.  Nicole “Nikki” Catsouras was an 18-year-old driver in Orange County, California.  She was killed in a gruesome car accident.  Two California Highway Patrol officers took photos of Nicole Catsouras’s body at the scene of the accident, as part of their official investigation.  The officers are then alleged to have emailed these photos to a group of friends as a Halloween prank, solely for the “shock value” of the images.  These friends then spread them to a wider audience.  Soon, the photos were available on hundreds of websites, and anonymous trolls and cowards used the images to taunt the surviving members of the Catsouras family, including by emailing the graphic images to Nicole Catsouras’s father and siblings.


“The CHP should know better. Every one of its officers should know better. The CHP is in a position to ensure that this does not happen again.” – California Court of Appeal in Catsouras v. CHP

The Catsouras family was shocked to find that police had leaked official investigation photos in such an inappropriate manner.  The Catsouras family filed a lawsuit against the California Highway Patrol (CHP) and the two officers on the scene.  The lawsuit alleged that the CHP invaded the family’s privacy by emailing gruesome photos of Nicole to people with no law enforcement interest in the photos, that the family suffered massive emotional distress, and that CHP failed to create or enforce policies protecting the privacy of families.

The Catsouras family’s lawsuit against the CHP and its officers was initially rejected by the trial court.  The trial court ruled that the CHP had no duty to protect photographs of the dead, and that the family could not bring a lawsuit.

The suit was reinstated by the California Court of Appeal.  The court sharply chastised the CHP and its officers for emailing gory photos outside their official duties, and reinstated all but one of the family’s claims.  The court held that the CHP and its officers are liable for invasion of privacy if they unnecessarily publicize gory photos of the deceased without any legitimate law enforcement purpose, especially when there is no legitimate public interest in the photos.  The court also held that it is entirely predictable that photos emailed to a small group might be spread to a wider audience, especially if the photos are gory or otherwise shocking.  The court went out of its way to say that the CHP officers should have known better than to publicize gruesome photos to a group of people outside of law enforcement, and that the officers and CHP are now liable for the entirely predictable consequences.  The duty of the CHP to treat the dead with respect is independent from their duty as law enforcement; instead, it is a duty that all people have.

The case will now be remanded for a trial; it is likely that the case will settle before it reaches trial.

Instant analysis:

This case is an important step toward the law recognizing the harms that can be caused by the Internet.

The case also shows how difficult it is for many victims of Internet attacks to find justice.  The trial court initially rejected any liability for the shocking conduct of the officers, holding that a family could not sue even when officers took graphic photos and emailed them to friends solely for their shock value.  It took the California Court of Appeal stepping in to give justice to the Catsouras family.

The case is also an important reminder that real people can be harmed by online pranks — there is a real victim when photos of Nicole Catsouras’s body are emailed to her father with taunting messages.  The impersonal nature of the Internet often makes it seem like nobody “real” is harmed by online actions — everything done online is done through a web browser.

“[C]oncepts of morals and justice clearly dictate that those upon whom we rely to protect and serve ought not to be permitted to make our deceased loved ones the subject of Internet spectacle and then to claim the defense of lack of duty.” – California Court of Appeal in Catsouras v. CHP

The case is also legal recognition of what we all should already know: the anonymous and impersonal nature of the Internet empowers malicious attacks and content.  The California Court of Appeal ruled that it is entirely predictable that a gory photo emailed to a group of people will be spread all over the Internet and used to abuse an innocent family.  Online, a photo sent to a small group can quickly spread to hundreds of websites and millions of viewers, well beyond the reach of any one person.  As soon as they emailed the photos, the officers lost the ability to stop the chain of events that they set in motion.  They may not have intended the photos to reach a massive audience, but their actions started a predictable chain reaction that led to massive emotional abuse of the Catsouras family.

It will be interesting to see if any individuals who spread the photos can be found and prosecuted; now, it is in the interest of the California Highway Patrol to find as many as possible so that it may recoup its losses against them.

Key takeaways:

  • The California Court of Appeal vindicated the Catsouras family’s legal claims;
  • The California Court of Appeal went out of its way to chastise the CHP for its conduct in this case, and for its legalistic defenses in light of the horrific facts of the case;
  • The California Court of Appeal held that it is an invasion of privacy to publicize gory photos of the deceased when there is no legitimate law enforcement purpose or legitimate public interest;
  • The California Court of Appeal held that the CHP and CHP officers have a duty to not unreasonably publicize photos of the deceased for their shock value;
  • The California Court of Appeal held that the CHP officers were negligent and should have known better; the court held that it was entirely predictable that photos emailed to a group of friends and family for their “gruesome shock value” would be spread to a wider audience and used to torment the family;
  • The California Court of Appeal held that families of the deceased can assert a right to privacy regarding photos of the deceased; the court squarely rejected CHP’s attempt to hide behind a legal doctrine suggesting that only the deceased could sue (which, of course, is impossible);
  • The CHP and officers may be liable for emotional distress damages;
  • There is no First Amendment issue because only the conduct of the CHP is at issue and there is no press defendant;
  • The case was remanded for a trial and to calculate damages;
  • The case will likely settle on terms favorable to the Catsouras family.

If you have questions about this story, please contact ReputationDefender.  Comments will be strictly moderated in light of the subject matter of the story.

Related documents: Full text of the California Court of Appeal opinion (PDF) in Catsouras v. Department of the California Highway Patrol (G039916, G040330)


Privacy 2.0: What’s missing from Google’s new privacy principles?

Google: Skynet?

In honor of International Data Privacy Day, Google just released a list of five “Privacy Principles.” Google said it will implement the following ideals when creating new products and services:

  • Use information to provide our users with valuable products and services.
  • Develop products that reflect strong privacy standards and practices.
  • Make the collection of personal information transparent.
  • Give users meaningful choices to protect their privacy.
  • Be a responsible steward of the information we hold.

These are important principles and they are a great start for a company that collects as much data as Google does.

But these five principles are focused on Google’s own use of data.  It is a “Web 1.0” model of privacy, where all of the concern is focused on how Google itself uses the data it collects.  Call it a commitment to “Privacy 1.0.”

One important concept is missing entirely from Google’s list: social privacy.

We live in a Web 2.0 world. Data flows through Google in a million ways: through search, through Blogspot, through YouTube, and more.  Even if Google promises to not use any of this data itself, thousands of other people can.  A video of you hosted on YouTube and found through a Google search can have a far greater impact on your privacy than Google’s use of contextual advertising to serve you ads about suntan lotion when you search for “Bermuda.”  Think about it: do you care more about contextual advertising, or a video of you that comes up for any Google search for your name?  But Google’s privacy principles do not address this at all: they are entirely focused on Google.

In other words, even if Google promises that it will not misuse data, that does not mean that Google is respecting your privacy.  Google is part of a larger privacy ecosystem.  In fact, Google is perhaps the largest and most powerful part of the Internet’s privacy ecosystem.  Google’s products (search, Blogger, YouTube, and more) connect more people to more information than any other company in history.  It is crucial that Google recognize its role as the central connection in a massive data ecosystem.  If Google creates a system that allows other people to violate your privacy, Google is complicit.

Take just a few examples that Google’s privacy principles do not even consider.  Each of these has significant privacy implications:

  • If the first result for a search for your name was a site with your home address and phone number
  • If the first result for a search for your name was a site that displayed your medical history, HIV/AIDS status, sexual orientation, or other private information
  • If the first result for a search for your name was a “hidden camera” video of you
  • If someone else created a blog about you through Google’s BlogSpot service that listed everything you did every day
  • If someone else posted a video of you on YouTube that contained false and defamatory lies
  • If a health insurer uses Google to search for your name near “cancer”, “diabetes” and “overweight” before denying you coverage
  • If an employer uses Google to search for what you are doing in your off-hours and finds that you are politically active in a way that disagrees with the boss

People can disagree about what Google’s obligation is to address each of those situations.  But Google’s current privacy principles don’t admit that these are important questions, let alone address this social side of privacy.  Call this new form of privacy “Privacy 2.0”: the concern that your information will be misused by “300 million little brothers” rather than Orwell’s Big Brother.  We’ve previously discussed the same principle as applied to Facebook: the concern is not that Facebook itself will violate your privacy, but rather that Facebook will empower other people to violate your privacy.

Google’s “privacy principles” are entirely focused on the old view of privacy, when the biggest fear was that Google itself would violate your privacy.  It’s easy to protect your privacy from Google that way: just don’t use Google.

But in the Web 2.0 world, it is time for Google to accept that its privacy choices have impacts that go well beyond its corporate use of data.  Google can create a system that allows users to protect their privacy from others.   As the largest and most important information provider, Google has an obligation to at least consider these privacy implications.  Its “privacy principles” don’t appear to even admit that its privacy practices affect a lot more than just its internal data use.  It’s time for Google to catch up with Privacy 2.0.


Learning to forget: Why web companies need to fix their data archiving policies


The good old days of paper records.  Image courtesy Ed Uthman via CC license.

It’s time for web companies to learn how to forget.  It’s particularly time for Web 2.0 companies to learn how to forget.

The digital nature of the Internet makes it easy for websites to collect massive amounts of data: every click, every interaction, every search term, every referrer, every error… you get the idea.   This massive data harvest can be dumped into a SQL database to be analyzed, cross-tabulated, summed, totaled, averaged, and dissected.  In general, this is good.  Web companies should learn from their visitors, and web companies should take advantage of the power of digital data collection.  Important trends can be spotted, and products can be improved.

The problem comes when companies keep too much data too long.  Take the example of a search engine.  To a search engine, it is very useful to know what search terms are popular today:  Google uses each day’s search terms to compile a list of the hottest search terms of the day, and undoubtedly uses the same data for anti-spam and quality control.  So far, so good.  Google is using its data in interesting ways for an appropriate amount of time.

The problem comes when data connecting search terms to individual users is kept too long.  Six months from now, your search queries don’t matter.  Maybe there’s some data that is useful in the aggregate (like the hottest search terms of the year, used to create Google Zeitgeist), but Google doesn’t need to know who entered each search query; the data has become stale and less valuable.  Keeping non-aggregate data around too long is an invitation to privacy breaches, like what happened when AOL revealed thousands of search histories.  AOL claimed that the data was anonymized, but it was possible to identify many individuals.  Even more data can be revealed when web servers are hacked—Google claims that its servers were recently attacked in China, and it is not publicly known how much data was accessed.  The more data that was still on Google’s servers, the more data could have been revealed.  The same goes for insider theft, computers left unsecured, and any other means of getting at the data.

To put it simply, the cost-benefit tradeoff of keeping data changes as the data gets older.  The benefit of keeping data decreases as it ages; data that has business value today (like clickstream data, search queries, and website interactions) loses value over time because it becomes too stale to use for business decisions.  If long-term trends need to be spotted, then data can be aggregated and the original fine-grained data destroyed.
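The aggregate-then-destroy idea can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the (user_id, query, timestamp) record layout, the function name, and the 180-day default are all hypothetical, not a description of how any real search engine stores or expires its logs.

```python
from collections import Counter
from datetime import datetime, timedelta


def aggregate_and_purge(records, now, max_age_days=180):
    """Fold stale per-user search records into anonymous per-query counts.

    `records` is an iterable of hypothetical (user_id, query, timestamp)
    tuples. Returns (fresh_records, stale_counts). Stale per-user rows are
    not returned, so the caller can discard them for good.
    """
    cutoff = now - timedelta(days=max_age_days)
    fresh = []
    stale_counts = Counter()
    for user_id, query, timestamp in records:
        if timestamp < cutoff:
            # Keep only the search term; drop who searched and when.
            stale_counts[query] += 1
        else:
            fresh.append((user_id, query, timestamp))
    return fresh, stale_counts
```

After a pass like this, the long-term trend data (how often each term was searched) survives, while the link between an old query and the person who typed it does not.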

But the cost of keeping old data doesn’t decrease: to end-users, revealing old data can be just as harmful as revealing new data.  A site that reveals embarrassing search queries from two years ago is just as dangerous as a site that reveals embarrassing search queries from last week.  Here, Web 2.0 companies are particularly at risk.  They know a ton about users’ social, political, and inner lives, and that information is often very personal.  They often know every interaction between two users: What profiles have you been clicking on?  What messages have you been sending?  Who have you “poked” lately?  Were you on the Jersey Shore fanpage for an hour looking at pictures of Snooki?  A site that collects this information is constantly at risk of losing it.

The solution is to destroy data, or at least take it offline and preferably move it into non-digital form.  Search engines have recognized this in part, and have generally similar plans to destroy clickstream data within 6-18 months.  But it’s not clear that a lot of Web 2.0 companies do.  I know that many of my old Facebook interactions are still stored in a production database because I can still access them.  There is simply no need for this data to still be in a production database that is vulnerable to hacking, data leaks, insider theft, and more.  One data security incident could reveal the entire history of social interactions on the site.  This is a privacy Sword of Damocles, silently hanging over every user’s head.  What embarrassing thing have you done on Facebook in the last few years?  What private messages have you sent?  With one data dump, it could all be revealed.

Instead, Facebook could simply announce a policy to archive all interactions more than 12 months old, then move them offline.  Or it could just delete them entirely: do we really need 5 years of history of “pokes”?  Or, if users really want to keep their data, then let users download an archive with all their interactions and delete them from the server.
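A retention policy along those lines might look something like the sketch below.  Everything here is an assumption made for illustration: the dictionary layout with "user", "kind", and "at" keys, the function names, and the use of 365 days to stand in for "12 months".  It does not reflect any actual Facebook system or API.

```python
import json
from datetime import datetime, timedelta


def split_for_archive(interactions, now, retention_days=365):
    """Partition interactions into (live, old) around a retention cutoff.

    Each interaction is a hypothetical dict with "user", "kind", and "at"
    (a datetime). Only `live` would stay in the production database.
    """
    cutoff = now - timedelta(days=retention_days)
    live = [item for item in interactions if item["at"] >= cutoff]
    old = [item for item in interactions if item["at"] < cutoff]
    return live, old


def export_for_user(old, user_id):
    """Serialize one user's old interactions as JSON for them to download."""
    mine = [
        dict(item, at=item["at"].isoformat())  # datetimes are not JSON-native
        for item in old
        if item["user"] == user_id
    ]
    return json.dumps(mine, sort_keys=True)
```

The site would keep only the live list online, offer each user the export if they want their history, and then move the old records offline or delete them.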

To be fair, forgetting is hard.  Why don’t web companies forget more often?  Often, it’s just inertia.  It takes programmers’ energy to archive data, and it takes careful business decisions to determine when and how to archive data.  Sometimes it’s like an overdue library book: you know that you need to return it, but you just never get around to it until it is very overdue.

Sometimes, the good old days are best.  Remember paper files?  Paper records are nothing like digital: they are slow to process, hard to store, and are corrupted over time.  But maybe those are features rather than bugs.

In bullet points:

  • Web companies collect massive amounts of data
    • Clickstream, social interactions, emails and messages, credit cards and payment info, preferences, actions, and activities…
  • It often seems easier to keep old data than delete it
    • Disk space is nearly free, and databases make it easy to keep old records
    • Programmers often think that old data will have some kind of marketing value
    • Archiving is a pain
  • But old personal data can be embarrassing or dangerous
    • Information about people’s financial, social, and political beliefs can cause embarrassment
    • Some data that seems benign (like your Netflix movie rentals) can reveal a lot more (like your sexual orientation)
    • Some data that has identifying information removed can still be used to identify you (like your AOL search queries)
    • Information about people’s names, addresses, and family can cause safety issues and encourage identity theft
  • That said, information about places, things, and science should be more available
    • News reports, scientific papers, and scientific data generally do not present the same privacy problems
  • Old digital data is particularly likely to be problematic
    • Data that is instantly accessible in a production database is instantly accessible to a hacker or data accident
    • Insiders can leak data, intentionally or accidentally
    • Once out, it can be digitally scanned, searched, sorted, and remixed
  • Old data is less likely to be useful in a live environment
  • There are solutions
    • Move content into an archive that the user controls
    • Delete marketing and clickstream data
    • Research and trend data can be aggregated
  • There’s something to be said for paper records.  Paper records have a very high transaction cost; that can be a feature, not a bug.
