09 November 2015

Privacy or not, that's the question

First published on 9 November 2015 on my old blog.

As a data scientist in training, I have run into the question of when data collection is complete. Many beginning data analysts assume that having more data is always better. That is not necessarily so; it depends on what you are analyzing. Still, there is an overall trend in data science toward collecting more and more data.
In a discussion with one of my colleagues, who works at Google, we were brainstorming about what would make Amazon's recommendation engine better. I pointed out that knowing whether a person got an item from somewhere else would be really useful, and he agreed.
Today I found an article that goes in a similar direction. Combining the information collected about you through the various machines you use increases the reliability of recommendation engines, and ultimately the profit of whoever runs them. I wonder when they will start aggregating purchase information as well.
As an ordinary individual, learning about all this makes me sick to my stomach. I'm getting paranoid and thinking that perhaps using Tor or a similar browser is not such a bad idea. I know how much information about me is available through my browser history and the use of my devices.
You see, the article I linked talks about how data scientists can figure out who received treatment for an STD by combining data from various devices. Honestly, that's just the tip of the iceberg. You would be amazed how much information about you one can squeeze out of just your likes.
