16 November 2016

Data analysis and reality

Recently I was watching Zeynep Tufekci’s TED talk, “Machine intelligence makes human morals more important.”
And Zeynep has a point. The way we use data science today is more or less a black box: people feed in the data, let the machine learning algorithm spit out results, and then just go with them, without even bothering to double-check.
Zeynep presented an example in which a black-box algorithm predicted the likelihood of criminal re-offending. It assigned the higher risk score to a woman who subsequently did nothing, while the offender it rated as low risk was released and went on to commit a quite violent crime. The algorithm basically failed at its primary task: making society safer.
To me, as someone who did scientific research for a decade, it was painfully obvious that whoever made that particular black box did not base it on facts, but on personal biases instead.
A machine learning algorithm is only as good as the premises you feed it. If you train it on false assumptions, it will produce false results. That is a simple truth.
This property leaves such algorithms wide open to covert manipulation of any kind. Whoever assembles the initial training data set effectively makes the algorithm find only the cases that match the premises behind that data, as the sketch below illustrates.
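To make this concrete, here is a minimal sketch in plain Python of how a bias baked into training labels is faithfully reproduced by even the simplest learned model. The groups, labels, and numbers are entirely hypothetical, invented for illustration only:

    from collections import defaultdict

    # Hypothetical training data: (group, "re-offended" label).
    # The labels encode the labeler's bias: group "A" is marked
    # high risk far more often, regardless of actual behaviour.
    training_data = [
        ("A", 1), ("A", 1), ("A", 1), ("A", 0),
        ("B", 0), ("B", 0), ("B", 0), ("B", 1),
    ]

    # "Train" the simplest possible model: the observed rate of
    # positive labels per group.
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, label in training_data:
        counts[group][0] += label
        counts[group][1] += 1

    def predicted_risk(group):
        positives, total = counts[group]
        return positives / total

    # The model "learns" nothing but the bias in the labels:
    print(predicted_risk("A"))  # 0.75 -- labelled high risk
    print(predicted_risk("B"))  # 0.25 -- labelled low risk

The “model” here is just a per-group label frequency, but the same dynamic holds for any statistical learner: if the labels encode a prejudice, the predictions will echo it.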
The problem arises when your algorithm clashes with reality. It is hard to make an algorithm that does not reflect a bias (cognitive or otherwise) of the people who made it. That’s why scientists receive so much training on how to avoid that particular pitfall. And it is hard work to go against your own beliefs, to open your mind to something that runs counter to everything you think is correct. But when you make a machine learning algorithm that is supposed to predict reality, you have to do exactly that. Otherwise, the algorithm you make is worthless.
Sadly, in today’s society we are usually enclosed in our bubbles, little spheres of belief that confirm what we already think. They make us believe we know reality, and they push us to build algorithms that reflect our bubble, not reality itself.
So I would add: besides minding our morals, we have to question our own biases, recognize that we have them, and fight to minimize their influence. Because, in the end, reality will not conform to what we believe; it will slam us with a surprise. In the talk, the surprise was the released criminal committing a violent re-offence, crushing all the prejudices and biases of that algorithm’s creators and ultimately making society less safe, a failure of its fundamental purpose.
So, if you are searching for a new data scientist who is supposed to help your company or organization predict reality and gain an edge, make sure the person you hire is aware that biases exist and is willing to question herself or himself every step of the way. Only then will you end up with a data science product that actually matches reality and gives you the edge you’re seeking.
