Cathy O'Neil on Pernicious Machine Learning Algorithms and How to Audit Them
by InfoQ, published 2016-09-13

In this week's podcast, InfoQ's editor-in-chief Charles Humble talks to data scientist Cathy O'Neil. O'Neil is the author of the blog mathbabe.org, was formerly Director of the Lede Program in Data Practices at the Columbia University Graduate School of Journalism's Tow Center, and worked as a data science consultant at Johnson Research Labs. She earned a mathematics Ph.D. from Harvard University. Topics discussed include her book "Weapons of Math Destruction," predictive policing models, the teacher value-added model, approaches to auditing algorithms, and whether government regulation of the field is needed.

Why listen to this podcast:
- There is a class of pernicious big data algorithms that increasingly control society but are not open to scrutiny.
- Flawed data can produce an algorithm that is, for instance, racist or sexist; the data feeding predictive policing models, for example, reflects racially biased policing. Yet people tend to be overly trusting of algorithms because they are mathematical.
- Data scientists have to make ethical decisions whether or not they acknowledge it. Problems often stem from an abdication of responsibility.
- Auditing of algorithms is still a very young field, with ongoing academic research exploring approaches.
- Government regulation of the industry may well be required.

Notes and links can be found at http://bit.ly/2eYVb9q

Weapons of math destruction
0m:43s - The central thesis of the book is that while not all algorithms are bad, there is a class of pernicious big data algorithms that increasingly control society.
1m:32s - The algorithms O'Neil is concerned about - the weapons of math destruction - share three characteristics: they are widespread, affecting important decisions such as whether someone can go to college or get a job; they are secret, so the people being targeted don't know they are being scored or don't understand how their score is computed; and they are destructive - they ruin lives.
2m:51s - These characteristics undermine the original intention of the algorithm, which is often to solve big societal problems with the help of data.

More on this: quick-scan our curated show notes on InfoQ: http://bit.ly/2eYVb9q
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics in professional software development: bit.ly/24x3IVq

Comment by mpena22 (2019-10-28): love this mathematician

Comment by comprehensiveanticipatorydesignscientist4livingry (2018-05-08): "I want to do this. This is where I appeal to your audience, because I need computer scientists to help me; this is the goal for the rest of my life, to be an algorithmic auditor."

Comment by comprehensiveanticipatorydesignscientist4livingry (2018-05-08): "The model that got me started on this long trip was this teacher value-added model, which is a sort of nationwide trend to find the bad teachers… and get rid of the bad teachers. The way you get rid of bad teachers is first you find them using these data-driven assessments, and then you fire them. And the way they're attempting to find them is just statistically weak. Basically, it's a scoring system where everyone gets a score between zero and 100 for each class they teach each year… and it's almost entirely a random number generator. I talked to a teacher who got a 6 out of 100 in 2010 and a 96 out of 100 the following year, [but] he did not change the way he taught. Another teacher was fired because she got a bad score, even though the principal loved her and the parents loved her. She was immediately picked up by a richer public school system that doesn't use this system. This system is almost entirely used in urban districts."
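The "random number generator" charge is a claim about year-over-year reliability: if the score were mostly signal, the same teacher should score similarly in consecutive years, not jump from 6 to 96. A minimal sketch of that statistical point (the real value-added formulas are not public; the score model and noise_weight below are illustrative assumptions, not the actual VAM):

```python
# If a 0-100 teacher score is dominated by noise, the year-over-year
# correlation for the same teachers collapses toward zero.
import random
import statistics

random.seed(42)

def vam_score(skill, noise_weight=0.9):
    """Hypothetical score: a small signal from true skill, swamped by noise."""
    noise = random.uniform(0, 100)
    return (1 - noise_weight) * skill + noise_weight * noise

teachers = [random.uniform(0, 100) for _ in range(1000)]  # latent "true skill"
year1 = [vam_score(s) for s in teachers]
year2 = [vam_score(s) for s in teachers]

r = statistics.correlation(year1, year2)  # Pearson r; Python 3.10+
print(f"year-over-year correlation: {r:.2f}")  # near 0: the score is mostly noise
```

A score with real diagnostic value would show a strong positive correlation here; one that behaves like a random number generator will not.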
Comment by comprehensiveanticipatorydesignscientist4livingry (2018-05-08): "Whites and blacks smoke pot at basically the same rate, but at the same time, who gets busted for smoking pot? Who gets busted for low-level drug crimes? It's completely disparate… This is the data coming out of the policing system, and that very same data is going into these algorithms, [which] are things that look for patterns. And if those algorithms see patterns of criminal acts in certain over-policed neighborhoods - namely black neighborhoods - then the predictive policing algorithms will tell police to go back to those neighborhoods and look for crime. To be clear, this is crime that would not be reported if there were no police there… So we have this data which is essentially biased. It's racist, biased data. But once we end up with these racist policing predictions, people imagine that since it's scientific, it's mathematical, it must be objective, it must be truth. That's because they don't really understand how bias in a data set will manifest as biased algorithms."
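The feedback loop described in that comment can be made concrete with a toy simulation (my own illustration, not any real predictive-policing product): two neighborhoods with identical true offense rates, where the one that starts with more patrols generates more recorded crime and therefore attracts still more patrols.

```python
# Toy feedback loop: crime is only *recorded* where police are looking,
# and the "predictive" step allocates patrols by recorded counts.
import random

random.seed(0)
TRUE_RATE = 0.05              # identical underlying offense rate everywhere
patrols = {"A": 80, "B": 20}  # neighborhood A starts out over-policed
recorded = {"A": 0, "B": 0}

for _ in range(20):  # 20 rounds of predict, patrol, record
    for hood, n_patrols in patrols.items():
        # only patrolled encounters produce a crime record
        recorded[hood] += sum(random.random() < TRUE_RATE for _ in range(n_patrols))
    # allocate next round's 100 patrols in proportion to recorded crime
    total = max(sum(recorded.values()), 1)
    patrols = {h: round(100 * recorded[h] / total) for h in recorded}

print(recorded)  # A's recorded crime dwarfs B's despite equal true rates
print(patrols)   # nearly all patrols now go to A: the loop has closed
```

The point of the sketch is that the model never observes the true offense rate, only the arrest records its own patrol allocations produce, so an initial disparity in policing is amplified rather than corrected.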