Welcome to Hold The Code. We are an interdisciplinary team of Northwestern students from the Responsible AI Student Organization (RAISO), passionate about exploring the ways technology intersects every corner of our lives.
Regardless of your computer science background, we believe everyone has a vested interest in learning about ethical applications of AI and contemporary technology (CTech). That’s right: even if you’re like me, struggling in CS110 to make Python draw shapes, this newsletter is for you.
By sharing relevant news and perspectives, our newsletter aims to democratize the conversation around AI and engage as many voices as possible. Thanks for choosing to be a part of it.
Viruses mutate all the time, and SARS-CoV-2 is no exception. MIT computational biologist Bonnie Berger has developed an NLP model that can help identify potentially dangerous viral mutations by effectively reading a virus’s genetic code.
What is NLP?
Natural language processing, or NLP, is a subset of artificial intelligence that allows computers to interpret human language. One way these models can work is by relating words based on how similar their meanings are. For example, the words happy and cheerful would be more closely related than happy and angry.
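The "closeness" of word meanings is usually measured by comparing word vectors, called embeddings. A minimal sketch of that idea, using tiny made-up 3-dimensional vectors (real models learn vectors with hundreds of dimensions from huge text corpora):

```python
import math

# Toy "embeddings" -- these numbers are invented for illustration only;
# a real NLP model would learn them from large amounts of text.
embeddings = {
    "happy":    [0.90, 0.80, 0.10],
    "cheerful": [0.85, 0.75, 0.20],
    "angry":    [-0.70, 0.10, 0.90],
}

def cosine_similarity(u, v):
    """Similarity of two vectors: near 1.0 = similar, near -1.0 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

print(cosine_similarity(embeddings["happy"], embeddings["cheerful"]))  # close to 1
print(cosine_similarity(embeddings["happy"], embeddings["angry"]))     # much lower
```

With these toy vectors, happy and cheerful score close to 1 (similar meaning), while happy and angry score far lower.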
From Sentences to Sequences: How does NLP work on genetic code?
Short answer: it’s all based on analogy.
If you think of the entire genetic sequence of a virus as a really long sentence, the individual components of these sequences can be thought of as words. When a virus mutates, the “words” of its genetic sequence change. Whether or not these mutations will affect the overall behavior of the virus in a profound way (e.g. make the virus more infectious or resistant to treatment) depends on what changes occurred.
Going back to our previous example, replacing the word happy with cheerful in the sentence I am happy does not change the meaning too drastically, but replacing the word happy with angry makes the sentence take on an entirely different meaning.
The same idea holds true for genetic mutations in viruses: the behavior of the virus will only change if its genetic code mutates in a consequential way. When Berger’s NLP model is given the genetic sequence of a new strain, it is able to read this sequence and find the mutations within it that may affect viral behavior.
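To make the analogy concrete, here is a drastically simplified sketch of that idea (not Berger’s actual model, which learns its representations from thousands of real viral sequences). Each "word" of the sequence gets a toy embedding, and a mutation’s "semantic change" is the distance between the old and new embeddings; large changes get flagged for review:

```python
import math

# Toy embeddings for a few amino-acid "words" -- invented for illustration;
# the real model learns these representations from viral sequence data.
aa_embedding = {
    "L": [0.80, 0.10],   # leucine
    "I": [0.75, 0.15],   # isoleucine: chemically similar to leucine
    "D": [-0.60, 0.90],  # aspartate: very different
}

def semantic_change(original, mutant):
    """Sum the embedding distance at every position where the sequences differ."""
    change = 0.0
    for a, b in zip(original, mutant):
        if a != b:
            change += math.dist(aa_embedding[a], aa_embedding[b])
    return change

wild_type = ["L", "L", "I"]
mild      = ["L", "I", "I"]  # L -> I: small shift in "meaning"
drastic   = ["L", "D", "I"]  # L -> D: large shift, worth flagging

print(semantic_change(wild_type, mild))     # small score
print(semantic_change(wild_type, drastic))  # much larger score
```

The mild mutation barely moves the score, while the drastic one stands out, which is the intuition behind flagging mutations that may change viral behavior.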
Although the model is not perfect (it missed a potential vaccine-resistant mutation in a South African COVID-19 variant), the technology is quite promising and could be applied beyond the pandemic, such as identifying drug-resistant cancer mutations.
Clearview AI first made headlines early last year when a New York Times investigation uncovered how this unregulated facial recognition application could identify virtually anyone walking down the street from a single photo. The investigation raised privacy concerns because the application scraped photos from millions of websites to compile a giant facial recognition database.
Clearview AI One Year Later
Clearview AI made the news again last month when its CEO announced that usage had spiked 26% after the Capitol attack on January 6th, with some local police departments using the technology to send the identities of suspects to the FBI.
Why Should I Care?
Beyond the massive privacy concerns about a facial recognition database of this size in the hands of law enforcement (or anyone else, for that matter), there are major worries about the normalization of facial recognition by law enforcement in general, particularly its potential misuse against Black and Brown communities.
Nathan Freed Wessler from the ACLU’s Speech, Privacy, and Technology Project says, “We know who it will be used against most: members of Black and Brown communities who already suffer under a racist criminal enforcement system.”
Compounded with this is the fact that many facial recognition systems are biased against people with darker skin and women. These groups are more likely to be misidentified by facial recognition systems, which can lead to false accusations and drastic repercussions when this technology is in the hands of law enforcement groups.
Facial Recognition by the Numbers
The featured article mainly focuses on Clearview AI usage after the attack on the Capitol. If you’re interested in learning more about biases in facial recognition technology, this article from MIT News is a great place to start.
According to an internal news report, Facebook is developing a news-summarizing AI called “TL;DR.” Named after the acronym for “Too Long, Didn’t Read,” TL;DR will reportedly provide bullet-point summaries of news articles so that users do not need to read them. The tool also aims to leverage AI to clarify terms through chat-bot support, provide audio narration of articles, and even answer questions via a voice assistant.
Facebook is already considered a major source of misinformation, driven in large part by its algorithms. Critics of TL;DR have raised questions like: Which sources of news will be prioritized? How will TL;DR summarize key pieces of news without biasing the reader toward certain facts? The AI will also need to be carefully designed not to take quotes out of context and further contribute to the spread of misinformation.
Facebook + News + AI — what could go wrong? The answer: so much. This tweet by Audrey Cooper captures it well:
You can count on Hold The Code to be watching you, watching us, Mark Zuckerberg.
Every week during February, Black History Month, Hold The Code plans to feature a story related to AI and racial equality.
In an article published by the MIT Technology Review, Deborah Raji explains several ways that data encodes systematic racism. From predictive policing tools that disproportionately affect communities of color, to self-driving cars that are more likely to hit Black pedestrians, Raji writes:
“Data sets so specifically built in and for white spaces represent the constructed reality, not the natural one.”
She argues that we must resist technological determinism and accept responsibility for the technology we create. There is a tendency to view data as perfectly objective, removed from our own biases.
The Path Forward
According to Raji, the machine-learning community problematically accepts a level of dysfunction, displacing blame from humans to the machine. Only by recognizing this, Raji argues, can technologists begin to institute better practices, such as disclosing data provenance, deleting problematic data sets, and explicitly defining the limitations of every model’s scope.
For More Reading
Read Raji’s full piece here. And if you’re interested in a more in-depth study of the ethical considerations of predictive policing, check out Rashida Richardson’s paper, “Dirty Data, Bad Predictions.”
Written by: Lex Verb and Molly Pribble.