Category Archives: Data

Can ICT4D Have a Cambridge Analytica-Facebook Moment? Your Weekend Long Reads


Facebook currently has a Cambridge Analytica problem. It is under severe pressure to explain how 87 million users had their personal data leaked and to offer assurances that it will not happen again. Beyond the US, Cambridge Analytica has been a player in multiple elections in Kenya and Nigeria.

This month Mark Zuckerberg testified before the US Congress, and the biggest revelation of that episode was that America’s lawmakers have very little understanding of how Facebook works; they missed a key opportunity to engage deeply with the problems at the heart of Facebook’s business model and practices.

Thanks to the overall weak line of questioning, Zuckerberg’s net worth rose $3 billion during the testimony.

Deleting Isn’t An Option
Users are outraged, and some are deleting their accounts in the #DeleteFacebook movement. It seems, though, that while many people get angry, in general they don’t do much more than utter a tut-tut.

It’s worth remembering that to actually delete your Facebook account is a privilege, as New York Times reporter Sheera Frenkel tweeted. “For much of the world, Facebook is the internet and only way to connect to family/friend/business.”

From an ICT4D perspective, the people we serve, who count on us to understand how the tech and the data work, need Facebook. And indeed, so do we in our ICT4D offerings through WhatsApp, Messenger and Groups.

Many ICT4D orgs continue to ride the wave of the stellar uptake of Facebook and the services it owns, utilising the reach, communication and engagement opportunities these offer, for example, through Free Basics.

We Do No Harm, Right?
Can the ICT4D movement have its own Facebook-Cambridge Analytica moment? The answer is yes, of course. To prevent it, or at least delay it, we need to focus vigilantly on data privacy and interrogate the choices we make in offering our services.

Knowing that external platforms that vacuum up data can be hazardous, the ICT4D community needs to reaffirm its commitment to do no harm and to ensure data privacy and security.

We’re the good guys: we are transparent with the individuals whose data we collect, explaining how our initiatives will use and protect their data; we secure that data; and our consent forms are written in the local language and easily understood by the individuals whose data are being collected.

Nice words, but do we really implement them?

How Careful Are We?
Below are a few questions to ponder in the context of Cambridge Analytica-Facebook.

  • Access: WIRED magazine shows you how to download and read your Facebook data. Does your app or service allow users to do the same?
  • Clarity: Come 25 May 2018, the General Data Protection Regulation (GDPR) will require any company serving people in the EU to be very clear about what data it is collecting and what it will be used for. Users will be able to have their data removed or changed, or demand an explanation of how it’s being used to profile them. This is a major law for the rights of the user (well done European Commission!): How do we comply? How clear are our research ethics forms, or the terms of use on our websites? How about comics to explain Ts&Cs?
  • Recourse: Again, drawing on the GDPR (you can tell I’m a big fan), how easy is it for our users to contact us, request that their data be removed, or ask for the algorithm that profiles them to be explained? Do we have the capacity to meet these demands?
  • Protection: Where is the data that you collect about users? What measures have you put in place to safeguard it?
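Much of this boils down to plumbing that can be built early. Below is a minimal, hypothetical sketch (class, method and field names are all invented for illustration) of what GDPR-style access, rectification and erasure could look like in code:

```python
import json


class UserDataStore:
    """Toy in-memory store sketching GDPR-style data-subject rights:
    access (export), rectification (correct), and erasure (delete)."""

    def __init__(self):
        self._records = {}  # user_id -> dict of collected fields

    def collect(self, user_id, **fields):
        self._records.setdefault(user_id, {}).update(fields)

    def export(self, user_id):
        # Right of access: return everything held about the user,
        # in a portable, machine-readable format (JSON here).
        return json.dumps(self._records.get(user_id, {}), indent=2)

    def rectify(self, user_id, field, value):
        # Right to rectification: let users correct their data.
        self._records[user_id][field] = value

    def erase(self, user_id):
        # Right to erasure: remove every record tied to this user.
        self._records.pop(user_id, None)


store = UserDataStore()
store.collect("u1", name="Amina", district="Kibera", visits=12)
print(store.export("u1"))  # the user can download their data
store.erase("u1")
print(store.export("u1"))  # "{}" once erased
```

The real work, of course, is making sure every database and backup behind a service honours the erase call, but having the interface in place from day one makes compliance a design constraint rather than a retrofit.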

Terms and conditions are long documents. If US users were to read every privacy policy on every website they visited in a year, it would take them 25 days to complete. Unsurprisingly, most people don’t read the damned things. How much less, then, can we expect someone who signs with a thumbprint to read such documents?

We really need to be very creative in solving these challenges.

How Are You Transparent and Safe?
So, how is your project practicing radical transparency? Have you had to explain your actions to your users? Have you been asked to delete data? Pre-emptively, in what ways have you engaged the community to explain exactly what you are doing?

Please do share your experiences.

There is value in creating templates for radically understandable ethics forms, processes for data download and explanations.

While the scale of risk is lower for us than for Facebook, based on sheer number of affected users, the issues are no less grave. Perhaps in ICT4D, because we often come as non-profits and development agents rather than commercial entities, data protection matters even more than it does for Facebook. We come as people who are there to help. If we fail in doing no harm, how terrible is that!?

We need to make sure our house is in order before it’s too late.


Algorithmic Accountability is Possible in ICT4D

As we saw recently, when it comes to big data for public services there needs to be algorithmic accountability. People need to understand not only what data is being used, but what analysis is being performed on it and for what purpose.

Further, complementing big data with thick, adjacent and lean data also helps to tell a more complete story of analysis. These posts piqued much interest and so this third and final instalment on data offers a social welfare case study of how to be transparent with algorithms.

A Predictive Data Tool for a Big Problem

The Allegheny County Department of Human Services (DHS), Pennsylvania, USA, screens calls about the welfare of local children. The DHS receives around 15,000 calls per year for a county of 1.2 million people. With limited resources to deal with this volume of calls, limited data to work with, and each decision a tough and important one to make, it is critical to prioritize the highest-need cases for investigation.

To help, the Allegheny Family Screening Tool was developed. It’s a predictive-risk modeling algorithm built to make better use of data already available in order to help improve decision-making by social workers.

Drawing on a number of different data sources, including databases from local housing authorities, the criminal justice system and local school districts, for each call the tool produces a Family Screening Score. The score is a prediction of the likelihood of future abuse.

The tool is there to help analyse and connect a large number of data points to better inform human decisions. Importantly, the algorithm doesn’t replace clinical judgement by social workers – except when the score is at the highest levels, in which case the call must be investigated.
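To make the mechanics concrete, here is a deliberately toy sketch of predictive-risk scoring with a mandatory-review threshold. The features, weights and threshold are all invented for illustration; this is not the actual Allegheny model:

```python
# Illustrative only: a toy linear risk score, capped to a 20-point
# scale, with a threshold above which human discretion is overridden.
FEATURE_WEIGHTS = {
    "prior_referrals": 0.9,       # count of earlier referrals
    "housing_instability": 0.6,   # index from housing records
    "school_absence_rate": 0.4,   # index from school records
}


def screening_score(case, max_score=20):
    """Combine weighted data points into a single screening score."""
    raw = sum(FEATURE_WEIGHTS[f] * case.get(f, 0) for f in FEATURE_WEIGHTS)
    return min(round(raw), max_score)


def must_investigate(score, threshold=18):
    # At the highest score levels the call must be investigated,
    # mirroring how the real tool limits (but mostly informs) judgement.
    return score >= threshold


case = {"prior_referrals": 10, "housing_instability": 8,
        "school_absence_rate": 5}
score = screening_score(case)
print(score, must_investigate(score))  # 16 False
```

Even a sketch this small shows why transparency matters: every design choice (which features, which weights, where the threshold sits) encodes a policy decision that communities should be able to see and question.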

As the New York Times reports, before the tool 48% of the lowest-risk families were being flagged for investigation, while 27% of the highest-risk families were not. At best, decisions like this put an unnecessary strain on limited resources and, at worst, result in severe child abuse.

How to Be Algorithmically Accountable

Given the sensitivity of screening child welfare calls, the system had to be as robust and transparent as possible. Mozilla reports the ways in which the tool was designed, over multiple years, to achieve this:

  • A rigorous public procurement process.
  • A public paper describing all data going into the algorithm.
  • Public meetings to explain the tool, where community members could ask questions, provide input and influence the process. Professor Rhema Vaithianathan is the rock star data storyteller on the project.
  • An independent ethical review of implementing, or failing to implement, a tool such as this.
  • A validation study.

The algorithm is open to scrutiny, owned by the county and constantly being reviewed for improvement. According to the Wall Street Journal the trailblazing approach and the tech are being watched with much interest by other counties.

It Takes Extreme Transparency

It takes boldness to build and use a tool in this way. Erin Dalton, a deputy director of the county’s DHS and leader of its data-analysis department, says that “nobody else is willing to be this transparent.” The exercise is obviously an expensive and time-consuming one, but it’s possible.

During recent discussions on AI at the World Bank the point was raised that because some big data analysis methods are opaque, policymakers may need a lot of convincing to use them. Policymakers may be afraid of the media fallout when algorithms get it badly wrong.

It’s not just the opaqueness, the whole data chain is complex. In education Michael Trucano of the World Bank asks: “What is the net impact on transparency within an education system when we advocate for open data but then analyze these data (and make related decisions) with the aid of ‘closed’ algorithms?”

In short, it’s complicated and it’s sensitive. A lot of convincing is needed for those at the top, and at the bottom. But, as Allegheny County DHS has shown, it’s possible. For ICT4D, their tool demonstrates that public-service algorithms can be developed ethically, openly and with the community.

Stanford University is currently examining the impact of the tool on the accuracy of decisions, overall referral rates and workload, and more. Like many others, we should keep a close watch on this case.

3 Data Types Every ICT4D Organization Needs – Your Weekend Long Reads

After five years researching the effectiveness of non-profit organizations (NPOs) in the USA, Stanford University lecturer Kathleen Kelly Janus found that while 75% of NPOs collect data, only 6% feel they are using it effectively. (Just to be clear, these were not all tech organizations.)

She suggests the reason is because they don’t have a data culture. In other words, they need to cultivate “a deep, organization-wide comfort level with using metrics to maximize social impact.” Or, in ICT4D speak, they need to be data-driven.

Perhaps NPOs feel that if they start collecting, analysing and using big data, that need will be satisfied. But one cloud server of big data does not a data culture make. While big data can be a powerful tool for development, there are three other data types that could significantly improve the impact of any ICT4D intervention.

Thick data

Technology ethnographer Tricia Wang warns us about the dangers of looking only to big data for the answers, of trusting only large sets of quantitative data without a human perspective. She proposes that big data must be supplemented with “thick data”: qualitative data gathered by spending time with people.

Big data excels at quantifying very specific environments – like delivery logistics or genetic code – and doing so at scale. But humans are complex and so are the changing contexts in which they live (especially true for ICT4D constituents). Big data can miss the nuances of the human factor and portray an incomplete picture.

As a real-life example, in 2009 Wang joined Nokia to try to understand the mobile phone market in China. She observed, talked to, and lived amongst low-income people and quickly realised that – despite their financial constraints – they were aspiring to own a smartphone. Some of them would spend half of their monthly income to buy one.

But the sample was small, the data not big, and Nokia was not convinced. Nokia’s own big data was not telling the full story – it was missing thick data, which led to catastrophic consequences for the company.

Adjacent data

Sometimes there is value in overlaying data from other sources onto your own to provide new insights. Let’s call this “adjacent data”. Janus provides the case of Row New York, an organization that pairs rigorous athletic training with tutoring and other academic support to empower youth from under-resourced communities.

To measure success, Row started by tracking metrics like the number of participants, growth, and fitness levels. But how could they track determination or “grit” – attributes of resilient people?

They started recording both attendance and daily weather conditions to show which students were still showing up to row even when it was 4°C and raining. “Those indicators of grit tracked with students who were demonstrating academic and life success, proving that [Row’s] intervention was improving those students’ outcomes.”
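Row’s approach can be sketched in a few lines: overlay an external (adjacent) dataset onto your own logs and derive a new indicator. The records below are invented for illustration:

```python
from datetime import date

# The program's own logs: who showed up, and when.
attendance = [
    {"day": date(2018, 2, 5), "student": "J", "present": True},
    {"day": date(2018, 2, 6), "student": "J", "present": True},
    {"day": date(2018, 2, 6), "student": "K", "present": False},
]

# Adjacent data: weather pulled from an external source, keyed by day.
weather = {
    date(2018, 2, 5): {"temp_c": 12, "rain": False},
    date(2018, 2, 6): {"temp_c": 4, "rain": True},
}


def grit_days(attendance, weather, max_temp=5):
    """Count, per student, attendance on cold, rainy days -- a proxy
    indicator for 'grit' derived by joining the two datasets."""
    counts = {}
    for rec in attendance:
        w = weather[rec["day"]]
        if rec["present"] and w["rain"] and w["temp_c"] <= max_temp:
            counts[rec["student"]] = counts.get(rec["student"], 0) + 1
    return counts


print(grit_days(attendance, weather))  # {'J': 1}
```

The interesting work is not the join itself but choosing which outside dataset, overlaid on your own, surfaces the attribute you could not measure directly.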

Pinpointing adjacent data requires thinking outside of the box. Maybe reading Malcolm Gladwell or Freakonomics will provide creative inspiration for finding those hidden data connectors.

Lean data

Lastly, there is a real risk in just hoovering up every possible data point in the hope that the answers to increased impact and operational efficiencies will emerge. The risk is not only the data security and privacy hazards of the sponge approach; it’s that it’s easy to drown in data.

Most ICT4D initiatives don’t have the tech or the people to meaningfully process the stuff. Too much data can overwhelm, not reveal insights. The challenge is gathering just enough data, just the data we need – let’s call this the “lean data”. When it comes to data, more is not better, just right is better. In fact, big data can be lean. It’s not about quantity but rather selectiveness.

Lean data is defined by the goals of the initiative and its success metrics. Measure enough to meet those needs. When I was head of mobile at Pearson South Africa’s Innovation Lab, we were developing an assessment app for high school learners called X-kit Achieve Mobile.

With the team we brainstormed the data we needed to serve our goals and those of the student and teacher users. We threw in quite a lot of extra bits based on “Hmm, that would be cool to know, let’s put it in a dashboard.”

The company was also preparing to report publicly on its educational impact, so certain data points were being collected by all digital products. Having a common data dictionary and reporting matrix is something worth considering if you’re implementing more than one product.
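A data dictionary can be as simple as a shared schema that every product validates its events against, so that impact reporting stays comparable across products. A minimal sketch, with invented event names and fields:

```python
# Hypothetical shared data dictionary: products may log only events
# defined here, with exactly the fields defined here.
DATA_DICTIONARY = {
    "assessment_completed": {"fields": {"user_id", "subject", "score"}},
    "session_started":      {"fields": {"user_id", "product"}},
}


def validate_event(name, payload):
    """Reject events outside the dictionary or carrying extra fields --
    the 'cool to know, put it in a dashboard' data that never gets used."""
    spec = DATA_DICTIONARY.get(name)
    if spec is None:
        raise ValueError(f"unknown event: {name}")
    extra = set(payload) - spec["fields"]
    if extra:
        raise ValueError(f"fields not in dictionary: {extra}")
    return True


validate_event("assessment_completed",
               {"user_id": "u1", "subject": "maths", "score": 71})
```

Making the validator reject extra fields is the lean-data discipline in code form: anything worth collecting has to earn a place in the dictionary first.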

After building the app we only really used about 20% of all the reports and dashboards. Only as we iterated did we discover new reports that we actually needed. The fact is that data is seductive; it brings out the hoarder in all of us. We should resist and take only what we need.

So, perhaps the path to building a data culture is to always have thick data, be creative about using adjacent data, and keep all data lean.

Image: CC by janholmquist

Every Big Data Algorithm Needs a Storyteller – Your Weekend Long Reads

The use of big data by public institutions is increasingly shaping people’s lives. In the USA, algorithms influence the criminal justice system through risk assessment and predictive policing systems, drive energy allocation and change educational systems through new teacher evaluation tools.

The belief is that the data knows best, that you can’t argue with the math, and that the algorithms ensure the work of public agencies is more efficient and effective. And, often, we simply have to maintain this trust because nobody can examine the algorithms.

But what happens when – not if – the data works against us? What is the consequence of the algorithms being “black boxed” and kept outside of public scrutiny? This raises two implications for ICT4D.

The Data Don’t Lie, Right?

Data scientist Cathy O’Neil, who holds a Harvard PhD in mathematics, says that clever marketing has tricked us into being intimidated by algorithms, into trusting and fearing them simply because, in general, we trust and fear math.

O’Neil’s 2016 book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, shows how, when big data goes wrong, teachers lose jobs, women don’t get promoted and global financial systems crash. Her key message: the era of blind faith in big data must end, and the black boxes must be opened.

Demand Algorithmic Accountability

It is very interesting, then, that New York City has a new law on the books to do just that and demand “algorithmic accountability” (presumably drawing on the Web Foundation’s report of the same name). According to MIT Technology Review, the city’s council passed America’s first bill to ban algorithmic discrimination in city government. The bill calls for a task force to study how city agencies use algorithms and to produce a report on how to make them more easily understandable to the public.

AI Now, a research institute at New York University focused on the social impact of AI, has offered a framework centered on what it calls Algorithmic Impact Assessments. Essentially, this calls for greater openness around algorithms, strengthening of agencies’ capacities to evaluate the systems they procure, and increased public opportunity to dispute the numbers and the math behind them.

Data Storytellers

So, what does this mean for ICT4D? Two things, based on our commitment to being transparent and accountable for the data we collect. Firstly, organisations that mine big data need to become interpreters of their algorithms. Someone on the data science team needs to be able to explain the math to the public.
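For the simplest models, that explanation can even be generated automatically. A minimal sketch, assuming a plain linear score with invented weights, that turns the math into a line-by-line account a non-technical reader could follow:

```python
# Invented weights for a hypothetical linear risk score; a positive
# weight pushes the score up, a negative one pulls it down.
WEIGHTS = {"attendance_rate": -2.0, "prior_flags": 1.5}
BASELINE = 3.0  # the model's intercept


def explain(features):
    """Narrate each feature's contribution to the final score."""
    parts = [f"baseline risk: {BASELINE:+.1f}"]
    total = BASELINE
    for name, weight in WEIGHTS.items():
        contrib = weight * features[name]
        total += contrib
        parts.append(f"{name} ({features[name]}) contributed {contrib:+.1f}")
    parts.append(f"total score: {total:.1f}")
    return "\n".join(parts)


print(explain({"attendance_rate": 0.9, "prior_flags": 2}))
```

Real models are rarely this tidy, which is exactly the point: if the data science team cannot produce something like this narrative for their system, the public has no chance of understanding it.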

Back in 2014 the UN Secretary General proposed that “communities of ‘information intermediaries’ should be fostered to develop new tools that can translate raw data into information for a broader constituency of non-technical potential users and enable citizens and other data users to provide feedback.” You’ve noticed the increase in jobs for data scientists and data visualisation designers, right?

But it goes beyond that. With every report and outcome that draws on big data, there needs to be a “how we got here” explanation. Not just making the data understandable, but the story behind that data. Maybe the data visualiser does this, but maybe there’s a new role of data storyteller in the making.

The UN Global Pulse principle says we should “design, carry out, report and document our activities with adequate accuracy and openness.” At the same time, Forbes says data storytelling is an essential skill. There is clearly a connection here. Design and UI thinking will be needed to make sure the heavy lifting behind the data scenes can be easily explained, like you would to your grandmother. Is this an impossible ask? Well, the alternative is simply not an option anymore.

Data Activists

Secondly, organisations that use someone else’s big data analysis – like many ICT4D orgs these days – need to take an activist approach. They need to ask where the data comes from, what steps were taken to audit it for inherent bias, and for an explanation of the “secret sauce” in the analysis. We need to demand algorithmic accountability. We are creators and arbiters of big data.

The issue extends beyond protecting user data and privacy, important as this is. It relates to transparency and comprehension. Now is the time, before it’s too late, to lay down the practices that ensure we all know how big data gets cooked up.

Image: CC by kris krüg