DataKind DC Works with Local Government to Improve City Services

Can government partner with hackers to deliver better services? The National Day of Civic Hacking answered with a resounding yes! In support of this day, last September, we teamed up with the Office of the D.C. Deputy Mayor for Health and Human Services and Code for DC to help tackle some of the city’s toughest challenges. Joining us were 150 volunteers from the D.C. tech community.

A great example of what can be achieved by applying data for good through strong collaboration, the Health and Human Services hackathon was featured in the Huffington Post article “Listen Up America: National Day of Civic Hacking.”

We’re extremely proud of the work the volunteers achieved in less than 24 hours. The volunteers split up into five teams and collaborated closely with DC’s Department of Human Services (DHS), DC Office on Aging (DCOA), Child and Family Services Agency (CFSA) and the Department of Health (DOH) to use open data to help these organizations address various health and human services problems.

Find out more about what each team worked on and was able to achieve in the summaries below.

How can open data be used to make the jobs of Child and Family Services Agency (CFSA) caseworkers easier?

  • The Problem: As part of their work, CFSA caseworkers must recommend beneficial CFSA programs for families in need. Though there is some overlap among the programs offered, each has a unique set of eligibility criteria, and filtering through every program’s criteria is a laborious and time-consuming process for caseworkers.
  • What Resulted: The volunteer team built a prototype multi-service referral portal for CFSA. The portal asks a series of simple questions and then suggests the CFSA programs for which the client is eligible.
  • What’s Next: DataKind DC and Code for DC will be continuing development of the prototype for CFSA. In addition, we’re building out an eligibility blueprint app that can be easily replicated for other applications with similar challenges.
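The matching step a referral portal like this performs can be pictured as one eligibility rule per program, each checked against the answers a caseworker enters. The Python below is a minimal sketch under stated assumptions: the `Answers` fields, the program names and the thresholds are all hypothetical and are not CFSA’s real programs or criteria, nor the prototype’s actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Answers:
    """One family's answers to the portal's questions (hypothetical fields)."""
    household_size: int
    monthly_income: float
    has_child_under_5: bool
    needs_housing: bool


# Hypothetical programs and rules for illustration only; these are NOT
# CFSA's real programs or eligibility criteria.
PROGRAM_RULES: Dict[str, Callable[[Answers], bool]] = {
    "Early Childhood Support": lambda a: a.has_child_under_5,
    "Housing Assistance": lambda a: a.needs_housing and a.monthly_income < 3000,
    "Family Income Support": lambda a: a.monthly_income < 1000 * a.household_size,
}


def eligible_programs(answers: Answers) -> List[str]:
    """Return every program whose rule the answers satisfy, in rule order."""
    return [name for name, rule in PROGRAM_RULES.items() if rule(answers)]


if __name__ == "__main__":
    family = Answers(household_size=3, monthly_income=2500.0,
                     has_child_under_5=True, needs_housing=False)
    print(eligible_programs(family))
    # prints ['Early Childhood Support', 'Family Income Support']
```

Keeping each program’s criteria as a separate, self-contained rule is one way to make such a tool reusable: supporting a new agency or program means adding entries to the rule table rather than rewriting the question flow.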

Washington, D.C. is the Third Rattiest City in the U.S.! Can we fix our rodent problem?

  • The Problem: Orkin Pest Control recently named Washington, D.C. the third “rattiest” city in America, and long-time residents can see that rodent populations have been on the rise.
  • What Resulted: As part of a larger effort to analyze 311 data, the team worked to develop models that can predict long-term trends in rodent complaints. They also worked to build out features that allow users to examine trends in service complaints over time and in their neighborhoods.
  • What’s Next: The team is continuing to work on the project, developing better metrics for predicting trends in rodent complaints using census blocks along with spatial and temporal variables.

Google Maps can’t solve this one: How can D.C.’s seniors locate and travel to needed services?

  • The Problem: Seniors need to travel regularly to a variety of locations for medical and wellness center appointments as well as for day-to-day events and necessary shopping. In addition to the complexity of finding health-related services and scheduling appointments, it is also challenging for seniors to get to the places they need to go. Seniors can choose among a number of transportation options and organizations, including the Department of Health Care Finance (DHCF), District Department of Transportation (DDOT), Department for Hired Vehicles (DHV), and Metro; however, it is sometimes difficult for seniors to know which options are both best for them and available to them.
  • What Resulted: The team first researched the different ride services available, along with each provider’s service area and eligibility requirements. The goal was to design a site or an app that would help individuals quickly and easily find out which services they are eligible for. Users would just need to enter where they needed to go along with a few pieces of information about themselves. The team built a mock-up of what this site could look like, with logic that displayed the available services based on the information entered. The team also built a phone service that, when called, asks a series of questions that the caller answers by pressing ‘1’ for yes or ‘2’ for no. The service then identifies a transportation service the caller is eligible for and provides that service’s contact information.
  • What’s Next: The project served DCOA by providing several examples of what it could create with data to support the needs of those it serves. At this time, there are no plans to further develop the models explored. However, there is a desire to build a referral system similar to the one developed in the CFSA caseworker project, and the work accomplished by the volunteer team can be repurposed to support this future effort.

How can we help DC’s homeless find available shelters?

  • The Problem: DC’s homeless shelters and daytime facilities vary in capacity and services, and there is no central location where this information exists. DC DHS wanted to use open data on homeless services and transportation routes to quickly and easily locate facilities and services for its clients and to determine the best travel routes to available facilities.
  • What Resulted: Illustrating what could be done with the data, the team built 10 different prototype tools and analyses, including several maps that locate homeless shelters and display transportation routes, and a tool that allows users to search for shelter facilities and programs based on specific services that may be needed, such as dental aid, child care, or domestic violence support.
  • What’s Next: Following the hackathon, the DC homeless office worked to improve its data and is continuing to fill additional gaps in the data that the team has since discovered. The team has also continued to meet and work with DC DHS on extending the functionality of a “selector tool” that will help caseworkers identify where homeless individuals can be directed when multiple services are needed. The team also meets regularly with Code for DC to keep the project moving forward.
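One way to picture what a “selector tool” does when a client needs several services: first look for a single facility that offers all of them, and only if none exists, fall back to a small combination of facilities that together cover the needs. The Python below is a minimal sketch under stated assumptions: the facility names and service tags are hypothetical, and the greedy set-cover fallback is a technique chosen here for illustration, not the team’s actual method.

```python
from typing import Dict, List, Set

# Hypothetical facilities and service tags, for illustration only.
FACILITIES: Dict[str, Set[str]] = {
    "Shelter A": {"beds", "meals", "child care"},
    "Day Center B": {"meals", "dental aid"},
    "Shelter C": {"beds", "domestic violence support"},
}


def select_facilities(needed: Set[str], facilities: Dict[str, Set[str]]) -> List[str]:
    """Return facility names that together offer every needed service.

    Prefers a single facility that offers everything. Otherwise falls back to
    a greedy set-cover heuristic: repeatedly pick the facility covering the
    most still-unmet needs. Greedy is simple but not guaranteed to find the
    fewest facilities. Raises ValueError if some need is offered nowhere.
    """
    if not needed:
        return []
    for name, offered in facilities.items():
        if needed <= offered:  # subset test: this facility covers everything
            return [name]
    remaining = set(needed)
    chosen: List[str] = []
    while remaining:
        best = max(facilities, key=lambda fac: len(facilities[fac] & remaining))
        gained = facilities[best] & remaining
        if not gained:
            raise ValueError(f"No facility offers: {sorted(remaining)}")
        chosen.append(best)
        remaining -= gained
    return chosen


if __name__ == "__main__":
    print(select_facilities({"beds", "meals"}, FACILITIES))       # ['Shelter A']
    print(select_facilities({"beds", "dental aid"}, FACILITIES))  # ['Shelter A', 'Day Center B']
```

In a real deployment the service tags would come from the cleaned facility data, and capacity or distance could break ties between facilities that cover the same number of needs.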

How can the Department of Health (DOH) reduce the number of restaurant closures and increase safety of patrons?

  • The Problem: Food safety inspections help ensure food is handled properly from preparation through serving. The Department of Health (DOH) wanted to better understand and identify common factors, such as establishment type or location, among food outlets with poor inspection ratings. These factors could help guide DOH’s future training for business owners and their staff and also help DOH allocate staff resources more efficiently.
  • What Resulted: The team developed two prototype dashboards to demonstrate the various ways DOH could visualize its data. Both prototypes focused on providing DOH managers with risk assessments and inspection trends, which would help them allocate resources to the most troublesome establishments.
  • What’s Next: The team presented the functionalities of both dashboards to members of the DOH leadership. The agency was appreciative of all the hard work and is eager to explore and learn more about additional ways their data can help inform their practices.

Although there are often barriers to government adoption of new tools, we at DataKind DC believe that persistent volunteers partnered with tenacious leaders can really make positive social change happen!

Will we see you at our next get-together?
Join us on Slack at:

Source: DataKind – DataKind DC Works with Local Government to Improve City Services

Building a Brighter Future for America’s Foster Youth Through Data Science

Child welfare continues to be a growing concern in the United States, with the number of children in foster care having increased steadily since 2012. According to the Administration for Children and Families (ACF), approximately 437,500 children were reported to be in the foster care system in 2016. By comparison, in 2012 the number of youth in foster care was estimated at 397,000.

Adding to the problem is America’s escalating opioid crisis. More and more children are entering foster care as a result of parental drug abuse, placing a strain on an already overtaxed child welfare system. Data from the federal Adoption and Foster Care Analysis and Reporting System (AFCARS) shows drug abuse by a parent as one of the leading causes for a child to be removed from their home, with more than 92,000 children entering foster care between 2015 and 2016 for this reason alone. This influx of children into the system exacerbates existing challenges in child welfare and deepens the need to address foster care placement, caseworker capacity, and screening and risk assessment for neglect and abuse, in order to improve the safety and security of foster youth and support their transition into adulthood.

There is great potential for data and technology to be used to help tackle the complex problems faced by the sector, ensure the well-being of foster youth, and even reduce the number of children who end up in foster care in the first place. In recent years, many child welfare agencies have begun to explore the possibilities of using data science and predictive technology to identify ways to work more efficiently, improve safety and risk assessment processes, and better serve youth.

This month, in partnership with the Microsoft Cities Team – Civic Engagement, we will be launching two new projects with organizations serving foster youth – Think of Us (TOU) and Community Based Care of Central Florida (CBCCF). With available open data on child welfare along with data collected by these organizations, teams of DataKind’s expert pro bono data scientists will use machine learning to help both organizations improve the safety and well-being of foster youth and develop solutions that can be shared and benefit the sector as a whole.

In addition to the problems youth face while in foster care, there is evidence that these children are at greater risk as adults of substance use and abuse, homelessness, unemployment and incarceration. TOU strives to help foster youth transition successfully into healthy, stable, and thriving adults by providing the resources and support they need to do so. We’ll work closely with TOU to design classification models that lay the foundation for risk profiles, which will allow TOU to identify at-risk foster youth and better serve them with more targeted services and support as they transition into adulthood.

Another factor affecting the safety and well-being of foster youth, and their successful transition into adulthood, is the time a child spends with his or her assigned caseworker. Anecdotal evidence suggests that caseworker turnover negatively affects child outcomes. As such, caseworker retention is a top priority for organizations like CBCCF, a national leader in progressive child welfare systems that provides fostering and adoption services for youth. Our team will help CBCCF gain a greater understanding of the complexity of workers’ caseloads and address operational and logistical challenges by optimizing caseworker activities in order to improve outcomes for children in care.

As we learned at an initial DataDive with the Annie E. Casey Foundation in December 2016, which focused on related topics around improving and supporting the lives of youth in America, sharing data among youth service programs and agencies is crucial because it can provide a more comprehensive picture of youth, including youth involved in multiple systems. Such data collaboration could better inform future research, help unearth additional insights and provide the information needed to develop tools that could improve the lives of millions of foster children at risk of poor educational, economic, social and health outcomes.

This new collaboration offers an opportunity for DataKind and Microsoft to build further on these initial learnings and apply data science and emerging capabilities in machine learning to gain additional insights and help optimize processes. Bringing together multiple project partners to share their data, expertise and resources will show how working side by side to co-design data-driven solutions can help tackle critical social issues like child welfare.

We look forward to diving into these projects with our partners and helping work toward developing a brighter future for foster children across the country.

Source: DataKind – Building a Brighter Future for America’s Foster Youth Through Data Science

Doing Data for Good Right

As an organization focused on using data science to address humanitarian issues, DataKind knows that ethics are of the utmost importance. In this blog, DataKind UK outlines the ethical principles their community has created. Read on and see what you think. Is this a code you’d adopt? What would you add? Join the conversation by leaving comments below and stay tuned for more on this topic.

Ethical principles for pro bono data scientists

Guest blog: Christine Henry and the DataKind UK team

“Ethics is knowing the difference between what you have a right to do and what is right to do.” – Potter Stewart

As data scientists volunteering to help nonprofits, we hope that our work will have a positive impact on those around us. However, in the new frontiers of data science and artificial intelligence, it is sometimes difficult to know what right and wrong look like or what the impact of our work will be. We can all agree that we don’t want to discriminate against people, but we also recognize that, in data science, labelling and categorising types of people and types of behaviour is at the heart of what we do. At DataKind, ethics around data and technology become even more critical when you consider that the projects we work on are often about, and for, the most vulnerable populations in our society.

DataKind’s projects often lead to a nonprofit partner reallocating scarce resources (money, food, or even advocacy and attention), and this may mean that some groups will go unsupported. The “do no harm” principle is not simple to apply in our work. Our job is often about minimising harm and maximising positive impacts, rather than avoiding harm altogether. And doing nothing is not necessarily better: a charity’s mission can be furthered by data analysis even if the analytical project is imperfect.

At DataKind UK, we’ve been thinking about some of these tough ethical questions. How do we ensure that the predictive models we build don’t have unintended consequences – and can we ever be sure of that? How can we assess the benefits of implementing an algorithm versus the possible risks? How do we ensure that we don’t allow these ethical challenges to prevent us from taking action when the status quo is worse?

We believe the best way to act ethically as an organisation is to directly confront these hard ethical questions and to support open, frank discussions. With input from our pro bono data scientists, we put together a set of principles to guide these discussions. The principles will help us think about risks within our community and share these concerns with our nonprofit partners. In creating the principles, we focused on understanding potential harms, looking carefully at data context and biases, and being transparent about analysis limits and the reasons for analysis choices.

See the principles we outlined below and learn more about how we crowdsourced these from our community.

How did we do it?

On an evening this past October, we brought together 20 members of our volunteer community, plus a couple of DataKind friends and ethics experts. The brilliant Alix Dunn, Founder and Executive Director of the Engine Room, and conveniently the partner of DataKind UK’s Executive Director, adeptly facilitated the event. Alix has also shared her reflections on the discussion; be sure to check out the Responsible Data Forum as well.

Rather than start from a blank page, we decided to “seed” the workshop with samples of other related documents that participants could take ideas from or react to. We selected half a dozen sets of principles from different fields and professions (e.g. government, corporations and academia).

Working in small groups, people pulled useful principles out of the sample documents, or hacked their own variants. We also supplied short (anonymised) case studies from past DataKind projects that included possible ethical issues, to help groups think about the real world application of the principles they were discussing. Lastly, we pulled together everyone’s principles into one shared document which, after some heavy editing, turned into the five principles outlined below.

What’s next?

We will begin rolling out the principles to volunteers starting new projects, and will track the ethical issues raised and how they are resolved. Our volunteer-run Programmes Committee will also look to start building any required processes – for example, a tracking document for issues, identifying someone for volunteers to contact with ethical issues on a project or general level, and updating our existing scoping process to identify ethical issues at an early stage.

This is intended to be a living document for the DataKind UK community and anyone interested in ethical data science. The principles will be updated and adapted as necessary in response to future changes in data science practice; development of ethical standards in the broader data community; and the needs of charities, stakeholders or our community. We hope that the principles we outlined can be an example or starting point for other organisations and data science practitioners as well.

The Principles

As a DataKind UK volunteer, I will strive to adhere to the following principles:

  • I will actively seek to consider harms and benefits of my work with DataKind UK.
    1. I know that data often represents people and misusing it can do harm. In light of DataKind UK’s mission to do data science for social good, I will consider the impact of my work on vulnerable people and groups in particular.
    2. I understand that data can be a tool for inclusion and exclusion, and that these effects may be non-obvious and indirect. The output from data analysis can be used in decisions that have disparate impacts on people, including the allocation of scarce resources (e.g. money, food, or even advocacy and attention). I will openly discuss with my team and the charity partner the different potential impacts of this project, including any indirect consequences.
    3. In thinking about the impact of my work, I will weigh up the costs of the status quo. What is the cost of doing nothing?
    4. I will advocate for fair and accurate representation of my work in public and within charities and their partners. 
  • I will actively seek to understand the context of the data and tools I use.
    1. I will look for and interrogate biases in data and collection methods.
    2. I will consider built-in assumptions, defaults and affordances of tools, and consider how these may impact my work.
    3. I will think about the history of the data and tools we’re using, and I understand that all datasets and tools carry a history of human decision-making. This history also includes choices about the data and people not included.
    4. I understand that privacy is not binary, and that context matters for consent and for the expectations of people whose data are available to me.
    5. I understand the limits of the stories I can tell with the available data.
  • I will enable others to understand the data and analysis choices I have made, now and in the future.
    1. I will be open and transparent about my choice of data and sources.
    2. I will be open and transparent about analysis choices and tools, and the choices made in assessing model and result quality.
    3. I will work so that users – including people without data expertise – can use the analyses and tools I work on effectively and appropriately.
    4. I will think about the configurability, sustainability, transparency, auditability, and understandability of my work. I will make stakeholders aware of limits to these.
    5. I will be aware of the time window in which my analysis may be valid, and will share this with stakeholders.
  • I will actively seek to understand my own limits and the limits of the organisations involved.
    1. I will be aware of my own limits and realistic about what I can offer, and what DataKind UK can offer within its different programme formats.
    2. I will be aware of the limits of new technology, and I will respect human expertise and incorporate technology into existing human decision-making.
    3. I will be alert to possible legal issues and seek out advice and expertise where necessary.
  • I will debate and discuss ethical choices.
    1. I will debate ethics openly and acknowledge that the choices we make are uncertain.
    2. I will raise any ethical concerns within DataKind UK, and listen to those of other volunteers. I will acknowledge that other people may make other ethical decisions based on the same information.
    3. Where appropriate, I will seek to carry these principles outside of DataKind UK.

Source: DataKind – Doing Data for Good Right

2017 Global Chapter Summit Recap

Last month we hosted our 4th Annual Global Chapter Summit, where we were joined by Chapter Leaders from across our global network. A natural part of networks is dynamic change, especially when you have a network full of such active and amazing volunteers. Year over year we get to greet both familiar and new faces, and this year was no exception. We are thrilled to welcome nine new leaders to DataKind’s Chapter Network:

DataKind Bangalore: Deepthi Chand Alagandula, Jayant Pahuja

DataKind DC: Aimee Barciauskas, William Ratcliff

DataKind Singapore: Arrchana Muruganantham, Neil Shah

DataKind UK: Joe Harris, Michelle Lee, Mike Taylor

We were excited by the almost uncontainable energy at the summit, and the new ideas brought forth by both our newly joined Chapter Leaders and seasoned DataKinders alike. Together we shared (and learned from!) some failures, brought forward insights about volunteer engagement, took a deeper dive into project selection to enhance processes, discussed leadership structure within each chapter to identify best practices that can be applied across the globe, and recommitted to DataKind’s mission to use data science and AI, ethically and capably, to create a sustainable planet in which all have access to their basic human needs. It is this melding of knowledge, experience and enthusiasm that really allows our network to continue to thrive and grow.

The work we do at DataKind isn’t driven by just one person; it is accomplished through exchange among people, a practiced process among individuals who are not afraid to question why something must be done one way and not another. It’s this rigor, alongside our value of humility, that helps the work we do flourish across the boundaries of issue areas and expertise.

Through many spirited debates and stimulating discussions, fun banter and bonding, the summit solidified the relationships that will help propel us into greater alignment and impact for 2018 and beyond.

Thank you to all our Chapter Leaders for joining us from around the world and taking the time to participate, share learnings from the past year, and help build strategies to move our work forward. A special thanks as well to our friends at Teradata for their generous sponsorship of the Chapter Summit and commitment to supporting this global movement.

We look forward to another exciting year for DataKind and continuing to come together, as we did at the summit, in service of who we can be when we work together – and ultimately the collaborative impact we can achieve for the greater good.


Source: DataKind – 2017 Global Chapter Summit Recap

DataKind Advances to the Next Round of $5 Million AI XPRIZE Competition

DataKind has been named one of the top ten teams in the IBM Watson AI XPRIZE challenge, a four-year, $5 million, global competition to use AI technologies to tackle the world’s greatest challenges.

Our challenge for this competition focuses on poverty alleviation and builds on a project started at a DataDive with the Bill & Melinda Gates Foundation that used satellite imagery to combat crop disease in Ethiopia. The project has multiple phases, each focused on a different issue area, though all are ultimately linked to poverty alleviation. The first phase focuses on fighting crop disease, which can decimate local economies in places like Ethiopia, where the majority of the population relies on agriculture for its livelihood. If these communities knew ahead of time whether and when their crops were at risk of disease, they could intervene more quickly to reduce crop loss and prevent the disease from spreading. Our hope is to create a model that would provide real-time information on crop disease to support enhanced early warning systems. We also intend for the model to be easily scaled and transferred to other regions.

In January, we applied to be part of the competition, along with 10,000 others, and in March learned we would be one of 147 teams selected to participate. Today it was announced that 59 teams will be moving forward to the next round of the competition, and we’re honored to be among the top ten.

We look forward to our continued work with the AI XPRIZE community, made up of experts in AI, machine learning and data science, as well as many of our esteemed peers and colleagues looking to use data and technology to help improve the world.

Stay tuned for more updates and learn more about AI XPRIZE.



Source: DataKind – DataKind Advances to the Next Round of $5 Million AI XPRIZE Competition

Democratic Freedom DataDive Capacity

This weekend, we’ll be hosting a Democratic Freedom DataDive in New York in partnership with Omidyar Network and the Knight Foundation. Together, we’ll be working with three organizations focused on promoting government transparency, affordable housing and combating extreme hate groups. Because we may have a full house this weekend, please continue to check this blog for the latest updates on event capacity if you are local and planning to attend!

We’ll update the text below and the image above to let you know if we’re full or if we still have room for more DataDivers to attend. 





What’s this DataDive all about?

Advances in data science, machine learning and predictive technologies offer organizations working to protect democratic freedoms unprecedented opportunities to leverage data to achieve their mission, scale their work and help build a more just and equitable world. In response to recent discourse and policy in the U.S. targeting vulnerable groups and threatening our democratic institutions, we’re hosting a DataDive in partnership with Omidyar Network and the Knight Foundation, with support from Bloomberg, DataRobot, American Airlines and Neonto. This DataDive aims to support the work of the organizations below, which are working to promote democratic freedoms in the U.S. Here’s a sneak peek at the projects we’ll be working on:

  • Center for Responsive Politics, Financial Disclosures
    Develop a process to identify nuanced patterns in politicians’ personal financial disclosures, making this information more valuable to political journalists and public watchdogs who track these activities.
  • Center for Responsive Politics, Political Ads
    Track when special interests purchase political ads in order to create an open, public tracking tool for newsrooms around the country.
  • Los Angeles Mayor’s Office
    Identify illegal housing conversions in service of protecting vulnerable tenants. 
  • Southern Poverty Law Center
    Develop a methodology to measure volume and interest in hate content online and monitor how effectively hate sites are exploiting Google Search.

Learn more about DataKind’s focus on using data science to promote democratic freedoms.

Source: DataKind – Democratic Freedom DataDive Capacity

Data Science Protecting Democratic Freedoms in the U.S.

Advances in data science, machine learning and predictive technologies offer organizations working to protect democratic freedoms unprecedented opportunities to leverage data to achieve their mission, scale their work and help build a more just and equitable world. In response to recent discourse and policy in the U.S. targeting vulnerable groups and threatening our democratic institutions, we’re launching five new DataCorps projects and will be hosting a DataDive focused on these important issues.  

With the support of the Omidyar Network, Knight Foundation, the James Irvine Foundation, and other funders, DataKind will be partnering with inspiring groups across the country to help combat online harassment, improve resources for undocumented immigrants, protect environmental government data, ensure fair access to voting in California, detect deceptive speech from politicians and more. Read below to learn more about these long-term DataCorps projects as well as our upcoming November DataDive in New York, where we’ll be working with three organizations focused on promoting government transparency, affordable housing and combating extreme hate groups.  

Get involved!
Join us in supporting democratic freedoms and civil liberties in the U.S. East Coast volunteers, apply to help combat deceptive political speech with the Knight Foundation or register for our New York DataDive this November.

DataKind is currently looking for partners and donors to support the projects below and our democratic freedom work overall. Please contact to learn more. 

Data Infrastructure and AI To Combat Online Harassment

Online SOS
Volunteer team: Eric Chen, Ankit Gupta, Anil Muppalla, Daniel Pedraza, William Sheffel

Online harassment can take many forms – threats of violence, extortion, sexual harassment, impersonation, or the release of personal details or intimate images – and often occurs without consequence. Despite the internet’s ubiquity, not all forms of online harassment are criminal offenses, and it can be difficult to navigate the dizzying array of statutes across jurisdictions. Social networks like Facebook and Twitter place the onus of reporting and proving abuse on the abused, asking victims to list past incidents that might be painful and triggering.

Online SOS is a team of mental health, business, and tech professionals concerned about the state of online spaces, particularly online harassment. They exist to provide help to those in crisis or recovery from online harassment as quickly as possible, with the least amount of emotional distress, through professional support. Online SOS is now working with a team of DataKind volunteers to scale their services by integrating AI and data science best practices into their existing chatbot. The chatbot, which serves as the front line of the intake process, would use AI to interpret incoming user requests and offer customized support. In addition, Online SOS is looking for a sustainable framework to store data on client progress, interventions, and other programmatic information that would help them refine their offerings and understand their effectiveness.

Improving Access to Resources for Undocumented Immigrants

Immigration Advocates Network
Volunteer team: Audrey Ariss, Andrea Bonilla, Hans Fricke, Dora Heng, Rajesh K Metha, Clarissa Salazar

Of the 11 million undocumented immigrants in the United States, nearly 15% (more than 1.5 million) are likely eligible for existing immigration benefits; however, many are simply not aware that legal relief is available. 

Immigration Advocates Network (IAN) is a collaborative effort of leading immigrant rights organizations designed to increase access to justice for low-income immigrants and strengthen the capacity of organizations serving them. IAN promotes more effective and efficient communication, collaboration, and services among immigration advocates and organizations by providing free, easily accessible, and comprehensive online resources and tools. A core program of IAN is immi, an online plain-language screening tool and information hub in English and Spanish. Immi helps immigrants understand if they qualify for existing immigration options, such as asylum, family-based immigration, relief for victims of crimes, and more. It also helps immigrants understand their rights, become better consumers of legal services, avoid fraud, and find free or low-cost legal assistance from trusted nonprofits.

IAN is now working with its volunteer team to use anonymous data from the screening tool, website, and other sources, alongside demographic and census data, to better understand the impact of immi and conduct more effective outreach to immigrant communities. The team will do this by identifying opportunities to optimize user paths and assessing where immigrants face gaps in access to resources.

Filtering and Classification to Automate the Protection of Environmental Government Data

Environmental Data Governance Initiative
Volunteer team: Karthik Balasubramanian, Ryan Connor, Ryan Coughlin, Anuja Kelkar, Anna FitzMaurice, Taimur Sajid, Abhinandan Seshadri

United States federal environmental and energy policies, information, and data are facing threats of revision, reduction, and removal. Eliminating or weakening the infrastructure that houses the data and that publicizes the policies has a profound impact on the public’s “right to know,” the United States’s scientific leadership, and many measures of environmental protection.

The Environmental Data Governance Initiative (EDGI) is a network of academics and nonprofits building resources and capacity to proactively archive this vital data and ensure sustained public access to federal energy and environmental information. EDGI has built a tremendous network of volunteers who analyze millions of web pages for daily changes and contribute to longitudinal data tracking and analysis. While EDGI’s current processes have yielded impressive results, they are difficult to scale. EDGI is now working with its volunteer team to build a filtering system using machine learning that can automate change tagging and classification, greatly increasing EDGI’s efficiency. As a second deliverable, the team will create a dashboard or simple reporting tool to help the organization quickly identify and communicate key statistics to stakeholders.


Help Ensure Fair Access To Voting In California

California Civic Engagement Project
Volunteer team: Frances Lu, Tom Marthaler, Quynh Nguyen, John Zenk, Carol Zhang 

In September 2016, the California State Senate passed SB450, the California Voter’s Choice Act. This legislation allows select counties in 2018 to choose to adopt a new voting system. This new model enables counties to mail every registered voter a vote-by-mail ballot which the voter can mail in, drop off at a secure ballot box, or drop off at a newly established Vote Center. At Vote Centers, voters can cast their ballots in person, drop off their vote-by-mail ballots, access same-day voter registration, receive replacement ballots, and access additional services.

The goal of this project is to make it easier for election officials to select equitable locations for vote centers and vote-by-mail drop boxes, and to reduce the likelihood that historically disenfranchised groups are disproportionately affected by where vote centers are placed.

The California Civic Engagement Project (CCEP), housed at the UC Davis Center for Regional Change, is collaborating with DataKind to ensure that historically disenfranchised groups are not disproportionately harmed by this new voting policy. Together with DataKind’s volunteers, CCEP is creating a user-friendly tool to help government officials identify where vote-by-mail drop boxes and vote centers would be most successful (i.e., attract the most voters while minimizing any potential disenfranchisement), so that everyone has fair access to voting.
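Choosing sites that “attract the most voters” is, in its simplest form, a coverage problem. The sketch below is not CCEP’s or DataKind’s method; it substitutes a standard greedy maximum-coverage heuristic, and every name in it (`greedy_sites`, the coordinates, the voter counts, the service radius) is hypothetical. Weighting historically disenfranchised neighborhoods more heavily in `voters` is one assumed way to fold equity into the same loop.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical (x, y) positions in kilometres.
Point = Tuple[float, float]


def covered(site: Point, areas: Dict[str, Point], radius_km: float) -> Set[str]:
    """Return the neighborhoods within radius_km of a candidate site."""
    return {name for name, loc in areas.items() if math.dist(site, loc) <= radius_km}


def greedy_sites(
    sites: Dict[str, Point],
    areas: Dict[str, Point],
    voters: Dict[str, int],
    k: int,
    radius_km: float,
) -> List[str]:
    """Greedy maximum coverage: pick up to k sites, each time adding the
    site that reaches the most voters not already covered."""
    chosen: List[str] = []
    already: Set[str] = set()
    for _ in range(k):
        best: Optional[str] = None
        best_gain = 0
        for name, loc in sites.items():
            if name in chosen:
                continue
            newly = covered(loc, areas, radius_km) - already
            gain = sum(voters[a] for a in newly)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining site adds any coverage
            break
        chosen.append(best)
        already |= covered(sites[best], areas, radius_km)
    return chosen


# Hypothetical neighborhoods, voter counts, and candidate sites.
areas = {"A": (0.0, 0.0), "B": (5.0, 0.0), "C": (10.0, 0.0)}
voters = {"A": 100, "B": 50, "C": 80}
sites = {"S1": (0.0, 0.0), "S2": (10.0, 0.0), "S3": (5.0, 0.0)}
print(greedy_sites(sites, areas, voters, k=2, radius_km=1.0))  # ['S1', 'S2']
```

Greedy selection is a common baseline for this kind of siting problem because it is simple and its coverage is guaranteed to be within a factor of 1 - 1/e of the best possible; a real tool would also need travel times, transit access, and legal siting rules.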


Use Deep Learning to Identify Deceptive or Uninformed Political Statements

Duke Reporters’ Lab
Currently recruiting for East Coast volunteers!

News organizations like PolitiFact and The Washington Post routinely fact-check political speeches and advertisements, but the time-consuming reporting process means journalists and voters cannot immediately tell when a politician is uninformed or deceiving constituents.

While fully automated fact-checking is still a work in progress, multiple subject matter experts we’ve consulted have said that automatically identifying patterns of deceptive speech is a step in that direction. That’s why DataKind is partnering with the Duke Reporters’ Lab and the Knight Foundation to use predictive technology to accelerate the fact-checking process and combat deceptive speech from politicians. We’re now recruiting a DataCorps team to build a deep learning model that can score a speech or advertisement for deception, misinformation, and uninformed statements. These scores would help journalists and organizations prioritize what to fact-check, so they can inform the public more quickly about a candidate’s deceptive or uninformed statements and hold candidates accountable for what they say.

Interested in joining the team? Learn more and apply.


New York, join us for a weekend DataDive!

DataKind will be hosting its second DataDive in New York this year, on November 10-12. Held in partnership with Omidyar Network and the Knight Foundation, with support from Bloomberg, DataRobot, and American Airlines, the DataDive will focus on promoting democratic freedoms in the U.S. Work alongside fellow New York City data do-gooders and experts from the organizations below:

  • Center for Responsive Politics, Financial Disclosures
    Develop a process to identify nuanced patterns in politicians’ personal financial disclosures, making this information more valuable to political journalists and public watchdogs who track these activities.

  • Center for Responsive Politics, Political Ads
    Track when special interests purchase political ads in order to create an open, public tracking tool for newsrooms around the country.

  • Los Angeles Mayor’s Office
    Identify illegal housing conversions in service of protecting vulnerable tenants.

  • Southern Poverty Law Center
    Develop a methodology to measure volume and interest in hate content online and monitor how effectively hate sites are exploiting Google Search.

All skill levels are welcome: just bring your laptop, maybe a friend, and a desire to make an impact. RSVP today!

Source: DataKind – Data Science Protecting Democratic Freedoms in the U.S.

Mapping Lawful Permanent Residents with the Immigrant Legal Resource Center

Guest blog by Ozzie Liu, DataKind DataCorps Volunteer & Senior Analyst at Casper 

I recently volunteered with DataKind and had the chance to lead a team of fellow volunteer data scientists on a six-month pro bono project for the Immigrant Legal Resource Center’s (ILRC) New Americans Campaign. We analyzed and visualized the demographics and geography of lawful permanent residents eligible for naturalization, allowing Campaign partners to be more effective in their outreach and naturalization work.

I’m always interested in using my data science skills to make a difference in the community around me. So when I heard about DataKind’s mission to harness the power of data through impactful projects that address social and humanitarian problems, I knew I wanted to be part of it! In the fall of 2016, I had the opportunity to work on a project with the ILRC to help immigrant communities pursuing U.S. citizenship receive legal assistance with the process. As a second-generation immigrant, I hold this topic near and dear to my heart. I have family members who have been longtime green card holders but never took the step of applying for citizenship, which means they are missing out on civic engagement and additional benefits. Moreover, the new administration’s stance against immigrant groups has made this work even more urgent.

The Partners: ILRC and New Americans Campaign

The Immigrant Legal Resource Center (ILRC) works with immigrants, community organizations, legal professionals, law enforcement, and policy makers to build a democratic society that values diversity and the rights of all people. The ILRC’s mission is to protect and defend the fundamental rights of immigrant families and communities. Led by the ILRC, the New Americans Campaign is a diverse nonpartisan national network of respected immigration organizations, legal services providers, faith-based organizations, immigrant rights groups, foundations and community leaders. The Campaign transforms the way aspiring citizens navigate the path to becoming new Americans. It is committed to connecting lawful permanent residents (LPRs) to trusted legal assistance and critical information that simplifies the naturalization process.

The Goal

The ILRC wanted to visualize the geographic and demographic makeup of LPRs who are eligible for naturalization so that New Americans Campaign partners could identify those who might need the Campaign’s assistance.

The Data

There is a good amount of data around immigrants and LPRs in the U.S. including sources such as:

There are also several research organizations who work extensively with this immigrant population, and their work was tremendously helpful to our project. These partners include:

Although much of the data around immigrants and LPRs is public and available to the ILRC and its partners, it is not easy to navigate the many disparate websites or download large spreadsheets of numbers. We wanted to help make this information more accessible to the organizations that are serving this immigrant community.

Our team leveraged the excellent research already done by CMS and CSII, which estimates the demographics and characteristics of immigrant populations at a detailed geographic level using Public Use Microdata Areas (PUMAs). We then compared these estimates with the New Americans Campaign’s internal data to gauge the effectiveness of past outreach and to find new opportunities for local partners.

Visualizing LPRs

After cleaning and wrangling the data, we started the visualization process. We wanted to develop an easy-to-use map that would show the relevant characteristics of lawful permanent residents in each PUMA. Once we combined our data with the PUMA shapefiles and converted the result into GeoJSON, we were able to create a visualization using D3.js. The resulting map, shown above at the start of this post, depicts the number of lawful permanent residents (LPRs) by PUMA; brighter colors represent more LPRs.
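The join step described above can be sketched in Python, assuming the PUMA boundaries have already been exported to GeoJSON and the LPR estimates are a dictionary keyed by PUMA code. The property name `PUMA`, the `lpr_count` field, and the color breaks are hypothetical placeholders, not the project’s actual schema; a D3.js front end would then read `lpr_count` to choose each area’s fill color.

```python
from typing import Dict, Sequence


def attach_counts(geojson: dict, counts: Dict[str, int], key: str = "PUMA") -> dict:
    """Copy each PUMA's LPR count into its feature's properties.

    Mutates and returns `geojson`. `key` is a hypothetical property name
    for the PUMA identifier; real shapefile exports may name it differently.
    """
    for feature in geojson["features"]:
        puma_id = feature["properties"][key]
        # Areas missing from the estimates get 0 instead of raising KeyError.
        feature["properties"]["lpr_count"] = counts.get(puma_id, 0)
    return geojson


def color_bucket(count: int, breaks: Sequence[int] = (1000, 5000, 10000)) -> int:
    """Map a count to a color class 0..len(breaks); higher means brighter."""
    return sum(1 for b in breaks if count >= b)


# Tiny hypothetical FeatureCollection; geometry omitted for brevity.
features = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"PUMA": "00101"}, "geometry": None},
        {"type": "Feature", "properties": {"PUMA": "00102"}, "geometry": None},
    ],
}
enriched = attach_counts(features, {"00101": 7200})
for f in enriched["features"]:
    props = f["properties"]
    print(props["PUMA"], props["lpr_count"], color_bucket(props["lpr_count"]))
```

The enriched object can be written out with the standard library’s `json.dump` and loaded by the map front end.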

Next, we attempted to add functionality and usability to the map using an off-the-shelf platform such as CARTO:

Interactive map of Spanish speakers and education levels using CARTO

We extensively tested tools such as CARTO, Mapbox, and Tableau, which generated great-looking visualizations, but we were concerned about the maintenance cost to the partner and about the limitations of free tiers, which would have required making the Campaign’s internal data public. With help from DataKind’s in-house team of data scientists, we developed a fully functional web app that uses Leaflet.js to serve an interactive map that looks great and is flexible.

Interactive map and tool showing LPRs in the NYC metro area with a bachelor’s degree or higher

Above is the final version of the app that we provided to the ILRC and New Americans Campaign partners. On the left is an interactive map showing the raw number of LPRs with a selected characteristic in each PUMA. On the right is a detailed view of each characteristic for the selected areas.

Wrapping up

After the project was completed, I had the chance to join Campaign partners in Chicago for the national New Americans Campaign Conference, where I led three workshops to unveil and demo the tool we had developed to all the attendees. I encouraged everyone to get out their laptops and smartphones, and actually play with the map as I presented the features. We then discussed the ways this tool could potentially help Campaign partners be more effective.

Me leading a live demo of the tool at the 2017 New Americans Campaign Conference in Chicago

Potential Impact

The feedback from the partners after the workshops was overwhelmingly positive (I even got some hugs). For some, this was the first time that the data was presented in a way that they could both understand and immediately use.

Together we brainstormed some of the ways that this tool could help the partners:

  • Areas where certain demographics and characteristics were once unclear are now apparent, so more targeted and effective outreach can be planned. One partner in Florida had planned to make a 6-hour round-trip drive, twice a week, to reach a specific immigrant group. After using the map at the workshop, he discovered that a population with the same demographics existed just 30 minutes from his office.
  • Partners could use this tool to better prepare for offsite events or office visits, target outreach and plan necessary resources. For example:
    • Partners would know in advance if they are serving an area where many residents are not fluent in English, so they could have translators on site.
    • Partners could identify LPRs who are younger and more computer savvy, and introduce them to Citizenshipworks, a “TurboTax”-like program that can expedite filling out the naturalization application.
    • Partners could find LPRs that have lower income, as they may qualify for a fee-waiver.
    • Partners could plan more strategically to expand into new areas and seek funding.

It’s been a wonderful experience for our DataCorps team to work with the ILRC and I am very excited to see how the New Americans Campaign partners will use our tool to advance their work!

Source: DataKind – Mapping Lawful Permanent Residents with the Immigrant Legal Resource Center

An Open Source Tool for Disaster Relief


During their March DataDive, DataKind DC and Catholic Charities USA (CCUSA) partnered to create a map supporting the organization’s efforts to assist individuals and families before, during, and after a disaster strikes.

CCUSA provides the holistic and compassionate care needed to help individuals impacted by disaster recover and move forward. The organization wanted a map to better target mitigation, preparedness, relief, and recovery projects to communities that are at greatest risk for disasters, most overlooked, and/or ineligible for FEMA assistance.

DataKind DC used the Centers for Disease Control and Prevention’s (CDC) Social Vulnerability Index and a proprietary natural disaster dataset (generously donated by ATTOM Data Solutions) to develop this map using Mapbox, R, and D3.js. Version 1.0 was released this past August, just in time to help understand vulnerable populations in the wake of Hurricane Harvey.

Understanding Vulnerable Populations and Hurricane Harvey

The map showed that Houston, along with many of the counties in Hurricane Harvey’s path, is socially vulnerable, in particular the eight coastal counties where Texas issued mandatory evacuations. Every one of these counties was at or above the 80th percentile for both Crowding and Speak English Less than Well, meaning that they are more vulnerable than at least 80 percent of other counties across the United States in each of those two categories.
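To make the percentile language concrete: a county’s percentile rank for a variable reflects how many other counties score lower on it. The sketch below assumes the (rank - 1)/(N - 1) convention, which we understand the CDC’s SVI documentation to describe; the function name and the sample crowding values are illustrative only.

```python
from typing import Sequence


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Percentile rank in [0, 1] computed as (rank - 1) / (N - 1).

    rank - 1 is the number of values strictly lower than `value`,
    so tied values share the same (lowest) rank.
    """
    n = len(population)
    if n < 2:
        raise ValueError("need at least two values to rank")
    lower = sum(1 for v in population if v < value)
    return lower / (n - 1)


# Hypothetical crowding rates (%) for 11 counties.
crowding = [0.5, 1.0, 1.2, 2.0, 2.5, 3.1, 3.3, 4.0, 6.2, 7.5, 9.8]
rank = percentile_rank(6.2, crowding)
print(f"{rank:.2f}", rank >= 0.80)  # 0.80 True
```

Here 8 of the 10 other counties have lower crowding, so the county sits exactly at the 80th percentile.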

Most of these counties are also at or above the 80th percentile for Minorities, Single Parent Households, and Aged 17 and Under. In addition, Aransas County is in the 96th percentile for populations Aged 65 and Over and the 93rd percentile for Disability. Understanding these percentiles helps local relief agencies provide the assistance their particular community needs. For example, if a community is in a high percentile for Speak English Less than Well, it is critical to identify the primary languages spoken and find volunteers who are fluent in those languages for any community outreach.

Houston shows similar vulnerabilities, and many of the census tracts within Harris County are particularly vulnerable with respect to Socioeconomic Status and Minority Status/Language. Thirty-four census tracts are in the 90th percentile or higher for at least 9 of the 15 vulnerability categories. The most vulnerable census tracts are in extremely high percentiles for the following categories: Aged 17 and Under, Below Poverty*, No High School Diploma, Minority, and Crowding.

While the coastal counties seem to be especially vulnerable when it comes to transportation and communication concerns, Houston has many people that will be susceptible to post-hurricane issues, such as receiving aid for rebuilding their homes and returning to their lives.

*Below Poverty is the percentile ranking of the share of persons below the poverty line. A higher percentile indicates a greater proportion of people below the poverty line relative to other areas.

Our Partners

Working with CCUSA has been a wonderful and inspirational experience. The map has already been used to help local relief agencies better identify and understand social vulnerability in their communities to inform planning around disaster response. With the unforgiving weather and devastating events the world has experienced this year alone, from the earthquakes in Mexico to the hurricanes in the Atlantic and the wildfires in the West, we hope that this open source tool will help CCUSA and other organizations reach and support even more individuals and communities in need.

In the aftermath of Hurricane Maria, local relief agencies wanted to use this map to help efforts in Puerto Rico as well. Thanks to a quick response from the team and our partners at Mapbox, we were able to update the map in a few hours to include coverage of Puerto Rico. It is projects like this, with the potential to save lives and improve welfare, that really drive and motivate our volunteers to be their best.

If you’d like to support the communities impacted by Hurricanes Irma, Harvey and Maria, donate here. One hundred percent of the funds raised will go directly toward disaster relief efforts.


Source: DataKind – An Open Source Tool for Disaster Relief

DataKind Singapore's DataDive on Philanthropic Giving and Volunteerism

Hot on the heels of our April DataDive, DataKind Singapore hosted its third DataDive in late August for the National Volunteer & Philanthropy Centre (NVPC). We were a cozy group of around 30 volunteers who worked together to help NVPC gather actionable insights for its platform.

About National Volunteer & Philanthropy Centre (NVPC)

NVPC promotes a giving culture in Singapore by inspiring more volunteerism and philanthropy. NVPC hosts a one-stop platform for local nonprofits and those looking to give back. On the platform, organizations create campaigns to raise funds or recruit volunteers, donors can find ways to contribute to causes they care about, and volunteers can find meaningful ways to donate their time and skills.

Choy Yee Mun, Assistant Director of NVPC, says, “the group of DataKind volunteers have been one of the most passionate and professional groups of volunteers I have met. They have provided us with numerous useful and actionable insights and recommendations, which we will either be implementing or continuing in further projects with DataKind.”

Analyzing Browsing History

Previously, NVPC had created donor profiles and volunteer profiles based on user accounts. In the DataDive, we hoped to use Google Analytics data to develop browsing-history personas for website visitors who have not created an account to volunteer or donate.

We looked at the browsing history of visitors with user accounts to see if the causes they browsed matched those they had declared in their user profiles, but we didn’t find much overlap. One idea for post-DataDive analysis would be to infer cause preference from the number of cause pages clicked and the amount of time spent on those pages. With that information, we could derive which segments are interested in which causes.
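The cause-preference idea above could start as a simple engagement score. In this sketch, which is an assumption rather than anything built at the DataDive, each page view adds a fixed credit for the click (`seconds_per_click`, an arbitrary weight) plus the seconds spent on the page, and each visitor’s highest-scoring cause is taken as their inferred preference. The page-view tuple layout and the function name are hypothetical; Google Analytics exports would need to be reshaped into this form.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical page-view record: (visitor_id, cause, seconds_on_page).
PageView = Tuple[str, str, float]


def infer_cause_preferences(
    views: List[PageView], seconds_per_click: float = 30.0
) -> Dict[str, str]:
    """Return each visitor's top cause by a combined engagement score.

    Each view contributes seconds_per_click (an assumed weight for the
    click itself) plus the time spent on that cause's page.
    """
    scores: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
        lambda: defaultdict(float)
    )
    for visitor, cause, seconds in views:
        scores[visitor][cause] += seconds_per_click + seconds
    # Highest-scoring cause per visitor; on a tie, the first cause seen wins.
    return {visitor: max(causes, key=causes.get) for visitor, causes in scores.items()}


views = [
    ("u1", "elderly care", 60.0),
    ("u1", "animals", 10.0),
    ("u1", "animals", 10.0),
    ("u2", "youth", 5.0),
]
print(infer_cause_preferences(views))  # {'u1': 'elderly care', 'u2': 'youth'}
```

The weight matters: raising `seconds_per_click` favors causes a visitor returns to often over causes they read at length, so in practice it would need tuning against the declared preferences of account holders.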

We also found a few thousand cases where users who had created an account and volunteered or donated would revisit the platform within three months, forget their password, and then stop using the platform entirely. One actionable step for NVPC to consider is re-engaging these users.

By analyzing visitors’ browsing paths, we found that most visitors follow the same paths from the homepage to the volunteer page. An onboarding process could guide new users step by step and suggest other possible journeys through the website. We also found that ~75% of users log in again after 16 days, which could inform the timing of electronic direct mail marketing.
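A figure like “~75% of users log in again within a given number of days” can be read off the distribution of return gaps. The sketch below assumes that interpretation of the statistic, and assumes each returning user contributes one gap, in days, between consecutive logins; the function name and the data are hypothetical. It uses the nearest-rank quantile, one of several common quantile definitions.

```python
import math
from typing import List


def days_until_share_returned(gaps_days: List[float], share: float = 0.75) -> float:
    """Smallest gap d such that at least `share` of the gaps are <= d,
    using the nearest-rank quantile definition."""
    if not gaps_days or not 0 < share <= 1:
        raise ValueError("need at least one gap and 0 < share <= 1")
    ordered = sorted(gaps_days)
    rank = math.ceil(share * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical gaps (days) between a visit and the user's next login.
gaps = [3, 40, 16, 9, 1, 12, 30, 5]
print(days_until_share_returned(gaps))  # 16
```

Six of the eight hypothetical users (75%) returned within 16 days, so a reminder email sent around that point would reach the remaining quarter before they are likely to lapse.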

Finally, we made recommendations on which Google Analytics data to extract for future analysis.

Analyzing Campaign Effectiveness

NVPC had also wanted to look into factors associated with campaign successes so we reviewed all the data available on the various platforms. We tried to focus our efforts on three areas – organization reputation, campaign descriptions, and individual users’ browsing history from Google Analytics. 

We found interesting relationships between certain campaign variables and their eventual success. In campaigns where a personal story was shared, we found that stories related to hospices, elderly care and disabilities (topics 6 and 12 in the chart above) tend to receive larger donations.

In addition, campaigns with customized impact messages tend to draw a higher median donation. This may be because custom messages give donors a more specific and transparent view of how exactly their donation will be used.

We also looked at the effect of the campaign duration, but found some inconsistencies in the available data. An idea for future work with NVPC could be to build models that could predict the success of a campaign prior to its launch so campaign creators could build and enhance it to achieve their desired funding goal.


It’s A Wrap! Get Involved with DataKind Singapore

A huge thanks to everyone who participated in the DataDive to help NVPC better engage potential donors and volunteers. If you’re local, we’d love to see you at the next DataDive or Meetup. Sign up to get involved!

Source: DataKind – DataKind Singapore’s DataDive on Philanthropic Giving and Volunteerism