Ingredients of a Thriving Chapter

By the DataKind Bangalore team

Happy New Year from DataKind Bangalore! As we head into 2017 and our third year as a chapter, we’ve been reflecting on the successes of 2016 and how much our community of over 1200 volunteers and project partners has accomplished together. But what makes a successful DataKind Chapter? For us, there are a few key ingredients. Check out highlights below and get excited for the year ahead!

1 – Volunteers Who Embody Our Values

Volunteers are at the center of DataKind’s work. DataKind Bangalore is entirely volunteer-led, supported by a team of committed and talented people who exemplify DataKind’s values. Because they are always going above and beyond, we created the monthly DataKind Bangalore Awards to recognize their contributions. Get inspired by our November and December winners!

Chetana Amancharla
A Senior Technology Architect at Infosys, Chetana works on application development, software process engineering and program management. Chetana has been an incredible addition to the DataCorps team for the Centre for Budget and Governance Accountability. She has been building and refining various data visualizations for the tool, polishing our user interface with her great eye for design and detail. And she does all of this on top of her career and Saturday classes, all while taking care of her 8-year-old son. Her knowledge, expertise and commitment are truly an inspiration for the whole community.

Sahil Maheshwari
An engineer and MBA, Sahil needs only a few minutes of conversation to show anyone that he is an expert data scientist. With his wide-ranging knowledge of statistics and probability, he has been instrumental in the eGovernments DataCorps project. A fast learner, he’s also generous in sharing his knowledge and gave a statistics workshop for the Chapter. His motivation to try new things inspires all of us to do the same. We’re grateful to have someone with such a rich skillset and love of learning with us.

Suchismita Naik
An engineer-turned-designer, Suchismita has been leading design work for the CBGA DataCorps project. She brings great passion and commitment to the work and is always ready to try out those last-minute design suggestions, no matter how cumbersome they seem! Apart from being the creative brain of the project, she brings great enthusiasm and vigor to the team, making her a fun and energizing teammate to work with.

Murugesan Ramakrishnan
A consultant at Fractal Analytics, Murugesan is absolutely fantastic to work with. With an immense will to learn and almost limitless energy, he keeps the eGovernments DataCorps team moving full speed ahead. He’ll be committing code at 2am or on weekends, blowing us away by how much he accomplishes on top of his demanding job.


2 – High Impact Project Partners

Partner organizations are our vehicle for impact, so we depend on their subject matter expertise to inform our volunteers’ work. We’ve had the honor of working with many incredible organizations this past year, and we were especially excited to launch two long-term DataCorps projects in 2016 that will be wrapping up soon:

Centre for Budget and Governance Accountability (CBGA) is a civil society organization that promotes transparent, accountable, and participatory governance, and a people-centered perspective in the preparation and implementation of budgets. CBGA has been building Open Budgets India, a data portal to make India’s budgets open, usable and easy to comprehend. The DataKind Bangalore team is co-creating a Story Generator Tool that helps users browse visualizations across various state-level fiscal indicators and schemes. The project is still in progress and the beta version of the tool is expected to launch in February.
Check out the source code and documentation >

eGovernments Foundation transforms urban governance with the use of scalable and replicable technology solutions. Using four years of data from the Chennai municipal corporation’s public grievance portal, we hope to build a problem forecasting and alerting system to predict trends and generate alerts at ward levels for better urban governance.
Check out the source code and documentation >


3 – A Community of Learning

Any good data scientist or social innovator embraces continuous learning, which is why we were excited to launch DataLearn – a series of talks, workshops and discussions that brought together some of the best names in the data science and social good community.

We covered a variety of topics: from creative hacks of machine learning (which viewed machine learning and artificial intelligence through the lens of creative subversion), to data visualization and storytelling with data, to the open data environment in India and data ethics. We also hosted skill-building workshops, including statistical analysis with R, exploring data with pandas, text mining and Natural Language Processing, and web scraping with R.

And true to our word about sharing learnings, we recorded many of these talks!

Check out our YouTube video series to learn more >

And The Last Ingredient? You!

In 2017, we are looking forward to exciting collaborations with more project partners, more values-driven volunteers and learning even more with our community, but we need you to make it a success! Stay tuned for more DataLearn sessions on Bayesian statistics and inference, time series modeling, developments in Deep Learning and more, as well as DataDives and collaborations with NGOs in interesting domains.

Join our Meetup to get involved! >

Follow us on Facebook and Twitter for updates and announcements!

Source: DataKind – Ingredients of a Thriving Chapter

A Big Welcome to DataKind’s Newest Board Member!

We’re thrilled to announce the addition of Elizabeth Grossman to DataKind’s esteemed Board of Directors, a team of top minds and dedicated champions in the Data for Good movement.

Director of Civic Projects in the Technology and Civic Engagement group at Microsoft Corporation, Elizabeth helps design and execute long-term, strategic partnerships for Microsoft that leverage technology to make a sustainable and scalable impact on local and global civic priorities. She has also worked on policy and societal impacts of emerging technologies and governmental science and research program design with universities and scientific societies as well as at the U.S. House of Representatives Committee on Science and the National Academy of Sciences. 

A longtime friend, collaborator and supporter of DataKind, Elizabeth worked with us on the very first DataKind Labs projects to advance the Vision Zero movement, to reduce traffic-related deaths and severe injuries to zero, in three U.S. cities – New York, Seattle and New Orleans.

Her knowledge and expertise in areas such as civic engagement, partnership design, smarter and more sustainable cities, research and technology policy, data sharing and government ecosystems will be indispensable in helping further DataKind’s work and mission, particularly on larger, civic and sector-wide projects like Vision Zero.

With the guidance of our devoted Board of Directors, now five strong with Elizabeth, and the help of our talented and amazing volunteer community, DataKind is approaching another phase of growth, with more staff, increased chapter engagement, and a thriving volunteer network – all paving the way for more projects and opportunities to harness the power of data science in the service of humanity.

Please take a minute to join us in congratulating Elizabeth and officially welcoming her to DataKind!

Source: DataKind – A Big Welcome to DataKind’s Newest Board Member!

A look back at DataKind UK's 2016

We’re sure many of you are looking forward to the festive season and waving goodbye (and good riddance) to 2016. Here at DataKind UK, we’d like to take a moment to reflect and appreciate all the good stuff that happened this year.

2016 was all about growth and impact. We doubled our number of staff by welcoming Lauren Smith as our Project & Events Coordinator and grew our brilliant team of Chapter Leaders with Kate Vang, Billy Wong and Gianfranco Cecconi joining Rishi Kumar. We got much smarter at selecting and scoping projects, as well as testing new event formats. We ran one-day DataDives for single projects. We experimented with DataJams – a day of data wrangling to better understand the data at hand. Behind the scenes, we’ve been working in partnership with Data Orchard to survey 200 UK charities and social enterprises, interviewing 12 of them to produce a data maturity framework that will be launched in 2017.


We’re pleased to have partnered on projects with the following organisations over the last year. We also provided light touch advice and support to a further 15 charities and social enterprises.


2016 Events Roundup

We’ve had a packed calendar of events from DataDives to DataJams. Find out more below!


  • DataDive: Cafedirect Producers’ Foundation
  • Meetup: When the rubber hits the road: the highs and lows of small data
  • Workshop: Data Evolution London Workshop
  • Workshop: Data Evolution Hereford Workshop
  • Meetup: When good algorithms go bad…
  • DataDive: Shared Assets & the Ecological Land Co-operative
  • DataJam: National Council for Voluntary Organisations
  • Meetup: Data-for-good Summer Social
  • DataDive: Autumn DataDive
  • Meetup: data+visual
  • DataDive: Marks and Spencer (internal event for analysts and their charity partners)
  • DataJam and DataDive: DataDiving into Company Ownership with Global Witness
  • Meetup: Who owns UK companies?

Project Highlights


  • During a one-day DataDive, the Cafedirect Producers’ Foundation (CPF) sought to better understand the smallholder farmers they support. For example, the volunteer data scientists showed which factors correlate with higher incomes and how farmers adopt different agricultural practices and innovate. CPF continued working with one of our volunteers on a consultancy basis and they are now figuring out how to empower smallholder farmers to use their own data to inform their businesses.
  • Shared Assets are developing the prototype we produced at a DataDive with our friends over at Outlandish. They are building a platform to explore UK land data because good information on land is crucial to making good decisions about it. Many common good land users struggle to access the information they need e.g. who owns the land, what has it been used for, or where to find new project sites. The prototype pulls together dozens of open data sets enabling common good land users to identify and compare different sites on a range of characteristics, saving them time and money while helping them to make smarter, data-informed decisions.
  • Global Witness managed to get three separate organisations (Open Corporates, OCCRP and the Spend Network) to bring data to a DataDive in November. 50 data scientists descended on the newly released beneficial ownership data showing, for example, that thousands of UK companies are owned by other companies in tax havens and some of these tax-haven-owned companies are in receipt of government contracts.

Things we’re excited about in 2017


  • We’re busy prepping for a DataDive with the NSPCC next year in partnership with Credit Suisse (huge thank you to Ben Wilkinson at Credit Suisse for his personal donation to support this work).
  • Watch out Newport – we’re headed your way. We’ll be DataDiving with the Office of National Statistics next year.
  • There’s an exciting schedule of monthly Meetups starting on 24th January – save the date and sign up to our Meetup page to find out more.
  • We’ll be launching an organisational data maturity model for the social sector that we’ve developed with Data Orchard.
  • Plus we’ve got a couple of DataCorps projects up our sleeves. Volunteers will be needed – watch this space!


Source: DataKind – A look back at DataKind UK’s 2016

DataDiving with Marks & Spencer

Running DataDives is part of DataKind’s DNA, but over the years we have experimented with different formats and formulas. At DataKind UK, we’ve partnered with Marks & Spencer, a UK food and clothing retailer, for three years running to host internal DataDives.

Similar to other DataDives, the event involves bringing together groups of volunteer data scientists and selected charities to work on data-for-good projects. Unlike other DataDives though, the events are not open to the public. Rather, they are attended by M&S’s internal data analyst community and invitations are extended to some of their suppliers.

On Thursday, October 27th and Friday, October 28th this year, 40 Marks & Spencer data analysts came together to help three fantastic charities: Oasis Community Learning, Shelter, and the Welcome Centre. After much coffee and number crunching, the assembled brain power produced some spectacular results. 

Check out highlights of findings from each project below!

Oasis Community Learning

“…It is the best level of human resources analytics that Oasis Community Learning has ever had and it is great to see the educational impact being clearly mapped to the turnover of our staff.”    
John Barnaby, Chief Operating Officer, Oasis Community Learning

Oasis Community Learning is one of the top three Academy providers in the UK with 47 schools across primary, secondary and 6th form serving 22,000 students with 4,300 teachers. Oasis wanted to look at their human resources data to understand staff turnover and absence, and what this means for pupil performance.

The volunteer analysts found that staff turnover was higher in schools with students that have special educational needs and English as an additional language. They found that primary schools spend twice the amount per day to cover staff absence compared to secondary schools. The analysts also found that primary schools tend to underestimate these absence costs. While these are all provisional findings that require further analysis, the DataDive enabled Oasis to see these patterns in their HR data for the first time and Oasis has accelerated their plans to become more data-driven.


Shelter

“…The DataDive has equipped us with a set of ideas and insights that has helped to clarify which direction to head in to develop a deeper understanding of our clients...”
Dean Robinson, Business Systems & Analysis Manager, Shelter   

Shelter helps millions of people every year struggling with bad housing or homelessness through advice, support and legal services. They wanted to dig into their outcomes data to understand what happened to their clients. What is the result of their help? What are the changes for the client? How do these changes compare for different people accessing different services around the country?

The M&S analysts dived in and, in no time at all, whipped up an interactive dashboard to enable Shelter staff to explore these very questions. The volunteers looked at the number of hours Shelter staff spend delivering services in different parts of the country. They also explored the rate at which cases were resolved: for example, clients over age 65 are more likely to have their cases resolved than those aged 25 to 34. Shelter’s business systems team was thrilled, and they have even started learning R, a programming language for data analysis!

The Welcome Centre

“A very well organised and structured event, the outcomes of which will make a genuine difference to our organisation’s business processes…”
Andrew Tomlinson, Trustee, The Welcome Centre

The Welcome Centre is a food bank in Huddersfield and South Kirklees that supports people experiencing crisis through practical help. They wanted to understand their clients better and identify those who would benefit from additional support, advice and referral to other services. In particular, the Welcome Centre wanted to know who is most likely to become a repeat user of the food bank, as those individuals tend to need extra support.

The all-star team of pro bono analysts got to work and were able to find factors that predicted how likely it was that someone would need extra support. Based on a person’s age, their number of dependents, and the reason for their referral to the foodbank, we can begin to predict the kind of support they will need. The Welcome Centre is looking to develop the model further so they can identify who needs support earlier, what future demand for the service might be, and to test hypotheses for which interventions work best with which clients. 

A huge thank you to Marks & Spencer’s Plan A team and to Pete Williams, Head of Enterprise Analytics at M&S, for driving another successful DataDive. We look forward to next year’s!

Source: DataKind – DataDiving with Marks & Spencer

Deep Learning Summer School 2016 Videos

Deep Learning Summer School, Montreal 2016 is aimed at graduate students and industrial engineers and researchers who already have some basic knowledge of machine learning (and possibly but not necessarily of deep learning) and wish to learn more about this rapidly growing field of research. If that is you, there are plenty of videos to help you learn more.

Source: 101 DS – Deep Learning Summer School 2016 Videos

Predicting Wheat Rust in Ethiopia with the Bill & Melinda Gates Foundation

Cultivated by about five million households, wheat is an important crop in Ethiopia as both a source of income for small farmers and a source of food and nutrition for millions of Ethiopians. Despite the country’s huge potential to grow wheat, the average wheat productivity of 2.5 tonnes per hectare is lower than the global average of 3 tonnes per hectare. This is due in part to recurrent outbreaks of a fungal disease called wheat rust that causes devastating pre-harvest losses.

Several international development agencies have been supporting scientists to study the spread of wheat rust as part of their efforts to increase agricultural productivity and reduce hunger and poverty for millions of farming families in Sub-Saharan Africa. However, it can be challenging to even know where wheat rust croplands are located in Ethiopia, as the field survey data that exists is incomplete and costly to collect.

Given advances in satellite imagery, we wondered – is it possible to detect wheat rust from space so that an early warning system could be developed to predict and prevent future outbreaks?

Last August, we held a DataDive with the Bill & Melinda Gates Foundation to tackle this question and more. Drawing on a combination of survey data, remote sensing data and satellite imagery, a DataDive volunteer team developed a proof-of-concept statistical model, based on the survey data, that distinguishes severe yellow rust from no rust (of any type) with about 82% accuracy. A model like this could enable governments, funding agencies and researchers to better detect the spread of the disease and the evolution of new pathogen strains, and more quickly deploy protective measures to help farmers and their communities.
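The team’s actual features and model aren’t described here, so the following is only a minimal sketch of a binary classifier of that general kind, with everything invented for illustration: the survey-derived features, the labels, and the choice of a simple logistic regression.

```python
import numpy as np

# Hypothetical survey-derived features per surveyed plot, e.g. elevation,
# rainfall, and a vegetation index (all invented for illustration).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))

# Synthetic labels: 1 = severe yellow rust, 0 = no rust of any type.
# In this toy setup, risk rises with the first feature and falls with the third.
true_w = np.array([1.5, 0.0, -2.0])
y = (X @ true_w > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit a logistic regression classifier by plain gradient descent.
w, b = np.zeros(3), 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / n
    b -= 0.5 * (p - y).mean()

accuracy = ((sigmoid(X @ w + b) > 0.5) == (y == 1)).mean()
print(f"training accuracy: {accuracy:.2f}")
```

In the real project the inputs would be survey observations and remote-sensing measurements rather than random numbers, and the team’s 82% figure came from their own data, not from a toy setup like this one.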

We’re pleased to announce we’re continuing our work with the Bill & Melinda Gates Foundation and will be kicking off a long-term multi-phase project to develop a more accurate predictive model using a combination of satellite imagery, multispectral imaging and computer vision techniques. The goal of the first phase of the project is to find a way to automatically detect wheat cropland.

Satellite Imagery Experts, Join Us!

We’re looking for a team of volunteers, including satellite imagery and machine learning experts, to help work on this project over the next several months. If you have significant experience in these areas and would like to contribute, email Sina Kashuk, DataKind’s Data Scientist managing the project, with details on your background.

Source: DataKind – Predicting Wheat Rust in Ethiopia with the Bill & Melinda Gates Foundation

Get Involved – Monthly Roundup!

Eager to flex your data skills for good? Each month, we do a roundup of volunteer opportunities through DataKind and other organizations around the world!

Don’t see anything in your area? Check out DataLook’s definitive guide to doing data science for good and our Data4Good Kit for help getting started.

DataKind Opportunities 

Satellite Imagery Volunteers – We need your help on our newest project launching in December with the Bill & Melinda Gates Foundation! Email to get involved.

Web Developer Volunteers – We need a front-end web developer to help on one of our DataCorps projects. Email to get involved.

SAVE THE DATE! DataDive – March 3-5, New York City
New York – it’s high time for a DataDive! RSVPs to open in January – stay tuned.

SAVE THE DATE! DataDive – April 28-30, North Carolina
We’re co-hosting our first ever North Carolina DataDive. RSVPs to open in February – more details soon!

Upcoming Events and Conferences 

DataKind’s Jake Porway at Stanford Social Innovation Review (SSIR) Data on Purpose/Do Good Data conference – Feb 7-8, Stanford, CA
Join Jake and other data experts, academics, practitioners, and social sector leaders for two days of skillfully led sessions on topics ranging from aligning practice with policy to creating a culture of data to how Silicon Valley is facilitating data practices in civil society.
Learn more >

Beyond DataKind – Our Top Picks To Get Involved 

Data Science for Good: Support America’s Warrior Partnership – Dec 9, College Park, MD
Join Immuta for a hackathon to support America’s Warrior Partnership (AWP). AWP works with communities to empower veterans through a community-based program that takes a proactive approach to serving them. Bring your data skills and get ready to dive into datasets to assist AWP in forwarding its vision and goals. Help AWP effectively find veterans in an area, identify factors that lead to more successful outcomes, better predict needs for follow-up actions, determine the probability of success related to various services, and help prevent homelessness.
Sign up >

beyond.uptake Data Fellows Program – Dec 9 Deadline
Social enterprises are attacking some of the biggest problems in the world, but there is a lack of professional development and mentoring for their data professionals. To help, beyond.uptake has introduced a four-month Data Fellows Program designed to connect data leaders in nonprofits with experts in data science, giving them the opportunity to hone their data skills and network with like-minded data-for-good professionals. Apply now!
Apply >

Become a Data & Society Fellow! – Dec 19 Deadline
Our friends at Data & Society are assembling their fourth class of fellows to further their mission of producing rigorous research that can have impact, and of supporting and connecting the young but growing field of actors working on the social, cultural, and political effects of data.
Apply >

IBM Watson AI XPRIZE – Jan 19 Deadline
How can artificial intelligence solve the world’s grandest challenges in health, learning, energy, exploration and global development? The IBM Watson AI XPRIZE, a $5 million global competition to develop life-changing human + AI collaborations launched by IBM Watson and XPRIZE, aims to answer this question. Take the challenge!
Register >

The Measured Summit: Measuring the Impact of Social Design on Human Health – Jan 24, New York, NY
Does human centered design lead to better health outcomes? Does it make patients smarter and more informed? Can it make health care companies more innovative and successful? Can it improve delivery of products and services? Find out at The Measured Summit. Join leaders in philanthropy, business, healthcare, research and design as they create a shared approach to understanding how design can become a more powerful tool for systems-level transformation.
Get Tickets >

Become a Data Science for Social Good Fellow! – Jan 31 Deadline
Another friend, University of Chicago’s Data Science for Social Good program, is now recruiting its next class of fellows. Join as a fellow, a mentor, a project manager, or partner!
Apply >

DrivenData Machine Learning Competitions (virtual) – Ongoing
Check out DrivenData’s online challenges, usually lasting 2-3 months, where a global community of data scientists competes to come up with the best statistical model for difficult predictive problems that make a difference.
Sign up >

Source: DataKind – Get Involved – Monthly Roundup!

How We Priced Our Book With An Experiment


27 May 2015 – Chicago

Summary: We conducted a large experiment to test pricing strategies for our book and came to some very surprising findings about allowing customers to pay what they wanted.

Specifically, we found strong evidence that we should let customers pay what they want, which would help us earn more money and more readers when compared with traditional pricing models. We hope our findings can inspire other authors, musicians and creators to look into pay-what-you-want pricing and run experiments of their own.


Introduction: Pay What You Want?

My co-authors (Henry Wang, William Chen, Max Song) and I have been working on our book, The Data Science Handbook, for over a year now. Shortly before launch, we asked ourselves an important question that many authors face: how much should we charge for our book?

We had heard of Pay-What-You-Want (PWYW) models, where readers can purchase the book for any amount they want (or at least above a threshold you set). However, many authors and creators worry that only a small percentage of people will contribute in a PWYW pricing model, and that these contributors will opt for meager amounts in the $1-$5 range.

On the other hand, we also felt that PWYW was an exciting model to try. A PWYW model would allow us to get the book out to as many people as possible without putting the book behind a paywall. We also had an inkling that this experimental pricing model would increase exposure for our book.

So we set out to answer this simple question: how should we price our book?

As practicing statisticians and data scientists, we thought of no better way to decide this than to run a large-scale experiment. The following section details exactly what we tested and discovered.

TL;DR – Letting Customers Pay What They Want Wins the Day

We experimented with 7 different pricing models pre-launch, with our subscriber base of 5,700 people. In these 7 different models, we compared different pricing schemes, including fixed prices at $19 and $29, along with several Pay What You Want (PWYW) models with varying minimum amounts and suggested price points.
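Mechanically, a split like this just assigns each subscriber to one of seven buckets. Here is a sketch of a deterministic, hash-based assignment; the variant labels follow the post’s notation, the suggested price on the $20-minimum variant is our assumption, and the post doesn’t say how the authors actually randomized.

```python
import hashlib
from collections import Counter

# Two fixed-price variants plus five PWYW (minimum/suggested) variants.
# The suggested price on the $20-minimum variant is assumed, not stated.
VARIANTS = [
    "fixed $19", "fixed $29",
    "PWYW $0/$19", "PWYW $0/$29",
    "PWYW $10/$19", "PWYW $10/$29",
    "PWYW $20/$29",
]

def assign_variant(email: str) -> str:
    """Deterministically map an email address to a pricing variant.

    Hashing (rather than random.choice) keeps the assignment stable:
    the same subscriber always lands in the same bucket across reruns.
    """
    digest = hashlib.sha256(email.strip().lower().encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]

# With ~5,700 subscribers, each bucket gets roughly 800 people.
emails = [f"reader{i}@example.com" for i in range(5700)]
counts = Counter(assign_variant(e) for e in emails)
```

A hash modulo seven gives near-equal buckets without having to store the assignment anywhere, which matches the roughly-800-per-bucket split described below.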

Before the experiment began, we had agreed to choose whichever variant maximized the two things we cared about: the total number of readers and net revenue (later on, we’ll explain how we prioritized the two).

Before conducting the experiment, we thought that setting a fixed price at $29, like a traditional book, would lead to the maximum revenue.

After we analyzed our results, to our surprise, we discovered strong statistical evidence that with a PWYW model for our book, we could significantly expand our readership (by 4x!) while earning at least as much revenue (and potentially even more) as either of the fixed-priced variants.

The Prices We Tested: Setting Up Our Experiment

On notation: throughout this post, PWYW models are described as (Minimum Price/Suggested Price). For example, ($0/$19) means a $0 minimum price with a $19 suggested price.

Through a sign-up page on our website, we’ve been continuously gathering email addresses of individuals interested in our book throughout the process of promoting the Data Science Handbook.

We conducted this pricing experiment before the official launch of the book by letting our 5,700 subscribers pre-order a special early release of the book. The following diagram shows our experimental setup:

[Diagram: experimental setup]

We started the early release pre-order process on Monday, April 20th. We stopped the pre-orders one week later, so that we could analyze our results.

Through Gumroad, we tracked data on the number of people who landed on each link, whether they purchased, and how much they chose to pay.

Note: To guard against people buying the book who were not originally assigned to that bucket (for example, those who inadvertently stumbled across our links online), we filtered out all email addresses that purchased a book through a variant that they were not explicitly assigned to. This gave us more confidence in the rigor of our statistical analyses.
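That filter is simple to express: keep a purchase only when the buyer’s email address was assigned to the variant they bought through. A sketch with invented records:

```python
# Hypothetical records: who was assigned where, and who bought through what.
assigned = {
    "a@example.com": "PWYW $0/$19",
    "b@example.com": "fixed $29",
    "c@example.com": "PWYW $10/$19",
}
purchases = [
    {"email": "a@example.com", "variant": "PWYW $0/$19", "paid": 12.0},
    # b bought through a link they were never sent -> excluded from analysis
    {"email": "b@example.com", "variant": "PWYW $0/$19", "paid": 0.0},
    {"email": "c@example.com", "variant": "PWYW $10/$19", "paid": 10.0},
]

# Keep only purchases made through the buyer's assigned variant.
valid = [p for p in purchases if assigned.get(p["email"]) == p["variant"]]
```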

What We Found: Experiment Results

The roughly 800 users in each of our experimental buckets went through a funnel, where they clicked through the email to visit the purchase page, and then decided whether or not to purchase. We collected data on user behavior in this funnel, as well as the price they paid.

[Diagram: conversion funnel]

For each of the experimental variants, we collected data on 6 key metrics:

  • Email CTR – # of people who clicked through to the purchase page / # of people who received the email. The emails were identical, minus the link and a short section about the price.
  • Conversion Rate – # of purchases / # of people who clicked through to the purchase page
  • Total Sales – # of sales, regardless of whether a reader paid $0 or $100
  • Net Revenue – Total revenue generated, minus fees from Gumroad
  • Mean Sales Price – Average sales price that people paid
  • Max Sales Price – Largest sales price paid in that bucket
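Given raw funnel counts for one variant, these metrics reduce to a few divisions. A sketch with invented numbers; the 5% Gumroad fee used here is an assumption for illustration, not the actual fee schedule.

```python
# Invented funnel numbers for a single pricing variant.
emails_sent = 800
clicks = 240
payments = [0.0, 0.0, 5.0, 10.0, 19.0, 19.0, 30.0]  # one entry per sale

email_ctr = clicks / emails_sent            # clicked through / received email
conversion_rate = len(payments) / clicks    # purchased / clicked through
total_sales = len(payments)                 # $0 downloads still count as sales
gross = sum(payments)
net_revenue = gross * 0.95                  # minus an assumed 5% platform fee
mean_sale = gross / total_sales
max_sale = max(payments)
```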

Below, you’ll see some plots on how each pricing variant performed on each metric. Each of the seven circles represents a different pricing variant, with the area of the circle being proportional to the magnitude it represents. The larger the circle, the “better” that pricing variant did in terms of our metrics.

The blue circles are the variants that were fixed at $19 and $29. The orange circles are the PWYW variants.

The X-axis of the following plots describes the minimum prices we offered: free, $10, $19 (this was a fixed price), $20 and $29 (also fixed). The Y-axes are the prices we suggested when we were using a PWYW variant: $19 and $29.

[Figure: PWYW vs. fixed-price variants across the six metrics]


Looking above, it’s no surprise that the PWYW model of ($0/$19) had the highest conversion rate (upper right plot) and, as a result, the greatest number of people who downloaded the book. After all, you can get it for free!

Much to our surprise, many of our readers who got this variant paid much more than $0. In fact, as you can see above in the “Mean Sales Price” plot in the bottom left corner, our average purchase price was about $9. Some readers even paid $30.

To examine the distribution of payments we received for each variant, we also examined the histogram of payments for each of the 5 PWYW variants:

[Figure: distribution of payments for each PWYW variant]

It’s again no surprise to see a large chunk of purchases at the minimum. However, you can also see fairly sizable clumps of readers who paid amounts around $5, $10, $15 and $20 (and even some who paid in the $30-$50 range).

In fact, readers seemed to like paying amounts that were multiples of $5, perhaps because it represented a nice round number.

Surprising Insights on Pay What You Want

You Can Earn As Much from a PWYW model (and possibly more) as from a Fixed Price model

Traditional advice told us that we should price our book at a high, fixed price point, since people interested in advancing their careers will typically pay a premium for a book that helps them do exactly that.

However, our ($0/$19) variant was ranked second in total revenue generated (tying with a fixed price of $29).

[Figure: net revenue by variant]

In fact, if anything, the data lends credence to the belief that you can earn even more from PWYW than from setting a fixed price.

What do we mean by that?

Well, our ($0/$19) variant actually made nearly twice as much money as fixing the price at $19. The difference in earnings was large, and is strong statistical evidence that our book would make more money if we made it free, and simply had a suggested price of $19, than if we had fixed the price at $19.[1]

This was an incredible result, since it suggested that with a PWYW model, we could generate the same amount of revenue as a fixed price model, while attracting 3-4x more readership!

Higher Suggested Price Didn’t Translate to Higher Average Payments. But…

The “suggested” price didn’t seem to have a large impact on the price people paid. Compare the mean purchase prices between the $19 and $29 suggested prices in both the $0 minimum variants and the $10 minimum variants.

mean sales price

As you can see, moving the suggested price from $19 to $29 in both cases increased average purchase price by only $1.

However, we don’t mean to imply the suggested price had zero effect. In fact, the data supports setting a lower suggested price.

Look at what happened to conversion rates when we changed the suggested price from $19 to $29. In both cases we tested ($0 minimum and $10 minimum), the lower suggested price had a higher conversion rate and ultimately drove more revenue.[2]

Therefore, even though the average sale was roughly the same across suggested prices, total sales increased with a lower suggested price. This is perhaps due to certain readers being turned off by a higher suggested price, even if they could get the book for $0.

Just imagine seeing a piece of chocolate offered for free, but with a suggested price of $100. You might scoff at the absurdly high suggested price and refuse the chocolate, despite being able to take it for nothing.

On the other hand, if the same chocolate were free with a suggested price of just $0.25, you might see this as fair and be much more inclined to part with your quarter.

Try It Out For Yourself

We think that all of these findings should spur authors and creators to conduct testing on their own product pricing. Gumroad, our sales platform, makes it remarkably easy to create product variants, which you can email out to randomized batches of your followers. Or, you can use the suite of A/B testing tools to ensure that different visitors to your website receive different product links.

By doing so, you may discover that you could reach a larger audience, while also earning higher revenue.
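If you want to replicate the setup yourself, the randomization step is simple enough to sketch. Here is a minimal version, assuming a plain list of subscriber emails and seven placeholder variant labels (Gumroad handles this for you in practice; nothing below uses its API):

```python
import random

# Seven pricing variants, as in the experiment above.
# The labels are placeholders, not real product links.
VARIANTS = [f"variant-{i}" for i in range(1, 8)]

def assign_variants(emails, seed=42):
    """Randomly split an email list into one batch per variant."""
    rng = random.Random(seed)   # fixed seed => reproducible batches
    shuffled = emails[:]
    rng.shuffle(shuffled)
    # Deal emails round-robin into len(VARIANTS) batches
    return {v: shuffled[i::len(VARIANTS)] for i, v in enumerate(VARIANTS)}

batches = assign_variants([f"reader{i}@example.com" for i in range(600)])
```

Each subscriber lands in exactly one batch, and batch sizes differ by at most one, which keeps the variants comparable.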

[1] This result just missed the cutoff for statistical significance. The actual p-value comparing $0/$19 with a fixed $19 was 0.057, missing our threshold of 0.05 necessary to qualify as statistically significant. Nevertheless, the very low p-value is a strongly suggestive result in favor of a PWYW model.

[2] Beyond being practically significant, this was also statistically significant with a p-value close to 0.
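The footnotes don’t say which statistical test produced these p-values. One common choice for comparing conversion rates between two variants is a two-sided two-proportion z-test; the sketch below uses made-up counts, not the experiment’s actual data:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test via the normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 120/1000 conversions vs. 90/1000
p = two_proportion_p_value(120, 1000, 90, 1000)
```

A p-value below the conventional 0.05 threshold would count as statistically significant under this test.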

If you want to be notified when my next article is published, subscribe by clicking here.

Source: Carl Shan – How We Priced Our Book With An Experiment

How Data Science Can Be Used For Social Good


08 Jan 2015 – Chicago

To get automatically notified about new posts, you can subscribe by clicking here. You can also subscribe via RSS to this blog to get updates.


Give Directly

Credit: Google Images

In 2013 Kush Varshney, a researcher at IBM, signed up with a nonprofit called DataKind to volunteer his technical skills on pro bono projects. DataKind’s flagship program, DataCorps, assembles teams of data scientists to partner with social organizations like governments, foundations or NGOs for three- to six-month collaborations to clean, analyze, visualize and otherwise use data to make the world a better place.

Kush, who holds a PhD in electrical engineering and computer science from MIT, was promptly contacted by DataKind to work on a project with GiveDirectly. He was joined by another team member, Brian Abelson, himself now a data scientist at an open data search company. Together, the two were tasked with tackling a challenging problem for GiveDirectly.

GiveDirectly conducts direct cash transfers to low-income families in Uganda and Kenya through mobile payments. These donations are given with no strings attached, trusting that the poor know best how to use the money effectively. One of the top-rated charities on GiveWell, GiveDirectly has had randomized controlled trials conducted to evaluate the effectiveness of its approach, with strong positive results.

GiveDirectly’s model is to conduct direct cash transfers to villages with large numbers of residents living in poverty. However, to assess which villages these are, the organization relied upon staff members to individually visit villages in Uganda and Kenya and assess the relative poverty of the inhabitants.

When I spoke with Kush he described some drawbacks of this method, saying, “This method could be costly in both time required to visit each site, and in using donations to help pay wages for inspections that could otherwise be going directly to the poor.”

Together with GiveDirectly, Kush and Brian sought a better way to accomplish this task.

Enter data science.

What Is Data Science?

Data Science Venn Diagram

Credit: Drew Conway – The Data Science Venn Diagram

Data science is an emerging discipline that combines techniques of computer science, statistics, mathematics, and other computational and quantitative disciplines to analyze large amounts of data for better decision making. The field arose in response to the fast growing amount of information and the need for computational tools to augment humans in understanding and using that data.

Rayid Ghani, Director of the Data Science for Social Good Fellowship and former Chief Scientist for the Obama 2012 campaign, noted that “the power of data science is typically harnessed in a spectrum with the following two extremes: helping humans in discovering new knowledge that can be used to inform decision making, or through automated predictive models that are plugged into operational systems and operate autonomously.” Put plainly, these two ways of using data can be summarized as turning data into knowledge, or converting data into action.

Chiefly responsible for wrangling findings and crafting models from the data is an emerging profession: the data scientist. The “scientist” portion of the title conjures a vision of academia, partially as a result of many data scientists holding advanced STEM degrees, but it also paints a false picture of a data scientist as someone holed up in the research lab of an organization tinkering away on esoteric questions. This view characterizes the data scientist as peering into the depths of “Big Data” in pursuit of knowledge.

Rayid debunks this myth, saying that “frequently, however, the challenge in data science is not the science, but rather the understanding and formulation of the problem; the knowledge of how to acquire and use the right data; and once all that work is done, how to operationalize the results of the entire process.” Accordingly, the real role of a data scientist should be thought of as much more embedded in the core of a company or non profit, directly shaping the scope and direction of the organization’s products and services.

The handiwork of data scientists can be found in a plethora of products we interact with every day. Facebook uses data from each visit to tailor the posts you see in your News Feed. Amazon takes account of what you’ve purchased to recommend other items for purchase. PayPal roots out fraudulent behavior by analyzing the data from seller-buyer transactions.

So far, most of the uses of data science have been towards business objectives. The technology, financial services and advertising industries are rife with opportunities to convert data into profit. But now, more and more innovative social sector organizations like GiveDirectly are catching on to how technology and data science can be used to solve their problems.

Organizations like Rayid’s Data Science for Social Good Fellowship, Y Combinator-backed nonprofit Bayes Impact, and DataKind are popping up to fund, train and deploy excellent data scientists to tackle pressing social issues.

Data Science In Action

In the case of GiveDirectly, Kush and Brian were tasked to use their computational data science skills to help discover where the poorest villages were located, so that donations could be channeled to households with the highest needs.

To do this, Kush and Brian used GiveDirectly’s knowledge that an indication of the poverty of a household is the type of roofing of their home. Kush told me that in Kenya, “poorer families tended to live in homes with thatched roofs. On the other hand, a home with a metal roof typically meant the family was well-to-do enough to purchase a more sturdy shelter.”

Thatched vs. Metal Roofs

Credit: GiveDirectly

Using this knowledge, Kush and Brian extracted satellite images of villages in Kenya from Google Maps and deployed an algorithm that used the coloring of each roof to determine whether it was made of metal or straw. Doing this across all of the houses in a village gave an estimate of the level of poverty there.
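The actual classifier Kush and Brian built isn’t published here. Purely as an illustration of the color-based idea, a toy version with made-up brightness thresholds and synthetic image patches might look like:

```python
import numpy as np

def classify_roof(patch: np.ndarray) -> str:
    """Label an RGB roof patch as 'metal' or 'thatch' by its average color.

    The thresholds are illustrative only, not the project's actual values.
    """
    r, g, b = patch.reshape(-1, 3).mean(axis=0)
    brightness = (r + g + b) / 3
    # Metal roofs photograph as bright and gray-ish; thatch as darker brown.
    is_grayish = max(r, g, b) - min(r, g, b) < 30
    return "metal" if brightness > 120 and is_grayish else "thatch"

def village_poverty_estimate(patches) -> float:
    """Fraction of thatched roofs, as a crude village-level poverty proxy."""
    labels = [classify_roof(p) for p in patches]
    return labels.count("thatch") / len(labels)

# Synthetic 4x4 RGB patches: one bright gray roof, two dark brown ones
metal = np.full((4, 4, 3), [180, 180, 185], dtype=float)
thatch = np.full((4, 4, 3), [110, 80, 40], dtype=float)
estimate = village_poverty_estimate([metal, thatch, thatch])
```

In practice a production system would need to locate roofs within the satellite image first and handle lighting variation, but the core idea of mapping roof color to a poverty proxy is the same.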

In early 2014, GiveDirectly piloted this algorithm to detect poverty levels in 50 different villages in Kenya, as part of one of its largest campaigns: moving $4 million to households across western Kenya.

By employing Kush and Brian’s algorithm, GiveDirectly eliminated over 100 days of manual village inspections. Doing so saved over $4,000, allowing GiveDirectly to fund four more households.

Excited by the potential of data science to help families escape poverty more effectively, GiveDirectly is now discussing with Kush, Brian and DataKind how the algorithm can be made more precise and scaled to additional villages.

Potential To Build The Future

As an increasing volume of information is generated by the world, there will be more opportunities to apply data science towards socially meaningful causes. What if we could help guidance counselors predict which students were most likely to drop out, and then design successful interventions around them? What if we could improve parole decisions, reduce prison overcrowding and lower recidivism?

Examples of how data science can be applied to the social sector include:

  • Reduce crime and recidivism: Predictive modeling can be used to assess whether an inmate is likely to reoffend, informing the parole decision.
  • Give tailored feedback and content to students: Adaptive tutoring software can model how much students are learning and understanding, tailoring problems accordingly.
  • Spot nutrition deficiencies: Data tools can monitor vitamin and mineral intake, warning users of deficiencies in their dietary and health habits.
  • Prevent shootings early: Network-based analyses of gangs can be used to predict where and when future shootings will occur.
  • Diagnose diseases early: Genetic, imaging and EMR data can be leveraged to provide early diagnoses of diseases such as Parkinson’s, M.S. and autism.

It’s clear that we can be optimistic about how data scientists can use the data at their fingertips for social good. As an emerging technological frontier, data science is in a position of immense potential. As a result, there is much to explore about how we can use it to push the human race forward.


Targeting direct cash transfers to the extremely poor (2014), Kush Varshney and Brian Abelson

I write about data science applied to social causes. If you want to be notified when my next post is published, subscribe by clicking here.

Source: Carl Shan – How Data Science Can Be Used For Social Good

Weeks 7-12: Summer Wrapup


13 October 2014 – Chicago

This is the final post in a series chronicling my reflections on participating in the 2014 Data Science for Social Good Fellowship at the University of Chicago. While I had intended to post once a week, I fell short of that goal: work from DSSG piled up, making it tough to write thoughtful posts on a weekly schedule.

Nevertheless, I intend for this to be a wrapup post summarizing the work my team and I did. Reading it will give you a sense of the different experiences, learnings and findings I encountered over the summer.

You can read my last post here:

To get automatically notified about new posts, you can subscribe by clicking here. You can also subscribe via RSS to this blog to get updates.

Health Leads

“It is health that is real wealth and not pieces of gold and silver.”– Mahatma Gandhi


President Obama’s Affordable Care Act enacted broad reforms across the United States’ healthcare system. While the healthcare landscape has changed drastically, one important constant remains: a person’s health is affected significantly by non-medical factors.

For example, a patient with an asthmatic condition caused by a moldy apartment will not be cured simply with better medicine. She needs a better apartment, and yet our health care system is not traditionally set up to handle these non-medical issues.

During this summer’s DSSG Fellowship, our team — Chris Bopp, Cindy Chen, Isaac McCreery, myself and mentor Young-Jin Kim — worked with a nonprofit called Health Leads to apply data science to address these non-medical needs, to help patients get access to basic resources vital for a healthy life.

Health Leads

In 1996, Harvard sophomore Rebecca Onie was a volunteer at Greater Boston Legal Services, assisting low-income clients with housing problems. She found herself speaking with clients facing health issues brought on by their poverty. Some lived in dilapidated apartments, infested with rodents and insects. Others couldn’t afford basic necessities like food. Modern medicine was largely ineffective against these issues. Doctors were trained to treat medical ills, not social ones.

Inspired by her experiences, Rebecca launched a health services nonprofit called Health Leads, which recruits and trains college students to work closely with patients whom doctors refer for basic resources such as food, transportation or housing. These college students, called “Advocates” in Health Leads lexicon, learn about each patient’s needs, and meticulously dig up resource providers — food banks, employment opportunities, childcare services — that can fulfill them.

In the nearly two decades since Health Leads’ inception, its impact on the health landscape has been tremendous. In 2013 alone, Health Leads Advocates worked with over 11,000 patients to connect them with basic services and resources.

The Problem

Serving a predominantly low-income patient population can pose a challenge for Health Leads. Some patients will lack stable, permanent housing or employment. Others may not own a cell phone on which they can be consistently reached. Health Leads noticed that these circumstances affected their work with some patients: despite Advocates’ best efforts, a proportion of their clients would disconnect from working with the program. These clients would be unreachable, not returning phone calls and ultimately Advocates would be forced to close their cases — never knowing if these clients received the basic resources they needed.

Below is an image displaying the phone calls made to a random group of 200 different patients and whether they responded or not. Half of the clients worked with Health Leads through the completion of their case and the other half ultimately disconnected from Health Leads’ program.

Patient Disconnection vs. Success

(The cases with negative days are ones where Health Leads took down a patient’s information but didn’t begin working with them until a few days later.)

Just at a glance, there appear to be pretty clear differences between the two groups. Most obviously, the disconnected patients seem to have many more failed communication attempts (red dots) than successful ones (green dots).

However, Health Leads wanted to know: exactly what are the factors that contribute to a patient disconnecting from Health Leads? How does the difficulty of a patient’s need play into the problem? What other factors might be important to consider?

Against the backdrop of these pressing questions, Health Leads came to our DSSG team to use data to help discover some answers.

The Challenges

When we began tackling the problem, we ran into a slew of challenges. Unlike in the internet world where companies can track every iota of data down to the click, nonprofits serve their clients in person – meaning data must be manually recorded, rather than passively accumulated.

Furthermore, the factors we ended up discovering as influencing patient outcomes might be outside of Health Leads’ control. What if we found that the most significant indicators of patient success were gender or age? It would be hard to translate a finding like this into operationalizable actions for Advocates.

Our Findings

Over the summer, our team worked through the data to distill insight, discovering findings that Health Leads can use to improve their practice.

For example, we developed a “Patient Complexity Index” that tries to capture the probability that a patient will disconnect from Health Leads. It incorporates information about the types of resources the patient requires and the historic performance of the Health Leads clinic where the patient is served. For instance, needs involving employment or housing are typically much harder to resolve than needs around childcare or transportation. The success rates of these resource connections also vary by desk: we found that different Health Leads sites specialize in different types of resource connections.

By combining this information, Health Leads can more accurately quantify the difficulty of each patient so that more experienced Advocates can work with patients with more complex needs. By doing so, Health Leads can better address each patient’s different circumstances, lowering the chance that they’ll disconnect.
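The exact form of the index isn’t described in detail in this post. A minimal sketch of the idea, assuming invented difficulty weights, desk success rates, and mixing coefficients, could look like:

```python
# Illustrative sketch of a "Patient Complexity Index". All numbers
# below are made up for demonstration, not Health Leads' actual values.
NEED_DIFFICULTY = {          # harder needs score higher
    "employment": 0.8,
    "housing": 0.7,
    "childcare": 0.3,
    "transportation": 0.2,
}

DESK_SUCCESS_RATE = {        # historic connection rate per clinic desk
    "desk_a": 0.75,
    "desk_b": 0.55,
}

def complexity_index(needs, desk):
    """Combine need difficulty with the desk's historic success rate
    into a rough 0-1 disconnection-risk score."""
    need_score = sum(NEED_DIFFICULTY[n] for n in needs) / len(needs)
    desk_penalty = 1 - DESK_SUCCESS_RATE[desk]
    return 0.7 * need_score + 0.3 * desk_penalty

# A patient with two hard needs at a lower-performing desk
score = complexity_index(["employment", "housing"], "desk_b")
```

Patients with higher scores would then be routed to more experienced Advocates, in the spirit of the triage described above.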

Patient Needs

A Need Complexity Index can help quantify the difficulty of these patients’ needs

Furthermore, Health Leads currently standardizes the intervals at which Advocates call patients: a minimum of once every 10 days. The findings from the data confirmed previous Health Leads research that Advocates should try to get in touch with patients frequently in the beginning stages of the relationship. When an Advocate successfully contacts a client in the first month, that one successful phone call significantly decreases the likelihood of disconnection:

Call Frequency

Health Leads should call new clients frequently in the first month


We presented our findings and models to Health Leads at the end of this summer, and our results validate Health Leads’ emphasis on regular follow up. We believe that the information we provided reinforces organizational strategies that can increase client engagement: calling clients regularly and leveraging communication tools such as text messaging. By investigating the different factors influencing a patient’s likelihood to disconnect, our team’s findings have pointed to important steps that Health Leads can continue to take to ensure that more people get the resources they need for a healthy life.

I write about data science applied to social causes. If you want to be notified when my next post is published, subscribe by clicking here.

Source: Carl Shan – Weeks 7-12: Summer Wrapup