What is the best way to sync data science teams?

A well-defined workflow helps a data science team reach its goals. To keep a data science team and its members in sync, it’s important to first understand each phase needed to get data-based results.

When dealing with big data or any type of data-driven goal, it helps to have a defined workflow. Whether we want to perform an analysis with the intent of telling a story (data visualization) or build a system that relies on data, as in data mining, the process always matters. If a methodology is defined before starting any task, teams will be in sync and will avoid losing time figuring out what’s next. This allows a faster production rhythm and gives everyone an overall understanding of what each member brings to the team.

Here are the four main parts of the workflow that every team member should know in order to sync data science teams.

1) Preliminary analysis. When data is brand new, this step has to be performed first. To produce results fast, you need an overview of all data points. In this phase, the focus is on making the data usable as quickly as possible and getting quick, interesting insights.

2) Exploratory analysis. This is the part of the workflow where questions will be asked over and over again, and where the data will be cleaned and ordered to help answer those same questions. Some teams end the process here, which is not ideal, although it all depends on what we want to do with the data. In most cases, two further phases should ideally be considered.

3) Data visualization. This step is imperative if we want to show the results of the exploratory analysis. It’s the part where actual storytelling takes place and where we will be able to translate our technical results into something that can be understood by a wider audience. The focus is turned to how to best present the results. The main goal data science teams should aim for in this phase is to create data visualizations that mesmerize users while telling them all the valuable information discovered in the original data sets.

4) Knowledge. If we want to study the patterns in the data to build reliable models, we turn to this phase, in which the team focuses on producing a model that better explains the data by engineering the data and then testing different algorithms to find the best performance possible.

These are the key phases around which a data science team should sync up in order to have a finished, replicable and understandable product based on data analysis.

The post What is the best way to sync data science teams? appeared first on 3Blades.

Source: 3blades – What is the best way to sync data science teams?

How Can Businesses Adopt a Data-Driven Culture?

There are small steps that any business can adopt in order to start incorporating a data-driven philosophy into their business. An Economist Intelligence Unit survey sponsored by Tableau Software highlights best practices.

A survey by the Economist Intelligence Unit, an independent business within The Economist Group that provides forecasting and advisory services, highlighted best practices for adopting a data-driven culture, among other findings relevant to the field of data science. The survey was sponsored by Tableau Software. To ensure a seamless and successful transition to a data-driven culture, here are some of the top approaches your business should apply:

Share data and prosper

Appreciating the power of data is only the first step on the road to a data-driven philosophy. Older companies can have a hard time transitioning to a data-driven culture, especially if they have achieved success with minimum use of data in the past. However, times are changing and any type of company can benefit from this type of information. More than half of respondents from the survey (from top-performing companies) said that promotion of data-sharing has helped create a data-driven culture in their organization.

Increased availability of training

Around one in three respondents said it was important to have partnerships or courses in house to make employees more data-savvy.

Hire a chief data officer (CDO)

This position is key to converting data into insight so that it provides maximum impact. The task is not easy. On the contrary, it can turn out to be very specialized, and businesses shouldn’t expect their CIO or CMO to perform the job. What is needed is a corporate officer who is wholly dedicated to acquiring and using data to improve productivity. You may already have someone who can be promoted to CDO at your company: someone who understands the value of data and owns it.

Create policies and guidelines

After the CDO runs an internal data audit, it is important that company guidelines are crafted around data analysis. This is how all employees will be equipped with replicable strategies focused on addressing business challenges.

Encourage employees to seek data

Once new company policies are in place and running, the next step is to motivate employees to seek answers in data. One of the best ways to do this is offering incentives (you pick what type). Employees will then feel encouraged to use (or even create) tools and find solutions on their own without depending on the IT guys.

The post How Can Businesses Adopt a Data-Driven Culture? appeared first on 3Blades.

Source: 3blades – How Can Businesses Adopt a Data-Driven Culture?

App and Workspace Discovery Demo

Whether you are selling toothpaste or software, reducing operations costs while improving efficiency is every business’s goal. In this post, we will go over some general concepts that we encountered while setting up a modern microservices architecture based on the Python stack, Docker, Consul, Registrator and AWS.

This example application demonstrates the simplest service discovery setup for a load-balanced application and multiple named sub-applications. This is a typical setup for microservices architectures. For example, https://myapp.com/authentication could route to one microservice and https://myapp.com/forum could route to another. Additionally, ../authentication and ../forum could each be served by any number of instances of the microservice, so service discovery becomes necessary because instances come and go dynamically.

This setup uses Consul and Registrator to dynamically register new containers in a way that can be used by consul-template to reconfigure Nginx at runtime. This allows new instances of the app to be started and immediately added to the load balancer. It also allows new instances of workspaces to become accessible at their own path based on their name.

This guide is in large part the result of a consulting engagement with Glider Labs. We also used this post from the good people at Real Python as a guide. The source code for this post can be found here.


This demo assumes these versions of Docker tools, which are included in the Docker Toolbox. Older versions may work as well.

  • docker 1.9.1
  • docker-machine 0.5.1
  • docker-compose 1.5.2

You can install Docker Machine from here and Docker Compose from here. Then verify versions like so:

$ docker-machine --version
docker-machine version 0.5.1 (04cfa58)
$ docker-compose --version
docker-compose version: 1.5.2

Configure Docker

Configure Docker Machine

Change directory (cd) into repository root and then run:

$ cd myrepo
$ docker-machine create -d virtualbox dev;
Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
To see how to connect Docker to this machine, run: docker-machine env dev

The above step creates a VirtualBox machine named dev. Then configure Docker to point to the local dev environment:

$ eval "$(docker-machine env dev)"

To view running machines, type:

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
dev       *        virtualbox   Running   tcp://


Launch locally with Docker Compose

This demo setup is for a single host such as your laptop, but can work across hosts with minimal changes. There are notes later for running with AWS ECS.

First, we’re going to build the containers upfront. You can view the Dockerfiles in each directory to see what they’re building: an Nginx container and two simple Flask apps, our primary app and our workspace app. Consul is pulled in from Docker Hub.

Let’s take a look at docker-compose.yml. Some images are pulled from DockerHub, while others are built from the included Dockerfiles in the repo:

consul:
  image: gliderlabs/consul-server
  container_name: consul-server
  net: "host"
  command: -advertise -bootstrap
registrator:
  image: gliderlabs/registrator
  container_name: registrator
  net: "host"
  volumes:
    - /var/run/docker.sock:/tmp/docker.sock
  command: -ip= consul://

app:
  build: app
  ports:
   - "8080"
workspace:
  build: workspace
  environment:
    SERVICE_NAME: workspace
    SERVICE_TAGS: workspace1
    WORKSPACE_ID: workspace1
  ports:
   - "8080"

nginx:
  build: nginx
  container_name: nginx
  ports:
    - "80:80"
  command: -consul

To build all images, run:

$ docker-compose build

Now we can start all our services (-d is used to run as daemon):

$ docker-compose up -d

This gives us a single app instance and a workspace called workspace1. We can check them out in another terminal session, hitting the IP of the Docker VM created by docker-machine:

$ curl
App from container: 6294fb10b701

This shows us the hostname of the container, which happens to be the container ID. We’ll use this later to see load balancing come into effect. Before that, let’s check our workspace app running as workspace1:

$ curl
Workspace [workspace1] from container: 68021fff0419

Each workspace instance is given a name that is made available as a subpath. The app itself also spits out the hostname of the container, as well as mentioning what its name is.

Now let’s add another app instance by manually starting one with Docker:

$ docker run -d -P -e "SERVICE_NAME=app" 3blades_app

We’re using the image produced by docker-compose and providing a service name in its environment to be picked up by Registrator. We also publish all exposed ports with -P, which is important for Registrator as well.

Now we can run curl a few times to see Nginx balancing across multiple instances of our app:

$ curl
App from container: 6294fb10b701
$ curl
App from container: 044b1f584475

You should see the hostname changed, representing a different container serving the request. No re-configuration necessary, we just ran a container to be picked up by service discovery.

Similarly, we can run a new workspace. Here we’ll start a workspace called workspace2. The service name is workspace, but we provide workspace2 as an environment variable for the workspace app, and as a service tag used by Nginx:

$ docker run -d -P -e "SERVICE_NAME=workspace" -e "SERVICE_TAGS=workspace2" -e "WORKSPACE_ID=workspace2" 3blades_workspace

Now we can access this workspace via curl at /workspace2:

$ curl
Workspace [workspace2] from container: 8067ad9cfaf3

You can also try running that same docker command again to create a second workspace2 instance. Nginx will load balance across them just like the app instances.

You can also try stopping any of these instances and see that they’ll be taken out of the load balancer.

To view logs, type:

$ docker-compose logs

Launch to AWS with Docker Compose

Docker Machine has various drivers to seamlessly deploy your docker stack to several cloud providers. We essentially used the VirtualBox driver when deploying locally. Now we will use the AWS driver.

For AWS, you will need your access key, secret key and VPC ID:

$ docker-machine create --driver amazonec2 --amazonec2-access-key AKI******* --amazonec2-secret-key 8T93C********* --amazonec2-vpc-id vpc-****** production

This will set up a new Docker Machine on AWS called production:

Running pre-create checks...
Creating machine...
Waiting for machine to be running, this may take a few minutes...
Machine is running, waiting for SSH to be available...
Detecting operating system of created instance...
Provisioning created instance...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
To see how to connect Docker to this machine, run: docker-machine env production

Now we have two Machines running, one locally and one on AWS:

$ docker-machine ls
NAME         ACTIVE   DRIVER         STATE     URL                         SWARM
dev          *        virtualbox     Running   tcp://
production   -        amazonec2      Running   tcp://<awspublicip>:2376

Then switch to production on AWS as the active machine:

$ eval "$(docker-machine env production)"

Finally, let’s build the Flask app again in the cloud:

$ docker-compose build
$ docker-compose up -d

How it works


The heart of our service discovery setup is Consul. It provides DNS and HTTP interfaces to register and look up services. In this example, the -bootstrap option was used, which effectively combines the Consul agent and server. The only problem is getting services into Consul. That’s where Registrator comes in.


Registrator is designed to run on all hosts and listens to each host’s Docker event stream. As new containers are started, it inspects them for metadata to help define one or more service definitions. A service is created for each published port. By default the service name is based on the image name, but in our demo we override that with environment variables. We also use environment variables to tell Registrator to tag our workspace services with specific names.
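As a rough illustration of the naming rules described above, here is a minimal Python sketch. It is a simplified model and not Registrator’s actual code: the `Service` class, the `services_for_container` function and the host port number are hypothetical, and SERVICE_TAGS is assumed to be a comma-separated list.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    port: int
    tags: list = field(default_factory=list)


def services_for_container(image, env, published_ports):
    """Build one service definition per published port (simplified model)."""
    # Default name: last path component of the image, without any tag,
    # e.g. "gliderlabs/consul-server:latest" -> "consul-server".
    default_name = image.split("/")[-1].split(":")[0]
    # SERVICE_NAME, when present, overrides the image-based default.
    name = env.get("SERVICE_NAME", default_name)
    # Assumption: tags arrive as a comma-separated string.
    tags = [t for t in env.get("SERVICE_TAGS", "").split(",") if t]
    return [Service(name=name, port=port, tags=tags) for port in published_ports]


workspace = services_for_container(
    image="3blades_workspace",
    env={"SERVICE_NAME": "workspace", "SERVICE_TAGS": "workspace2"},
    published_ports=[32768],  # hypothetical host port assigned by -P
)
print(workspace)
```

A container started without any of these environment variables would fall back to the image-based name, which is why the demo sets SERVICE_NAME explicitly.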

Nginx + Consul Template

In this demo we have a very simple Nginx container with a very simple configuration managed by consul-template. Although there are a number of ways to achieve this with various kinds of limitations and drawbacks, this is the simplest mechanism for this case.

In short, our configuration template creates upstream backends for every app and every tag of the workspace app. Each instance is used as a backend server for the upstream. It then maps the / location to our app upstream, and creates a location for each tag of our workspace apps mapping to their upstream.

Consul-template ensures that, as services change, this configuration is re-rendered and reloaded in Nginx without downtime.
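To make the mapping from services to Nginx configuration more concrete, here is a small Python sketch of the same idea. It is not the template used in the demo (consul-template uses its own template language); the `render_nginx` function, the instance dictionaries and the addresses are hypothetical stand-ins for what Consul would report.

```python
from collections import defaultdict


def render_nginx(instances):
    """Render a minimal Nginx config from a list of service instances.

    Each instance is a dict with 'service', 'tag', 'address' and 'port'.
    """
    upstreams = defaultdict(list)
    for inst in instances:
        # The app gets a single upstream; each workspace tag gets its own.
        key = "app" if inst["service"] == "app" else inst["tag"]
        upstreams[key].append(f"{inst['address']}:{inst['port']}")

    lines = []
    for name in sorted(upstreams):
        lines.append(f"upstream {name} {{")
        lines.extend(f"    server {backend};" for backend in upstreams[name])
        lines.append("}")

    lines.append("server {")
    lines.append("    listen 80;")
    for name in sorted(upstreams):
        # "/" goes to the app; "/<tag>" goes to the matching workspace.
        path = "/" if name == "app" else f"/{name}"
        lines.append(f"    location {path} {{ proxy_pass http://{name}; }}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical instances: two app containers and one workspace.
config = render_nginx([
    {"service": "app", "tag": "", "address": "10.0.0.5", "port": 32768},
    {"service": "app", "tag": "", "address": "10.0.0.5", "port": 32769},
    {"service": "workspace", "tag": "workspace1", "address": "10.0.0.5", "port": 32770},
])
print(config)
```

Re-running the function whenever the instance list changes and then reloading Nginx is, in spirit, what consul-template automates.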

https://github.com/3Blades/deployment

Running with ECS

There is an Amazon article on using Registrator and Consul with ECS. In short, it means you have to manage your own ECS instances to make sure Consul and Registrator are started on them. This could be done via infrastructure configuration management like CloudFormation or Terraform, or many other means. Or they could be baked into an AMI, requiring much less configuration and faster boot times. Tools like Packer make this easy.

In a production deployment, you’ll run Registrator on every host, along with Consul in client mode. In our example, we’re running a single Consul instance in server bootstrap mode. A production Consul cluster should not be run this way. Instead, run an odd number of Consul servers (usually 3 or 5), so that a quorum of N/2 + 1 servers remains available if one fails, behind an ELB or joined to HashiCorp’s Atlas service. In other words, Consul server instances should probably not be run on ECS, but on dedicated instances.
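The N/2 + 1 figure is the quorum size, that is, the number of servers that must agree for the cluster to keep working. A short sketch of the arithmetic (the function names are ours, chosen for illustration) shows why 3 or 5 servers are the usual choices:

```python
def quorum(servers):
    """Smallest majority of a cluster with the given number of servers."""
    return servers // 2 + 1


def failures_tolerated(servers):
    """How many servers can fail while a quorum is still available."""
    return servers - quorum(servers)


for n in (1, 3, 4, 5):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

A single server tolerates no failures, and going from 3 to 4 servers raises the quorum without tolerating any additional failure, which is why odd cluster sizes are preferred.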

A production deployment will also require more thought about IPs. In our demo, we use a single local IP. In production, we’d want to bind the Consul client to the private IP of each host. In fact, everything except port 80 of Nginx should be on private IPs unless an ELB is used. That leaves the question of which IP to use to reach the Consul server cluster; the simplest options are an Elastic IP attached to one of the servers or an internal ELB.


Star and Watch our GitHub repo for future example updates, such as using AWS CLI Tools with Docker Compose, example configurations with other Cloud providers, among others.


The post App and Workspace Discovery Demo appeared first on 3Blades.

Source: 3blades – App and Workspace Discovery Demo

Which College Should I Choose? (R Script)

Like many of you, I like to learn by trial and error, so I decided to run my first R script by taking Michael L. Thompson’s script and adapting it to this new experiment.

This script will tell you which college is best for you based on your ethnicity, age, region and SAT score. Although this is a generic example, it can be easily replicated using 3Blades’s Jupyter Notebook (R) version. Simply select this option when launching a new workspace from your project.

Who Are You?

Let’s pretend you are a foreign citizen who came here from Latin America on an H-1B visa and you are looking for a B.S. in Engineering. You are about to turn 30, you got married not long ago, and a kid is on the way. You are still earning under 75K, but you are certain that with this career change you will jump into the 100K club. So what are your best choices to achieve that on the West Coast?

studentProfile = list(
    dependent = TRUE,
    ethnicity = 'hispanic',
    gender    = 'male',
    age       = 'gt24',
    income    = 'gt30Kle110K',
    earnings  = 'gt48Kle75K',
    sat       = 'gt1400',
    fasfa     = 'fsend_5',
    discipline= 'Engineering',
    region    = 'FarWest',
    locale    = 'CityLarge',
    traits    = list(
      Risk      = 'M',
      Vision    = 'M',
      Breadth   = 'M',
      Challenge = 'M')
)

Setup the Data & Model

This code loads the college database and defines necessary data structures and functions to implement the model.

## Loading college database ...Done.

Now, here are the top colleges so you can make a wise decision:

# ENTER N, for top-N colleges:
ntop <- 10
studentProfile$beta <- getParameters(studentProfile,propertyMap)
caseResult          <- studentCaseStudy(studentBF,studentProfile,propertyMap,verbose=FALSE,ntop=ntop)
# This code will display the results
gplt <- plotTopN(caseResult,plot.it = TRUE)

R Script choose college

Now let’s tweak the profile a bit and see what your best options are based on your SAT scores:

R script which colleges is best for you

Full credit goes to Michael for this amazing job. I just tweaked it a bit to use it as a brief example of the great things you can do with R.

The post Which College Should I Choose? (R Script) appeared first on 3Blades.

Source: 3blades – Which College Should I Choose? (R Script)

OpenAI: A new non-profit AI company

A new non-profit artificial intelligence research company has just been founded. According to the announcement on the company’s website [1], the goal of the company is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. The announcement was made on the last day of the NIPS 2015 conference, and OpenAI held a small event on 12 December 2015 near the conference venue.

OpenAI’s research director is Ilya Sutskever, one of the world experts in machine learning. Its CTO is Greg Brockman, formerly the CTO of Stripe. The group’s other founding members are world-class research engineers and scientists: Trevor Blackwell, Vicki Cheung, Andrej Karpathy, Durk Kingma, John Schulman, Pamela Vagata, and Wojciech Zaremba. Pieter Abbeel, Yoshua Bengio, Alan Kay, Sergey Levine, and Vishal Sikka are advisors to the group. OpenAI’s co-chairs are Sam Altman and Elon Musk.

Sam, Greg, Elon, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services (AWS), Infosys, and YC Research are donating to support OpenAI. In total, these funders have committed $1 billion, although they expect to only spend a tiny fraction of this in the next few years.

Medium published an interview about OpenAI with Altman, Musk, and Brockman [2], where the founders answered various questions about their new AI initiative.

In a Guardian article about OpenAI [3], Neil Lawrence, a professor of machine learning at the University of Sheffield, emphasizes the importance of open data for the AI community, in addition to open algorithms.

[1] OpenAI, Introducing OpenAI, https://openai.com/blog/introducing-openai/, Greg Brockman, Ilya Sutskever, OpenAI team, December 11 2015.
[2] Medium, How Elon Musk and Y Combinator Plan to Stop Computers From Taking Over, https://medium.com/backchannel/how-elon-musk-and-y-combinator-plan-to-stop-computers-from-taking-over-17e0e27dd02a#.x79zvtwsl, Steven Levy, December 11 2015.
[3] The Guardian, OpenAI won’t benefit humanity without data-sharing, http://www.theguardian.com/media-network/2015/dec/14/openai-benefit-humanity-data-sharing-elon-musk-peter-thiel, Neil Lawrence, December 14 2015.

Source: DeepLearning – OpenAI: A new non-profit AI company

Conference on the Economics of Machine Intelligence-Dec 15

The Creative Destruction Lab at the University of Toronto is hosting a conference on the Economics of Machine Intelligence on December 15 in Toronto: “Machine Learning and the Market for Intelligence.”

This meeting is not a computer science conference. The focus is on the business opportunities that ML is spawning: what has already happened, trends, and how the future may unfold. In addition to researchers such as Geoff Hinton, Russ Salakhutdinov, and Ilya Sutskever, the conference will also feature founders of ML-oriented ventures (e.g., Clarifai, MetaMind, Atomwise), large organizations that are at the frontier of applying ML (e.g., Uber, IBM, Bloomberg), investors focused on ML ventures (e.g., Accel, Bessemer, Google Ventures), and authors of recent books on the implications of advances in machine intelligence (The Master Algorithm, Superintelligence, Machines of Loving Grace, Humans Need Not Apply). The program is attached.

Many entrepreneurs and inventors will attend, such as Tony Lacavera (Wind), Ted Livingston (Kik), David Ossip (Ceridian), Geordie Rose (D-Wave), Jevon MacDonald (GoInstant), Tomi Poutanen (Optimized Search Algorithms), Mike Serbinis (Kobo), Dan Debow (Rypple), Dennis Kavelman (RIM), and Barney Pell (Powerset, Moon Express).

A number of Canadian CEOs will also attend (Dave McKay [RBC], Brian Porter [Scotiabank], Don Walker [Magna], Paul Desmarais III [Power Corp], Sam Sebastian [Google], Gerrard Schmid [D&H], Kilian Berz [BCG], Joanna Rotenberg [BMO, Private Wealth Management], Steve Carlisle [GM], etc.).

Several VCs will join us (Relay, Real, BDC, Celtic House, Georgian, Accel, Bessemer, Google Ventures, DFJ, FoundersFund, Greylock, True, Amplify, Lux, Bloomberg, Microsoft, Salesforce, Spark, etc.).

Several members of the international print media will also participate (The Economist, Wired, New York Times, Financial Times, Associated Press, Bloomberg, etc.).

The Governor General and the Mayor of Toronto will also join us, along with a number of provincial and federal politicians.

If you would like to attend, then please register here: http://bit.ly/1OuuHIB

For official announcements on the Twitter and Facebook pages of “Creative Destruction Lab”, please check out the links in [1] and [2].

[1] https://twitter.com/creativedlab/status/672066522395705345

[2] https://www.facebook.com/creativedestructionlab/?fref=nf

Source: DeepLearning – Conference on the Economics of Machine Intelligence-Dec 15

Open Discussion of ICLR 2016 Papers is Now Open

Open discussion of #ICLR2016 submissions is now open:


Access requires a CMT account. If you don’t have one already, go here:


Note that the assigned reviewers and area chair of each paper will be encouraged to consider the public comments in their evaluation of submissions.
Your comments will thus be very useful and appreciated!


Hugo Larochelle’s Google+ Post:



Source: DeepLearning – Open Discussion of ICLR 2016 Papers is Now Open

Software Developer Position at MILA

At MILA (Montreal Institute for Learning Algorithms), we are looking for a software developer to help us improve our software libraries (mostly Theano) and to work on other related tasks.

This is a one year contract, full time position.
The duration of the contract could be extended depending on available funding.
If you are interested, please send your CV to Frédéric Bastien at “frederic.bastien AT gmail.com” with “Software Developer Position at MILA” as the email subject.
Candidates need to be authorized to work in Canada.

Source: DeepLearning – Software Developer Position at MILA

How We Priced Our Book With An Experiment

27 May 2015 – Chicago

Summary: We conducted a large experiment to test pricing strategies for our book and came to some very surprising findings about allowing customers to pay what they wanted.

Specifically, we found strong evidence that we should let customers pay what they want, which would help us earn more money and more readers when compared with traditional pricing models. We hope our findings can inspire other authors, musicians and creators to look into pay-what-you-want pricing and run experiments of their own.

Introduction: Pay What You Want?

My co-authors (Henry Wang, William Chen, Max Song) and I have been working on our book, The Data Science Handbook, for over a year now. Shortly before launch, we asked ourselves an important question that many authors face: how much should we charge for our book?

We had heard of Pay-What-You-Want (PWYW) models, where readers can purchase the book for any amount they want (or at least above a threshold you set). However, many authors and creators worry that only a small percentage of people will contribute in a PWYW pricing model, and that these contributors will opt for meager amounts in the $1-$5 range.

On the other hand, we also felt that PWYW was an exciting model to try. A PWYW model would allow us to get the book out to as many people as possible without putting the book behind a paywall. We also had an inkling that this experimental pricing model would increase exposure for our book.

So we set out to answer this simple question: how should we price our book?

As practicing statisticians and data scientists, we thought of no better way to decide this than to run a large-scale experiment. The following section details exactly what we tested and discovered.

TL;DR – Letting Customers Pay What They Want Wins the Day

We experimented with 7 different pricing models pre-launch, with our subscriber base of 5,700 people. In these 7 different models, we compared different pricing schemes, including fixed prices at $19 and $29, along with several Pay What You Want (PWYW) models with varying minimum amounts and suggested price points.

Before the experiment began, we had agreed to choose whichever variant maximized the two things we cared about: the total number of readers and net revenue (later on, we’ll explain how we prioritized the two).

Before conducting the experiment, we thought that setting a fixed price at $29, like a traditional book, would lead to the maximum revenue.

After we analyzed our results, to our surprise, we discovered strong statistical evidence that with a PWYW model for our book, we could significantly expand our readership (by 4x!) while earning at least as much revenue (and potentially even more) as either of the fixed-priced variants.

The Prices We Tested: Setting Up Our Experiment

On notation: throughout this post, PWYW models will be described as (Minimum Price/Suggested Price). For example, ($0/$19) means ($0 Minimum Price, $19 Suggested Price).

Through a sign-up page on our website, we’ve been continuously gathering email addresses of individuals interested in our book throughout the process of promoting the Data Science Handbook.

We conducted this pricing experiment before the official launch of the book by letting our 5,700 subscribers pre-order a special early release of the book. The following diagram shows our experimental setup:

experiment setup

We started the early release pre-order process on Monday, April 20th. We stopped the pre-orders one week later, so that we could analyze our results.

Through Gumroad, we tracked data on the number of people who landed on each link, whether they purchased, and how much they chose to pay.

Note: To guard against people buying the book who were not originally assigned to that bucket (for example, those who inadvertently stumbled across our links online), we filtered out all email addresses that purchased a book through a variant that they were not explicitly assigned to. This gave us more confidence in the rigor of our statistical analyses.

What We Found: Experiment Results

The roughly 800 users in each of our experimental buckets went through a funnel, where they clicked through the email to visit the purchase page, and then decided whether or not to purchase. We collected data on user behavior in this funnel, as well as the price they paid.

conversion funnel

For each of the experimental variants, we collected data on 6 key metrics:

  • Email CTR – # of people who clicked through to the purchase page / # of people who received the email. The emails were identical, minus the link and a short section about the price.
  • Conversion Rate – # of purchases / # of people who clicked through to the purchase page
  • Total Sales – # of sales, regardless of whether a reader paid $0 or $100
  • Net Revenue – Total revenue generated, minus fees from Gumroad
  • Mean Sales Price – Average sales price that people paid
  • Max Sales Price – Largest sales price paid in that bucket
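The six metrics above can be computed per bucket with a few lines of Python. This is only a sketch: the `funnel_metrics` function is ours, the sample numbers are made up for illustration and are not data from the experiment, and Gumroad’s fees are modeled as a hypothetical flat fraction of revenue.

```python
def funnel_metrics(emails_sent, clicks, payments, fee_rate):
    """Compute the six funnel metrics for one pricing variant.

    payments: amounts paid, one entry per sale (0 for a free download).
    fee_rate: assumed flat fraction of revenue kept by the payment platform.
    """
    gross = sum(payments)
    sales = len(payments)
    return {
        "email_ctr": clicks / emails_sent,
        "conversion_rate": sales / clicks if clicks else 0.0,
        "total_sales": sales,
        "net_revenue": gross * (1 - fee_rate),
        "mean_sales_price": gross / sales if sales else 0.0,
        "max_sales_price": max(payments, default=0),
    }


# Made-up example bucket: 800 emails, 200 clicks, 6 sales.
metrics = funnel_metrics(
    emails_sent=800,
    clicks=200,
    payments=[0, 0, 5, 10, 19, 30],
    fee_rate=0.05,  # hypothetical fee
)
for name, value in metrics.items():
    print(name, value)
```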

Below, you’ll see some plots on how each pricing variant performed on each metric. Each of the seven circles represents a different pricing variant, with the area of the circle being proportional to the magnitude it represents. The larger the circle, the “better” that pricing variant did in terms of our metrics.

The blue circles are the variants that were fixed at $19 and $29. The orange circles are the PWYW variants.

The X-axis of the following plots describes the minimum prices we offered: free, $10, $19 (this was a fixed price), $20 and $29 (also fixed). The Y-axes are the prices we suggested when we were using a PWYW variant: $19 and $29.

pwyw vs fixed


Looking above, it’s no surprise that a PWYW model of ($0/$19) had the highest conversion rate (upper right plot), and as a result the greatest number of people who downloaded the book. After all, you can get it for free!

Much to our surprise, many of our readers who got this variant paid much more than $0. In fact, as you can see above in the “Mean Sales Price” plot in the bottom left corner, our average purchase price was about $9. Some readers even paid $30.

To examine the distribution of payments we received for each variant, we also examined the histogram of payments for each of the 5 PWYW variants:

sales distribution

It’s again no surprise to see a large chunk of purchases at the minimum. However, you can also see fairly sizable clumps of readers who pay amounts around $5, $10, $15 and $20 (and even some who paid in the $30-$50 range).

In fact, readers seemed to like paying amounts that were multiples of $5, perhaps because it represented a nice round number.

Surprising Insights on Pay What You Want

You Can Earn As Much from a PWYW model (and possibly more) as from a Fixed Price model

Traditional advice told us that we should price our book at a high, fixed price point, since people interested in advancing their careers will typically pay a premium for a book that helps them do exactly that.

However, our ($0/$19) variant was ranked second in total revenue generated (tying with a fixed price of $29).

[Figure: net revenue]

In fact, if anything, the data lends credence to the belief that you can earn even more from PWYW than from setting a fixed price.

What do we mean by that?

Well, our ($0/$19) variant actually made nearly twice as much money as fixing the price at $19. The difference in earnings was large, and is strongly suggestive evidence that our book would make more money if we made it free and simply suggested a price of $19 than if we had fixed the price at $19.[1]

This was an incredible result, since it suggested that with a PWYW model, we could generate the same amount of revenue as a fixed price model, while attracting 3-4x more readership!
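The article does not say which test produced the footnoted p-value. As one possible way to compare revenue per visitor between two variants, here is a minimal sketch of a two-sided permutation test. The function name and the payment lists are hypothetical and purely illustrative; they are not the experiment's data.

```python
import random


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean revenue per visitor.

    `a` and `b` hold one number per visitor (0 for visitors who did not buy).
    This is an assumed method, not necessarily the one used in the article.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the variant labels and recompute the difference in means.
        rng.shuffle(pooled)
        perm_a = pooled[:len(a)]
        perm_b = pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical per-visitor payments, for illustration only.
pwyw_free = [0] * 60 + [5] * 15 + [10] * 15 + [19] * 8 + [30] * 2
fixed_19 = [0] * 92 + [19] * 8

print(permutation_p_value(pwyw_free, fixed_19))
```

A permutation test makes no assumption about the shape of the payment distribution, which seems useful here because payments are heavily clumped at the minimum and at round amounts.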

Higher Suggested Price Didn’t Translate to Higher Average Payments. But…

The “suggested” price didn’t seem to have a large impact on the price people paid. Compare the mean purchase prices between $19 suggested and $29 suggested in both the $0 minimum variants and the $10 minimum variants.

[Figure: mean sales price]

As you can see, moving the suggested price from $19 to $29 in both cases increased average purchase price by only $1.

However, we don’t mean to imply the suggested price had zero effect. In fact, the data lends support to actually having a lower suggested price.

You can see what happened to conversion rates when we changed the suggested price from $19 to $29. In both cases we tested ($0 minimum and $10 minimum), a lower suggested price had a higher conversion rate, and ultimately drove more revenue.[2]

Therefore, it seems that even if the average sale stays the same across different suggested prices, total sales increase when you have a lower suggested price. This is perhaps due to certain readers being turned off by a higher suggested price, even if they could get the book for $0.

Just imagine seeing a piece of chocolate being offered for free, but having a suggested price of $100. You might scoff at the absurdly high suggested price and refuse the candy, despite being able to take it for nothing.

On the other hand, if you were offered the same scenario, but this time the free candy had a suggested price of just $0.25, you may see this as fair and be much more inclined to part with your quarter.

Try It Out For Yourself

We think that all of these findings should spur authors and creators to conduct testing on their own product pricing. Gumroad, our sales platform, makes it remarkably easy to create product variants, which you can email out to randomized batches of your followers. Or, you can use the suite of A/B testing tools to ensure that different visitors to your website receive different product links.

By doing so, you may discover that you could reach a larger audience, while also earning higher revenue.

[1] This result just missed the cutoff for statistical significance. The actual p-value comparing $0/$19 with a fixed $19 was 0.057, missing our threshold of 0.05 necessary to qualify as statistically significant. Nevertheless, the very low p-value is a strongly suggestive result in favor of a PWYW model.

[2] Beyond being practically significant, this was also statistically significant with a p-value close to 0.

Source: Carl Shan – How We Priced Our Book With An Experiment

Risk vs. Loss

A risk is defined as the probability that an undesirable event takes place. Since most risks are not totally random but rather dependent on a range of influences, we try to quantify a risk function that gives the probability for each set of influences. We then calculate the expected loss by multiplying the cost caused by the occurrence of this event by the risk, i.e. its probability.
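As a minimal sketch of the definitions above, the snippet below pairs a risk function with an expected-loss calculation. The logistic form, the influence names, the weights and the cost are all assumptions made for illustration; the text does not prescribe any particular model.

```python
import math


def risk(influences, weights, bias):
    """Hypothetical risk function: maps a set of influences to a probability.

    A logistic model is assumed here purely for illustration.
    """
    score = bias + sum(weights[name] * value for name, value in influences.items())
    return 1.0 / (1.0 + math.exp(-score))


def expected_loss(probability, cost):
    """Expected loss = probability of the event times the cost it causes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return probability * cost


# Illustrative, made-up weights and influences.
weights = {"debt_ratio": 2.0, "years_employed": -0.3}
p = risk({"debt_ratio": 0.5, "years_employed": 4}, weights, bias=-1.0)
print(p, expected_loss(p, cost=10_000))
```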

Often, the influences can be changed by our actions. We might have a choice. So it makes sense to look for a course of action that would minimize the loss function, i.e. lead to as little expected damage as possible.
Algorithms that run in many procedures and on many devices often make decisions. Prominent examples are credit scoring and shop recommendation systems. In both cases it is clear that the algorithm should be designed to optimize the economic outcome of its decision. In both cases, two risks emerge: the risk of a false negative (i.e. wrongly giving credit to someone who cannot pay it back, or making a recommendation that does not fit the customer’s preferences), and the risk of a false positive (not granting credit to a person who would have been creditworthy, or not offering something that would have been exactly what the customer was looking for).
There is, however, an asymmetry in the losses of these two risks. In the vast majority of cases, it is far easier to calculate the loss for a false negative than for a false positive. The cost of a credit default is straightforward. The cost of someone not getting the money, however, is most certainly bigger than just the missed interest; the potential borrower might very well go away and never come back, without us ever realizing.
Even worse, while calculating risk is (more or less) just maths and statistics, different people might not even agree on the losses. In our credit scoring example, one might say, let’s just take what we know for sure, i.e. the opportunity cost of missed interest, while another might insist on evaluating a broader range of damages. Where to draw the line is obviously arbitrary. So while the risk function can be made somewhat objective, the loss function will be much trickier and, most of the time, prone to doubt and discussion.
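The credit example can be sketched as a choice between two actions, each with its own expected loss. In this hypothetical snippet the default probability and all amounts are invented; the point is only that the same risk estimate leads to opposite decisions depending on how the disputed loss of turning away a good customer is valued.

```python
def best_action(p_default, loss_if_default, loss_if_good_rejected):
    """Pick the action with the smaller expected loss.

    Granting credit risks a default; rejecting risks turning away a
    customer who would have paid the money back.
    """
    expected = {
        "grant": p_default * loss_if_default,
        "reject": (1.0 - p_default) * loss_if_good_rejected,
    }
    return min(expected, key=expected.get), expected


# Same risk estimate, two valuations of the rejection loss (made-up numbers).
narrow = best_action(0.10, loss_if_default=10_000, loss_if_good_rejected=800)
broad = best_action(0.10, loss_if_default=10_000, loss_if_good_rejected=3_000)

print(narrow)  # counts only the missed interest
print(broad)   # also counts a customer who never comes back
```

With the narrow valuation the sketch rejects the applicant; with the broader one it grants the credit, even though the risk itself has not changed.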

Collision decision

In the IoT, the world of connected devices and programmable objects, the problem of risks and losses becomes vital. Self-driving cars will cause accidents, too, even if they are much safer than human drivers. If a collision is inevitable, how should the car react? This was the key question asked by Majken Sander in our talk on algorithm ethics at Strata+Hadoop World. If it is just me in the car, a possible manoeuvre would be to turn the car sideways. If, however, my children sit next to me, I might very well prefer a frontal crash and rather have myself injured than my passengers. Whatever I would see as the right way to act, it is clear that I want to make the decision myself. I would not want to have it decided remotely without my even knowing on what grounds.
Sometimes people mention that even for human casualties, a monetary calculation could be done, no matter how cruel that might sound. We could, e.g., value humans according to their life expectancy, insurance costs, or any other financial indicator. However, this is clearly not how we would usually deal with lethal risks. “No man left behind”: how could we explain Saving-Private-Ryan-ish campaigns on economic grounds? Since a human casualty is regarded in the values of our society as total, not commensurable (even if a compensation can be defined), we get a singularity in our loss function. Our metric just doesn’t work here. Hence there will be no just algorithm to deal with a decision of that dimension.

Calculate risks, let losses be open

We will nevertheless have to find a solution. One suggestion for the car example is that, in risky situations, the car would hand the driving back to a human to let them decide.
This can be generalized: since the losses might be valued differently by different people, it should always be well documented and fully transparent to the users how the losses are calculated. In many cases, the loss function could be kept open. The algorithm could offer different sets of parameters to let the users decide on the behavior of the product.
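
One way to picture an open loss function is a decision routine that takes the loss values as a user-supplied table and returns the full breakdown alongside its choice. The sketch below uses a recommendation setting; the action names, outcome probabilities and the two profiles are hypothetical and only illustrate the idea.

```python
def choose_action(outcome_probs, loss_table):
    """Return the action with the lowest expected loss plus the full report.

    outcome_probs: action -> {outcome: probability}
    loss_table:    outcome -> loss, supplied by the user rather than hard-coded
    """
    report = {
        action: sum(prob * loss_table[outcome] for outcome, prob in probs.items())
        for action, probs in outcome_probs.items()
    }
    return min(report, key=report.get), report


# Made-up probabilities for one recommendation decision.
outcome_probs = {
    "recommend": {"good_fit": 0.6, "poor_fit": 0.4},
    "stay_silent": {"missed_match": 0.6, "no_harm": 0.4},
}

# Two user-selectable parameter sets (hypothetical).
profiles = {
    "fewer_interruptions": {"good_fit": 0, "poor_fit": 5, "missed_match": 1, "no_harm": 0},
    "more_suggestions": {"good_fit": 0, "poor_fit": 1, "missed_match": 5, "no_harm": 0},
}

for name, losses in profiles.items():
    print(name, choose_action(outcome_probs, losses))
```

Returning the per-action report next to the decision keeps the calculation visible to the user, which is the transparency the paragraph above asks for.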
As a society we have to demand to be in charge of defining the ethics behind the algorithms. It is a strong case for regulation; I am convinced of that. It is not an economic, but a political task.

Source: Beautiful Data