Our fifth Author Earnings report marks the first anniversary of this website. We now have a year of quarterly snapshots to analyze, and the results have been consistent while also revealing gradual trends.

This time around, we look at ebooks and ISBNs. There is an entire “shadow industry” of ebook sales uncounted by industry pundits. A year ago, we gave you the first look at a shadow industry of indie ebooks. This year, we get a first glance at the works that no one is tracking or counting.

Which raises the question: Do we need ISBNs? Probably something like them, but not at their current cost-to-benefit ratio. ISBN-less ebooks outsell those with ISBNs, which proves nothing except that ISBNs aren’t needed for sales success. If the industry or retailers want to track ebooks, let them offer a standardized and low-cost means to do so.

Another point is that ebooks change over time as authors update them and their backmatter. There may even be different editions for each retailer, so links point to sequels at that website. Expecting a different ISBN for each of these editions is not realistic. These numbers are simply a relic of the print days, which are on their way out. But that’s looking ahead to our next report.


A good point about ISBNs, but you should note that this is not a universal problem. For instance, ISBNs for Canadian residents are free, and costs vary in other countries. Obviously that isn’t an expensive or onerous burden for those authors.

Cost isn’t the only issue. I bought a block of ISBNs (from Nielsen, because I’m in the UK). I don’t plan to buy any more. The cost is a factor, but the bigger issue by far is the clunky, unfriendly and time-consuming system for assigning ISBNs to books.

ISBNs need to be free. Period. It’s outrageous that there is actually a monopoly out there–not Amazon–and the AG and their pals don’t get on the case of Bowker.


Privately, Big Publishing will chalk up Amazon’s decision not to require ISBNs as some sort of insidious and deliberate sabotage plan aimed at the publishing industry.

But it’s obvious that Amazon eschewed ISBNs for a far simpler and more benign reason: the cost of ISBNs was an unnecessary barrier to entry for the smallest publishers and self-published authors. Look at how punitive Bowker’s ISBN fee structure is toward folks only buying a few at a time:

1 ISBN costs $125.00
10 ISBNs cost $275.00
100 ISBNs cost $525.00
1000 ISBNs cost $1250.00

An indie self-publisher buying ‘em one at a time from Bowker is paying at least 100X more per ISBN than a Big Five publisher does. Bowker’s inflated ISBN costs at the low end were effectively another “gatekeeper” that kept the cash-strapped riffraff from self-publishing their handful of books. And that’s no accident; the industry liked it that way.
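For anyone who wants to check the "100X" figure, here is a minimal sketch that divides each block price quoted above by the number of ISBNs in the block. The tier list is copied from the comment; the function and variable names are only illustrative.

```python
# Bowker price tiers quoted above: (ISBNs in the block, total price in USD).
tiers = [(1, 125.00), (10, 275.00), (100, 525.00), (1000, 1250.00)]

def per_isbn_costs(tiers):
    """Return {block_size: cost per ISBN} for each price tier."""
    return {count: price / count for count, price in tiers}

costs = per_isbn_costs(tiers)
single = costs[1]  # what a one-at-a-time buyer pays per ISBN

for count, unit in costs.items():
    # The ratio shows how many times more the single-ISBN buyer pays per number.
    print(f"{count:>5} ISBNs: ${unit:7.2f} each (single price is {single / unit:.0f}x this)")
```

At the 1,000-block tier the per-ISBN price works out to $1.25, which is where the 100-to-1 ratio comes from.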

Looks like that shit just blew up in their faces pretty bad, huh?

I read the entire report + graphs and I have one question: It seems that you “crawl” the top 120,000 titles–which the report states comprise approx. 50% of Amazon’s revenue. I “get” the 120k titles part, but I didn’t see (unless in an earlier report) how you know what % of Amazon revenues these top-120k titles represent. I believe each sub-120k title generates between $0.00 and a few hundred dollars per book per year in revenue, but I would think the summation of these sales would be a mystery to all but one poor fellow or fella in some subterranean cubicle in Seattle. So, I’m very interested in your thought process on this. The way I look at it, Amazon controls 67% of the ebook market, and AE collects and analyzes data on every single ebook which is sold on the Amazon platform and generates at least $Xyy per year in sales (I know this number is less than $1,000/yr./title). Or, AE tracks 100% of the Amazon titles which sell more than 0.x ebooks per day (I believe it would be a fractional number of ebooks sold per day). Thanks also for crunching some KU initial numbers. I’ll be very interested in the KU element of your future quarterlies–but I also appreciate the individual reports that authors are posting in their blogs–interesting to see all angles as this new marketplace develops (and affects different authors/genres differently). Anyway, I was wondering how you kept busy on your recent sea adventure, and now I know. :-P

Fixed the bad links — thanks for the heads-up.

The 120,000 titles that AE grabs aren’t overall sales-ranks 1-120,000 all inclusive.
As we crawl the bestseller sublists, we end up capturing roughly:
– all of the top several hundred ranks
– 95% of the top 1,000
– 80% of the top 5,000
– 68% of the top 10,000
– 52% of the top 25,000
– 42% of the top 50,000
– 33% of the top 100,000
– 11% of the top 1,000,000
– some ranked in the 2,000,000-4,000,000 range (mostly from really specific nonfiction bestseller lists like “Renaissance Painter Biographies” or whatever.)

But we know what the shape of the sales-to-rank curve is, and so we know what the “missing” books at ranks in between the ones we captured are selling. We then numerically integrate the whole curve to get a total daily sales number for all ebooks at all ranks. In other words, for each rank, whether or not we happened to capture that particular book in our data set, we add up its corresponding unit sales to compute Amazon’s total unit sales. Picture “shading in the area under the curve.”

While the books in the long tail below rank 100,000 are shown as having 0 daily sales in our spreadsheet, they actually do sell a book every few days in the 100,000-500,000 range, a book a week in the 500,000-1,000,000 range, etc. (We zeroed those out in the spreadsheet because we didn’t want to get caught up explaining to the math-challenged how a book can sell a fraction of a copy a day. ;) But we do include those fraction-sellers in the integrated total of 1,542,000 total ebooks sold per day (of which 1,331,910 are ranked 1-100,000).
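A small sketch of the arithmetic implied by those two totals: how many daily sales fall to books ranked below 100,000, and how a "fraction of a copy a day" comes about. The two totals are the figures quoted above; the `daily_rate` helper is a hypothetical name used only for illustration.

```python
# Figures quoted in the comment above (daily ebook unit sales).
total_daily = 1_542_000      # all ranks, including the fractional sellers
top_100k_daily = 1_331_910   # ranks 1-100,000 only

# Whatever is not in the top 100,000 must come from the long tail.
long_tail_daily = total_daily - top_100k_daily
long_tail_share = long_tail_daily / total_daily

print(f"Long tail (rank > 100,000): {long_tail_daily:,} sales/day")
print(f"Share of the daily total:   {long_tail_share:.1%}")

def daily_rate(days_between_sales):
    """One sale every N days is the same as 1/N of a copy per day."""
    return 1.0 / days_between_sales

print(f"One sale a week is {daily_rate(7):.3f} copies/day")
```

So the fraction-sellers add up to roughly 210,000 copies a day between them, even though each one individually rounds to zero.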

Hope that helps.

Thanks, Spiderman. That absolutely clears it up for me and is pretty fascinating new info.

“[W]e know what the shape of the sales-to-rank curve is, and so we know what the “missing” books at ranks in between the ones we captured are selling. We then numerically integrate the whole curve to get a total daily sales number for all ebooks at all ranks. In other words, for each rank, whether or not we happened to capture that particular book in our data set, we add up its corresponding unit sales to compute Amazon’s total unit sales.”

This explanation helped me a lot. It may be incomprehensible to the innumerate, but explaining that you are running integral calculus by numeric techniques on the data to estimate the total daily sales numbers gives me a high degree of confidence in the accuracy of your results.

The one problem that I see with integrating over the sales numbers is that doing so treats a set of discrete values as if they were continuous. I will have to think about this some more. Have you validated that you can indeed do that with the dataset you have? I admit the numbers are large enough to assume certain approximations, but you need to validate those approximations anyway. (I have been there and seen engineers assume they could run parametric statistics on a binomial expansion just because the book said they could, but the data did not support the assumption.)

Just asking.

Great question about numerical integration. The thing that makes it easy (and accurate) is the by-definition monotonically-decreasing nature of the sales-to-rank curve (it’s a pareto distribution, more or less, with a couple kinks in it caused by different “list visibility” regimes).

So it just becomes a choice of what numerical-integration interpolation strategy you use. We used linear interpolation between sales-to-rank data points, to get an appropriate level of accuracy.

Sorry for the geekspeak.

I am not troubled by your geekspeak. I, too, speak geek.

Is there any way to validate the linear interpolation technique? You say you have a Pareto curve, and that would cause me to question the validity of a linear interpolation immediately. I think as a first order approximation it may yield useful results, but my concern is that the magnitude of the error term is unknown. Do you have a handle on that?

Good point about error magnitude; it didn’t matter as much before, as our focus was mainly the relative performance of books published via each path. Therefore, an error affected all sectors consistently and equally and didn’t change those relative results.

However, now we’re looking at predicting the actual absolute number of ebook sales on, and the actual absolute size of the market as a whole. That requires more accuracy.

“Within 20%” is no longer good enough — we need a better handle on the accuracy. That’ll be our next focus.

The data, however, doesn’t follow a strict pareto or power-law distribution — it’s close, but not exact. There are those rank regimes I mentioned where the slope steepens or flattens — most likely due to sharp differences in how much bestseller list visibility books get in those ranges.

I offered these points — integration over discrete values and precision in the error term — as critiques. Based on my experience, if I were to question your results, those are the points I would choose to attack. By giving you these critiques, I give you a chance to prepare yourself in a benign environment.

The question is cost vs benefit. How much time will you expend to shore up these weaknesses versus expending that time on some other matter?

Were I in your place, I would spend 10 minutes, maybe half an hour on the argument regarding integration over discrete values, but I would give a solid day or more to tying down that error term.
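One rough way to get a feel for the interpolation error discussed above is to run the same linear-interpolation sum against a curve whose true total is known. The sketch below does that under loud assumptions: the "true" curve is a hypothetical pure power law pinned to 7,000 sales/day at rank 1 and 1 sale/day at rank 100,000, and the knot ranks are simply a plausible set of sample points. The thread itself notes that the real data is not a strict power law, so this only indicates the direction and rough scale of the error; it is not a validation of the report's numbers.

```python
import math
from bisect import bisect_right

# ASSUMPTION: a pure power law stands in for the real sales-to-rank curve,
# pinned to 7,000 sales/day at rank 1 and 1 sale/day at rank 100,000.
TOP_SALES = 7000.0
MAX_RANK = 100_000
EXPONENT = math.log(TOP_SALES) / math.log(MAX_RANK)

def true_sales(rank):
    """Hypothetical 'ground truth' daily sales at a given rank."""
    return TOP_SALES * rank ** (-EXPONENT)

# Sample the hypothetical curve at only a handful of ranks (the knots).
KNOT_RANKS = [1, 5, 20, 35, 100, 200, 350, 500, 750,
              1500, 3000, 5500, 10_000, 50_000, 100_000]
KNOT_SALES = [true_sales(r) for r in KNOT_RANKS]

def interp_linear(rank):
    """Straight-line interpolation between the two knots around `rank`."""
    i = min(bisect_right(KNOT_RANKS, rank) - 1, len(KNOT_RANKS) - 2)
    r0, r1 = KNOT_RANKS[i], KNOT_RANKS[i + 1]
    s0, s1 = KNOT_SALES[i], KNOT_SALES[i + 1]
    return s0 + (s1 - s0) * (rank - r0) / (r1 - r0)

all_ranks = range(1, MAX_RANK + 1)
exact_total = sum(true_sales(r) for r in all_ranks)      # sum over every rank
linear_total = sum(interp_linear(r) for r in all_ranks)  # sum using knots only
relative_error = (linear_total - exact_total) / exact_total

print(f"Exact total:         {exact_total:,.0f}")
print(f"Linear-interp total: {linear_total:,.0f}")
print(f"Relative error:      {relative_error:+.1%}")
```

Because a power law is convex, every straight chord between two knots lies above the curve, so in this toy setup linear interpolation can only overestimate. Adding knots where the curve bends most, or interpolating in log-log space, would be the obvious ways to shrink that bias.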

Thanks again Hugh and Data Guy! Question for Data Guy, what is the most up to date information you are using to assign number of sales to different sales rank? I know Theresa Ragan updates that from time to time based on her experience but do you have one of those you could share with us all? Thanks! Dan

Sorry for the below formatting – I cut and pasted from the spreadsheet:

Sales Rank    Sales Per Day
1             7,000
5             4,000
20            3,000
35            2,000
100           1,000
200           500
350           250
500           175
750           120
1,500         100
3,000         70
5,500         25
10,000        15
50,000        5
100,000       1

Mostly, it still follows: with a few additional data points added (like the one at rank 100) to increase curve accuracy.

We’ve left it consistent since we started to avoid introducing yet another variable into the report-to-report comparisons.
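The "shade in the area under the curve" idea described earlier in the thread can be sketched with the table above: interpolate linearly between the listed points and add up the estimated daily sales at every rank from 1 to 100,000. This is a minimal sketch, not the report's actual code. It assumes only the fifteen points posted here (the comment mentions extra data points that are not shown), and the function names are made up for illustration.

```python
from bisect import bisect_right

# (sales rank, estimated sales per day), copied from the table above.
CURVE = [
    (1, 7000), (5, 4000), (20, 3000), (35, 2000), (100, 1000),
    (200, 500), (350, 250), (500, 175), (750, 120), (1500, 100),
    (3000, 70), (5500, 25), (10_000, 15), (50_000, 5), (100_000, 1),
]
RANKS = [rank for rank, _ in CURVE]

def sales_at(rank):
    """Estimate daily sales at `rank` by linear interpolation between points."""
    if not RANKS[0] <= rank <= RANKS[-1]:
        raise ValueError("rank is outside the range covered by the table")
    # Index of the table point at or just before `rank` (clamped to a valid segment).
    i = min(bisect_right(RANKS, rank) - 1, len(CURVE) - 2)
    (r0, s0), (r1, s1) = CURVE[i], CURVE[i + 1]
    return s0 + (s1 - s0) * (rank - r0) / (r1 - r0)

def total_daily_sales(first_rank=1, last_rank=100_000):
    """Add up the estimated daily sales of every rank in the range."""
    return sum(sales_at(r) for r in range(first_rank, last_rank + 1))

print(f"Rank 150 sells about {sales_at(150):,.0f} copies/day")
print(f"Ranks 1-100,000 sell about {total_daily_sales():,.0f} copies/day")
```

With just these fifteen points the total comes out at about 1.38 million a day, within a few percent of the 1,331,910 quoted earlier for ranks 1-100,000. An exact match should not be expected, since the additional data points are not included here.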

Yeah, ISBNs are hugely problematic. If you make them more generic, like a number that represents a specific story in all formats, then it’s not a useful tool for people who want to use it to track physical inventory. If it’s too specific, like it is now, it doesn’t allow for minor changes to the amorphous ebook format.

The main problem is that they cost anything. ANYTHING. In this age of computers, there is almost zero reason that the system can’t be automated. Maybe we’d just make a short phone call to prove that we are not robots, and then we’d get an account that let us create something like 10 ISBNs without further human interaction.

And I bet Amazon could figure out how to do it all very cheaply.

Seriously, if the Canadian government can make them free, anyone can. The very fact we don’t pay for them here means it’s not hard to do! I also find the process miraculously easy.

The Canadian government isn’t that good. There is a cost. Someone is paying it.

Is there some reason the US publishing industry should have its record keeping paid for by taxpayers?

Well said. Nothing is free and the rest of the country should not be paying for an ISBN for me.

How much should it cost to give out a number?

Really, the question is more a matter of how many thousand you can give out for a dollar.

You need a very low-bandwidth website to collect the registration info and some sort of data store to record it. Your biggest problem is going to be minimum transaction fees; otherwise it would be a matter of hundreds for a penny.

I really don’t know what Bowker does, or what is involved in ISBN management. So I can’t comment on cost. I resist assuming anything about cost based on my extreme ignorance of the operation.

Likewise, I have no idea what the Canadian ISBN operation entails or what it costs for them to do it.

But I do know it isn’t free.

Might be nice to have an actual system to track publications, especially if we’re going to have copyrights that last almost a century.

Also, aren’t ISBNs used by the Library of Congress?

Amazon already does it cheaply. It is called an ASIN. Whom does it benefit? Amazon. Not me.

Do my books need a number for my business purposes? No.

Do my books need a number for Amazon’s business purposes? Apparently so, and I am content to let Amazon pay for those numbers.

Do my books need a number for bookstores to order and stock them? Yes, so in my fiendish little mind, it is right and good and righteous that the bookstore pay for that number. The bookstores do not see it that way. I understand that, but I am not bound to play by their rules.

Hugh and Data Guy, your reports are real page-turners. I have to stop myself from rushing on to the next revelation and the next, utterly captivated, caffeine delivery system growing cold. This one delivered a long-overdue spanking to the hand-wavy data collection systems Big Pub has depended on for years, and used to convince a good deal of the industry of a non-existent slump in e-book sales.

Indies are not just nibbling at trad-pub’s lunch, we’re taking increasingly large bites from the one sandwich they’d never trade: e-book profits.

ISBNs are free in France too.

There’s one thing that might slightly alter the overall data, and it relates to Amazon Publishing. Amazon’s imprints (including Amazon Crossing) benefit exclusively from Kindle First: these are ebooks, chosen from among Amazon’s imprints, that become free for one month for all Amazon customers who subscribe to Amazon Prime. So each time a Prime member downloads a free Kindle First ebook, it is counted as one sale.

In January 2015, for example, there were four Kindle First ebooks almost constantly ranked in the Kindle Store overall top 20. Granted, the number is too small for the overall data to be skewed, but should the number of Amazon Kindle First ebooks increase, that would have to be taken into account. In other words, Amazon Publishing titles that benefit from Kindle First don’t compete on a fair basis with the other ebooks (yes, I know, the word “fair” is a loaded one, and publishers also benefit from co-op on the Amazon store, unlike indies).

Wow, thanks for this, DG. Absolutely fascinating. It seems that reported pub industry stats are in the same class as headlines bemoaning Hollywood’s “declining” box office; by failing to account for constantly-growing foreign receipts, the number-crunchers exclude 60-80% of total b.o. receipts. Makes for nice catchy headlines and sympathy-mongering–but does not reflect observable reality.

It seems that reported pub industry stats… make for nice catchy headlines and sympathy-mongering–but do not reflect observable reality.

And that’s precisely the problem with having bad official data. We have an entire industry basing its go-forward business decisions on badly skewed statistics. Here are a few of the consequences of misleading ISBN-based reporting:

1) Print bookstores and other retailers fail to proactively source and carry 30% of the top-selling books that readers want.

2) Writers make poor career decisions based on wrong information.

3) Inaccurate data-based reporting paints a woefully incorrect picture of our industry (Science Fiction is dead! Ebooks are plateauing! Declining! Writers are earning less and less! No one reads anymore!)

4) The NYT, USA Today, WSJ, and other national ebook “Best Seller” lists lose credibility by only showing *some* of the bestselling ebooks in the U.S. and ignore others. (To be fair, that’s a non-ISBN-related policy decision: those lists deliberately exclude “single retailer” best sellers, ISBN or not, that outsell most of the books on their lists. But the ISBN-based data the industry publishes do lend that policy artificial validation.)

None of these things are good for readers or writers.

As someone who’s been making deeply data-driven entertainment for over a decade (video games), I can’t tell you how crazy-making it is to watch the publishing industry attacking you for attempting to derive useful information.

I once had a manager who dinged me for having a tracking spreadsheet too complicated for him to understand. The criticism you come under often feels like that.
“If he didn’t have an agenda, he wouldn’t be compiling all those numbers!”

As a fellow video-game industry techie, I do find the publishing world rather behind the times when it comes to data analytics. Right now, in addition to writing, I’m also doing some part-time data science consulting work for a video-game publisher; the contrast in data savvy is night and day.

But I don’t feel that we’re really being “attacked” by the industry for our work. In fact, Publishers Lunch joined us in the comments over at the Author Earnings site, and they were very constructive and helpful in their participation. Hopefully, we’ll see some opportunities to collaborate with them so that we both get a clearer picture of the overall market.

Nobody likes not having accurate data on our industry. We’d all like to see the reported numbers — official and unofficial — do a better job of reflecting reality.

