Sausage & Laws

Hi, my name's Hugh McGuire, founder of LibertyVox, which is an open source audio library of volunteer-read public domain books, and Pressbooks, which is an open source book production tool built on top of WordPress. Those of you who've seen me speak before know I often talk about pretty abstract kinds of stuff, and today I'm going to talk more about some of my experiences in the trenches of what it's like building a publishing workflow tool. I was thinking as I was listening to John and Haig that I'm perhaps the enemy of craft here, because I'm a big fan of good enough, but I'll just let them know that there are a lot of publishers who don't agree with that right now.

So that's my Twittery stuff and the website of Pressbooks if you'd like to make books really easily.

So Pressbooks–for those of you who aren't familiar–is a content management system and export system for books built on top of WordPress. We've totally rebuilt WordPress to make a system for creating books outputting epubs, mobis, pdfs as well as a web version. Our front page looks like this and I won't bore you with the inside of it but again: you put your content in once and we handle all the conversion to get the stuff out the other end.

I started Pressbooks...What are we 2013? I think we cobbled together an alpha in 2011, so it's been a while and it's been really exciting.

There are a few things that I find really interesting about Pressbooks and what we've done so far. The first thing is that all Pressbooks have a native web version straight out of the box because we're built on WordPress. It's very easy to do; it's built into the system that you have a web version of the book right from the start. Whether you make it publicly accessible or not is up to you.

Here's a book I did with Brian O'Leary, published by O'Reilly, and probably lots of you guys in here have contributed to the book or perhaps read it. What we found about the web versions of books is that publishers really don't care about web versions of books, authors really don't care that much, and readers mostly don't care, although I would say that book got–I think probably over its lifetime it's had–thirty or forty thousand visitors to the site reading articles, and I often see someone on Twitter pointing to one article or another from the book, so it has had a certain value. But generally when we talk about a web version of a book that comes out of Pressbooks, the eyes of publishers either light up in fear or roll back in boredom.

Another really exciting thing: I announced last year at this event that we were going to open source Pressbooks. It's built on top of WordPress. I'm a long-time advocate of open source and a believer in the cause of open source. Pressbooks is on GitHub so you can download the whole thing and wrangle with it and get it installed on your WordPress system. It's not the easiest tool to get up and running, but we found also that publishers, authors, and readers really don't care about open source at all.

The third thing and this is the thing probably that excites me more than anything else in the world of books is that with an online system–a content management system–where your whole book is in a structured format, you understand, you know, what's a title and what different things are. You could tag it very richly in the content itself and you would have an automatic very powerful web based API that you could expose in different ways to users.

This is a book that actually no longer exists on the web and in fact wasn't built on Pressbooks, but it's Dracula Dissected, done by a friend of mine, where he took the text of Dracula and split it up into component parts that were all navigable on the website. So it's the whole text, but he split it up by character, by date, by location, tagged it all, and you can navigate in different kinds of ways. I think it's a very interesting idea to start thinking about how we can interact with books in the future and interesting to hear discussions of that earlier today.

We also found that publishers, authors, and readers really don't care at all about books as APIs–except Kate and Megan, who are both here today. I got an e-mail in my box today from the publicity department at Doubleday Canada which said Kate Pullinger launched her book as API, and I thought, I bet you I'm the only person on this e-mail list who even knows what that means, but it's great to see some people doing some really interesting experimenting. I think also it's fair to say when I talk about books as API not being important, it's really in the trade world, academic publishing, but when you get into medical reference text books et cetera, certainly APIs are an important part of how they think about creating their books.

Here's the other thing–oh by the way, Pressbooks is also a single source book production system that produces pdf, epubs and mobi–now this actually is interesting to publishers and authors, less so to readers, and publishers and authors really want to hear more. Readers of course don't really care about how the sausage is made. The really exciting thing to me is that all of those formats (web, epub, mobi, and pdf) are generated from HTML and CSS, which also publishers, authors, and readers don't care about at all.

So the conclusion I have come to after a year since we launched Pressbooks as open source is that really publishers and authors are interested in ways to save money and make more money, and that's really the thing that drives their decision-making. And I think it's fair enough that generally authors and publishers are doing what they're doing at least in large part, certainly on the publisher's side, for financial reasons. And also I would say the kinds of authors and publishers that we're talking to tend to be either self-publishing authors or small presses doing let's say up to fifty or a hundred books a year or ten books a year, something like that. It's possible that there are larger more interesting publishers that just haven't been talking to Pressbooks all that much.

So that's kind of an introduction to some of the experience that we've had at Pressbooks and I'm now going to go into something just a bit more concrete. The famous line: "If you like laws and sausages, you should never watch either being made" and I don't know if I can say the same for making books but I'm going to talk about some experiences we've had on the PDF side of building books with CSS and HTML. And I think it's kind of this funny place we're in–I've had this conversation a few times in the last couple of days–that all of us are deeply committed to the book itself and many of us come from the world of the web and we're sort of trying to bring these two traditions together. And something that has been the most exciting for me in this whole process is to take a very webby interface and to be building print outputs out of that and to be tinkering around with the designs of books.

I'm just going to show you a couple examples of things we've been doing–outputs using a template system using Pressbooks. This is a group out of the UK called Connell Guides. They do these great, very short books: Shakespeare's plays et cetera. So this was one of the first complicated themes we did for PDF output with some colors. The image is kind of cool, started to do backgrounds with color text boxes with funky colors and a bit more colory stuff. I don't know how well you can see any of that but it was kind of interesting to start realizing how much of the world of CSS we could start pouring into actually print output.

There are a couple of tools: we use a tool called Prince XML that is our PDF generating engine. There's another one called Antenna House, which I think Atlas uses, and I think there are some open source ones out there. Prince is proprietary but the amazing thing is how much you can start doing, especially with the new paged media CSS3 that's out there; how much power you can get in creating a book.

So here's another book which is a much more traditional book. We did this with University of British Columbia Press. It's not quite finished yet but it's in process. So this again is a very traditional academic book, a collection of different authors contributing to the book and what's a bit challenging about what we're doing–maybe I'll just run through and show you some neat things: so there's the table of contents laid out. It's all laid out automatically straight out of Pressbooks. We're pretty happy about how the design came out. We got a spec for what the design should look like and came very close to matching it. I'm going to get to some problems that we have in the system, but you can do different running headers, put your page numbers wherever you want, it can handle floating figures, it can even flip tables around on their side. That was fun.
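The running headers and page numbers mentioned above can be sketched with CSS paged media rules. This is a minimal sketch and not the actual Pressbooks theme: the page size, margins, and the `book-title` and `chapter-title` class names are assumptions for illustration, and it relies on margin boxes and `string-set`, which print engines such as Prince support.

```css
/* Hypothetical trim size and margins; a real theme would follow the design spec. */
@page {
  size: 6in 9in;
  margin: 0.75in;
  /* Page number centered at the foot of every page. */
  @bottom-center { content: counter(page); }
}

/* Left-hand pages carry the book title in the running header. */
@page :left {
  @top-left { content: string(book-title); }
}

/* Right-hand pages carry the current chapter title. */
@page :right {
  @top-right { content: string(chapter-title); }
}

/* Capture heading text into named strings (class names are assumptions). */
h1.book-title    { string-set: book-title content(); }
h2.chapter-title { string-set: chapter-title content(); }
```

Moving the page number is then a matter of switching the margin box, for example from `@bottom-center` to `@top-right`.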

So I guess the point here is that you can get to a very good print output using CSS and HTML and that's pretty interesting. And one of the powers of this is you might have different content using the same template, and so you might want chapter numbers sometimes and not want chapter numbers other times. You want to get rid of the subtitle, get rid of the chapter numbers, and then you might just apply another template altogether.
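Because the content stays in HTML, dropping chapter numbers or subtitles can be a small stylesheet change. A minimal sketch, assuming hypothetical class names `chapter-number` and `chapter-subtitle` (these are not taken from the actual Pressbooks markup):

```css
/* Template variant: suppress chapter numbers and subtitles in the output. */
/* Both class names are assumptions for illustration. */
.chapter-number,
.chapter-subtitle {
  display: none;
}
```

The same HTML, paired with a stylesheet that omits this rule, would print the numbers and subtitles again.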

This is a template that was actually built to do copies of Agatha Christie mysteries but we can also apply it to long boring academic texts about the security situation after 9/11 in Canada and the U.S.

Another output there and we see a little problem that I'm going to talk about shortly at the bottom of that page. More output. The idea that you can decouple that presentation layer from the content itself is really exciting and it's amazing how much power there is already in the CSS spec, and thanks to W3C for pushing that along, but there is a catch: CSS doesn't yet handle fine-tooth typography very well, and there are a few things that keep coming up as we show publishers. I'm not a typographer, by the way, so when I show them stuff and say “Look how amazing this is” they say “Oh, but that comma’s in the wrong place.” The kinds of problems that we have are with widows, orphans, and especially bottom balancing. That's the biggest problem we see. Some bad breaks and flowing text around strange stuff like that inverted table. So widows and orphans can be controlled. I'm going to get into some CSS stuff here–don't have too much left.

Anyway, widows and orphans can be controlled in CSS. It works really well but there's this big penalty you get in bottom balancing. Who knows what bottom balancing means? Anyone? I learned it just a few months ago when I was getting criticized by one of our publishing clients, so here is an example of what happens. You see at the bottom there's two lines left there? So we're saying “No, we don't want an orphan here,” so you don't want to leave a line on its own at the beginning of a paragraph at the bottom of a page, so the system says “Okay, we'll make sure there are two here” and then at the bottom here–there's another paragraph on the other page, so if they have that sentence you're going to have a problem with a widow, which would be a sentence left over at the top of the next page, and so, so far so good. We've solved widows, we've solved orphans, but the result is bottom balancing is off. So your pages at the bottom don't line up. So this typographer sees this and they go “Ah, that's not good enough, my craft genes say no,” so figuring out how to fix this I think is going to be important for the W3C folk and those who want to solve using CSS as a true print production mechanism.
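The widow and orphan control described above uses two standard CSS properties. A minimal sketch, assuming it is applied to all body paragraphs:

```css
/* Require at least two lines of a paragraph at the bottom of a page (orphans) */
/* and at least two lines at the top of the next page (widows). */
p {
  orphans: 2;
  widows: 2;
}
```

When a paragraph cannot satisfy these minimums, the engine pushes lines to the next page, which is what leaves facing pages with uneven bottoms.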

Now there are ways to do this. You can turn off your widow and orphan setting and here you see we've got an orphan at the bottom of the page and I don't know if you can see this but you can apply a span up there to loosen up the text in this paragraph up here that's going to drop that down so that you lose your orphan at the bottom and you do that with some CSS that looks something like this.
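The slide's CSS is not reproduced in this transcript. A minimal sketch of what such a rule might look like, assuming a hypothetical class name `loose` and illustrative spacing values that would need tuning by eye:

```css
/* Effectively turn off automatic widow and orphan control. */
p {
  orphans: 1;
  widows: 1;
}

/* Hand-applied to a span in an earlier paragraph: slightly looser spacing */
/* makes the text run one line longer and pushes the orphan to the next page. */
/* The class name and values are assumptions. */
span.loose {
  word-spacing: 0.06em;
  letter-spacing: 0.01em;
}
```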

The other thing you could do is tighten up that paragraph so that the next sentence comes along from the next page, and you do that with a tight class that might look something like this.
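Again the slide itself is not in the transcript. A minimal sketch of a tightening rule, assuming a hypothetical class name `tight` and illustrative negative spacing values:

```css
/* Hand-applied to a paragraph: slightly tighter spacing shortens it by a line, */
/* making room at the bottom of the page for text from the next page. */
/* The class name and values are assumptions and would be tuned by eye. */
p.tight {
  word-spacing: -0.04em;
  letter-spacing: -0.01em;
}
```

Both fixes are manual, paragraph-by-paragraph adjustments, which is why an automatic solution to bottom balancing would matter so much for print production.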

So again, my plea to the W3C: If you can figure out how to solve bottom balancing with CSS, we're a long way to having publishers saying “Wow this really is an awesome print production tool.” It might happen anyway. I think there's this sort of this...as the world evolves we're going to see this improvement in print technology in templating systems and an adjacent reduction in requirements for fine-tuning of typography.

So that really is all I wanted to talk about–just some in-the-trenches stuff as we're evolving. It's very interesting as I find myself up here talking about some really nitty-gritty details of print production, and I've been reflecting over the last few days on what’s the conclusion of this. I think again, we've got this group of people who are deeply committed to the idea of the book, to the book as a metaphor, to the book as a cultural artifact, but who are also deeply committed to the web and technology and evolving the world. I have this feeling, as we struggle with Pressbooks to get close enough to what a lot of publishers need and as we see what's going on in the world, that this is a very slow movement in the publishing space towards changing the way we think about books. It's been a theme: certainly last year, as John mentioned, there was this moment where we sort of said, “You know what? Screw the old publishing companies. We're going to go off to the future with all sorts of new and exciting directions.” I think that's what's going to end up happening, and it's places like Wattpad where really the concerns about bottom balancing are the furthest things from the minds of all the millions of kids and adults who are writing original fiction and non-fiction and interacting on sites like Wattpad.

So I think that there's a risk as we really try to force the craft on this culture of the book that it's going to be left behind by the rest of the universe as it moves away because of this slowness to adapt to what people are wanting and needing in greater numbers. So I think that's it for me. Thank you very much.