Is This Post True?

There’s a lot of discussion about what constitutes fake news, what impact it has, and whether blocking it is the bigger threat.

I’d like to instead talk about the perception of truth. How do we change the norms of publishing so that a mistake is detected rather than amplified? How do we broaden the context that an article lives within and dynamically update it as time passes and others expand on the original article? In what context are writers presenting their reporting and opinions?

I started reading Neil Postman’s “Amusing Ourselves to Death” the other day. It was written in 1985, but feels very relevant today. I’m not that far into it, but the second chapter delves into the interplay between truth and the medium we communicate in, and how that medium shapes truth.

As a culture moves from orality to writing to printing to televising, its ideas of truth move with it. … Truth, like time itself, is a product of a conversation man has with himself about and through the techniques of communication he has invented.

that there is a content called “the news of the day”— was entirely created by the telegraph (and since amplified by newer media)… The news of the day is a figment of our technological imagination. … Cultures without speed-of-light media -— let us say, cultures in which smoke signals are the most efficient space-conquering tool available -— do not have news of the day.

It is not really a deep insight to say that the news of the minute, the “trending” news, is something we have created with the systems we have built. Much of the discussion has (somewhat rightly) focused on Facebook and Google, but the Open Web is much larger than that.

We’ve lowered the barrier to publishing. We’ve changed the medium through which we express truth, but we haven’t really changed the norms or means by which we enable readers to judge truth.

Let’s compare how truth is perceived in some different mediums:
1. Newspaper journalistic standards in the 1980s (for instance) relied on “balance” and “unbiased” language and had separate sections for opinion. All of this got published by some “reputable” source. Some publisher with a long history that was known to the reader. And typically there was not really a lot of competition within any particular geographic region (though this hasn’t always been true) so it was pretty easy to know the range of writers and publishers.
2. Scientific journals rest on citations and peer review. Ostensibly the data/methods are all available so someone could reproduce the experiments. But doing so is often not easy. The perceived truth is determined both by the reputation of the journal (and by proxy, who reviewed it) as well as explicitly referencing other papers that may disagree. A paper that ignores existing literature is unlikely to get past reviewers.
3. For an article on the Web we borrow a lot of our norms from (1) and (2). We have also added a social validation of truth by displaying a count of likes, the number of comments, or the number of people who have shared an article. Rarely is anything shown about the people providing the social validation other than a simple count. Sometimes it is also where the content ranks in a search, or perhaps what the post is linking to. But the text of a link and the entire content are entirely within the writer’s control.

Obviously, for all of these cases I am ignoring lots of background on all the work that people do in research, investigation, validation, and writing. The point here is not about what went into publishing it, but rather what does a new reader see? Why does a reader trust it? Of course the reader can judge the written evidence for themselves, but we know of a huge list of cognitive biases that the reader must contend with. Additionally, it takes a lot of time to research the truth of an article.

Both (1) and (2) do nothing to handle the case where the author has simply made a mistake. They have mechanisms for referring to articles published before the current one, but the web is capable of dynamically updating to refer to things published afterwards. An article does not need to be static.

How do we take the best of (1) and (2) and adapt them into a world where the reader can better judge truth in (3)?

I don’t really have an answer. I’m not sure these are even the right questions, but let’s try an idea, right here and now, since this post is currently asking you to determine its validity. Don’t judge just my words. Judge my words against the words of others in the world. You’ve read this far, so let’s add some related content to this article and see how it affects your personal search for the truth. Then let’s reconvene below and discuss some more…

 

Onward…

How has your perspective of this post changed? Did you open any of the links? Did you get lost in them? Were you overwhelmed by them? Do you feel that you can more accurately judge truth? Do you find my ideas more valid or less valid? Getting back to my click-bait headline: Is this post in accordance with fact or reality (true)?

I haven’t actually read all of those posts. I looked at a few. The Stanford Web Credibility Project from 2002 was very interesting and relevant. Its top recommendation is to:

1. Make it easy to verify the accuracy of the information on your site.

What if, in order for a writer’s opinion or reporting of news to be considered a part of humanity’s search for truth, that writer is expected to publish it alongside others’ content? What if publishers are expected to give others traffic in order for their words to have any weight? Why should I care what you have to say if you’re afraid to algorithmically link me to other people who are providing other opinions/insights? What if readers learn to instantly dismiss any article that is not willing to automatically link to others?

Displaying related content is a fundamental part of the web now, but so far we have mostly only used it to keep users on our own sites, or to make money with advertisements. Maybe there is another use case? Yes, the user could go to a search engine, but maybe we can improve the truth seeker’s user experience beyond that.

This related content would not need to be static. As posts link to yours, they may get weighted more heavily and show up on your post. So your post does not only link backward in time, it is a living document providing a link to how others have built on your work.

Some systems already try to do this. WordPress has pingbacks where a link from some other site generates a comment on the site it is linking to. It is an attempt to keep an old post connected to the conversation, but there is no weighting of it relative to others. It doesn’t really scale for a very popular post. And a post that is two hops back in the chain is not necessarily considered.
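To make the idea a bit more concrete, here is a minimal sketch in Python of weighting posts that link back to an article, including ones that are two hops back in the chain. The function name, the link map, and the decay value are all made up for illustration; this is not how pingbacks or any existing related-content system works.

```python
from collections import deque


def related_scores(post, inbound_links, max_hops=2, decay=0.5):
    """Score posts that link, directly or indirectly, to `post`.

    inbound_links maps a post id to the list of post ids that link to it.
    A direct link scores 1.0 and every extra hop multiplies that by `decay`.
    Both numbers are illustration values, not tuned ones.
    """
    scores = {}
    seen = {post}
    queue = deque([(post, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not look further back than max_hops
        for source in inbound_links.get(current, []):
            if source in seen:
                continue
            seen.add(source)
            scores[source] = decay ** hops
            queue.append((source, hops + 1))
    # Highest weight first; ties are broken by id so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical link data: two replies, one follow-up to a reply,
# and one post that is three hops away and therefore ignored.
links = {
    "my-post": ["reply-a", "reply-b"],
    "reply-a": ["followup-c"],
    "followup-c": ["too-far-d"],
}
ranked = related_scores("my-post", links)
```

Because the link map can be rebuilt whenever new posts appear, the ranking for an old post changes over time, which is the "living document" behavior described above. Who gets to pick the decay and the hop limit is exactly the trust problem.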

Of course a related content system still has potential problems of bias due to who controls the algorithms. Open sourcing the algorithms would help, as would having a standard mechanism where multiple providers can provide the service in a compatible way. Building trust here would be hard. Getting publishers to trust content from competing publishers to be inserted into their page would be especially difficult.

But maybe by changing the norms through which we judge truth we can get back to seeking truth together. Or maybe there are other ideas for how we should be presenting our articles to help humanity find truth.

 

Technology Apathy

Someone is wrong on the Internet.

Sure, personal apathy and sarcasm in the face of lots of people being wrong is understandable. Life is busy, and it takes a lot of energy to discuss something. But have we let our own apathy to that situation seep into the systems on which our civilization is built?

 

Struggling with Content Ordering

I believe in freedom of speech. I believe in democratizing publishing. I agree with giving a voice to everyone with as small a barrier to entry as possible. I’m not sure anymore what I think about “neutral” algorithms, code, and design. Optimizing only for what users interact with, or click on, or what can be measured is always problematic, but maybe it’s bigger than that.

Neutral unbiased ranking of content is what is currently building and reinforcing the bubbles on the Web. Treating one like as being just as important as another, one comment as just as important as the next. Treating one vocal community’s ideas as just as important as a comment by someone who’s spent their life focused on deeply studying something. Letting that vocal community then shape and reinforce the opinion and content of the web. Forcing others to spend their time fighting back because “someone is wrong on the Internet”. The lack of context around who is speaking is glaring.

If the medium is the message, then what does it say that our mediums use collaborative filtering so that users never see anything other than what is within their own bubble?

But simple reverse chronological sorting fails on so many levels. It is transparent and easy to implement, but it does nothing to combat information overload. Nothing to engage users.  It optimizes for people who have the time to keep up with some particular stream of content. Or for those who are online all the time and so can keep up with an infinite stream. It can only ever engage those users who are already super engaged with that topic or website. A chronologically sorted website seems unlikely to create the level of engagement that a site like Facebook creates by optimizing to keep users within their own bubbles.
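As a toy illustration of how much the choice of ordering matters, here is a small Python sketch that sorts the same three posts three ways: reverse chronologically, by a raw like count where every like is equal, and by likes weighted by who is doing the liking. The posts, dates, reader ids, and weights are all invented, and where trustworthy weights would come from is the hard, unsolved part.

```python
from datetime import date

# Invented sample data for illustration only.
posts = [
    {"id": "hot-take", "published": date(2016, 11, 3), "liked_by": ["u1", "u2", "u3"]},
    {"id": "deep-dive", "published": date(2016, 11, 1), "liked_by": ["expert1", "u1"]},
    {"id": "new-note", "published": date(2016, 11, 5), "liked_by": []},
]

# Hypothetical per-reader weights; readers not listed count as 1.0.
reader_weight = {"expert1": 5.0}


def reverse_chronological(items):
    """Newest first: transparent, but ignores everything except time."""
    ordered = sorted(items, key=lambda p: p["published"], reverse=True)
    return [p["id"] for p in ordered]


def by_like_count(items):
    """Every like counts the same, no matter who it came from."""
    ordered = sorted(items, key=lambda p: len(p["liked_by"]), reverse=True)
    return [p["id"] for p in ordered]


def by_weighted_likes(items, weights):
    """Likes are summed using a weight per reader."""
    def score(p):
        return sum(weights.get(reader, 1.0) for reader in p["liked_by"])
    ordered = sorted(items, key=score, reverse=True)
    return [p["id"] for p in ordered]
```

With this data each function puts a different post first, so none of the three orderings is really "neutral". Each one encodes a decision about what matters.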

 

No Handlebars

Was listening to the Flobots earlier on a run. Feels very appropriate to be thinking about the impact of technology on society and how to do better.

Uniform priors now seem much less appropriate to me than they used to. Time for this blog to be less neutral also. 

Great Links to Lucene/Solr Revolution 2015

A great list of links to slides from Mani Siva’s blog. A lot of things I’m currently thinking about: overlaying graphs on Elasticsearch to do content re-ranking, improving search relevancy, what is “fairness” in ranking and how you display content.

 

One of the annual conferences that I always look forward to is Lucene/Solr Revolution. The reason is not only because of highly technical nature of the conference, but also you can get a glimpse of how future of search is evolving in the open and how Solr is being pushed to its limits to handle […]

via Lucene/Solr Revolution 2015 — Mani Siva’s blog

The Walsh Standard v Automattic Creed

I’m reading Bill Walsh’s book The Score Takes Care of Itself on his methodology for getting the San Francisco 49ers to perform at a high level in the 1980s (and win 3 Super Bowls in the process), and I found it interesting how closely his Standard of Performance matches up against the Automattic Creed. I thought I would compare the two: Walsh’s clause in black, relevant Automattic line(s) in red, and some notes from me.

Exhibit a ferocious and intelligently applied work ethic directed at continual improvement;

I will never stop learning.

…the only way to get there is by putting one foot in front of another every day.

In both cases the first line is about learning.

demonstrate respect for each person in the organization and the work he or she does;

I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything.

Not really a perfect match, but to help someone with humility implies respecting them and who they are.

be deeply committed to learning and teaching, which means increasing my own expertise;

I will never stop learning.

I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything.

Learning and teaching go hand in hand.

be fair;

I will build our business sustainably through passionate and loyal customers.

I think it’s a stretch to say that the Creed mentions fairness so openly. But this line captures the fairness we strive to apply to our customers.

demonstrate character;

Really the whole Automattic Creed is about character. Being written in the first person means it is defining the character that we are all agreeing to try and match.

honor the direct connection between details and improvement, and relentlessly seek the latter;

I will never stop learning.

I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day

Systematic learning. I like Walsh’s focus on the details mattering though.

show self-control, especially where it counts most— under pressure;

I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation.

Choosing what to work on and how to work on it is all about self control.

demonstrate and prize loyalty;

We don’t really go for loyalty oaths. We have a great retention rate though and really quite a lot of loyalty and friendship. Getting together with Automatticians feels like getting together with family.

use positive language and have a positive attitude;

I’ll remember the days before I knew everything

This is really the closest thing we have to mentioning a positive attitude in the Creed.

take pride in my effort as an entity separate from the result of that effort;

Open Source is one of the most powerful ideas of our generation

I guess that kinda works… Again not strong overlap. “Pride” is actually the first line in the Automattic Designer’s Creed.

be willing to go the extra distance for the organization;

I am in a marathon, not a sprint

I think there is a pretty important difference in philosophy here. Walsh is pretty strongly into personal sacrifice in what I would call an unsustainable way — “If you’re up at 3 A.M. every night talking into a tape recorder and writing notes on scraps of paper, have a knot in your stomach and a rash on your skin, are losing sleep and losing touch with your wife and kids, have no appetite or sense of humor, and feel that everything might turn out wrong, then you’re probably doing the job.”

To me you are not doing the job at all. You’re going to write terrible code, poorly think it through, and have a really hard time sustaining this pace. Everyone has crunchtimes, but this is not a sustainable way to live. Nor is it a sustainable way to build products. I have similar problems with the Agile methodology and its focus on “sprints”.

Ironically though, I am writing this post when I should be asleep.

deal appropriately with victory and defeat, adulation and humiliation (don’t get crazy with victory nor dysfunctional with loss);

Given time, there is no problem that’s insurmountable.

Keep working at it, don’t give up. This can be really hard when you spend months fighting to fix one scaling problem after another and it is impossible to know where the end is.

promote internal communication that is both open and substantive (especially under stress);

I will communicate as much as possible, because it’s the oxygen of a distributed company

Open communication is the type of communication that really matters.

seek poise in myself and those I lead;

No real direct comparison. To me poise is keeping your cool when you realize that some code you launched is causing a performance problem that is about to bring down WordPress.com, and then diligently and swiftly solving the problem.

put the team’s welfare and priorities ahead of my own;

I don’t really want a similar line in the Creed. Yes, I sacrifice some of the things I would love to be working on in order to pursue Automattic’s goals, but my welfare and priorities are pretty well aligned with the company’s.

 

maintain an ongoing level of concentration and focus that is abnormally high;

I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day

To me the level of concentration needed is to be able to focus on what is most important, say ‘no’ to the sorta important things, and get up the next day and make that correct decision yet again. I don’t make the correct decision every day, but hopefully I will quickly recognize when I make the wrong one.

and make sacrifice and commitment the organization’s trademark.

Given time, there is no problem that’s insurmountable

Again Walsh focuses a lot on sacrifice. I think that’s the wrong thing to focus on. There is certainly some involved in any endeavor, but the way he discusses it in his Standard of Performance, and elsewhere in the book, doesn’t feel very sustainable.


 

There are some interesting differences between the two, but the focus on teaching, openness, and being systematic in making progress stands out pretty strongly. For reference, here is the entire Automattic Creed:

I will never stop learning. I won’t just work on things that are assigned to me. I know there’s no such thing as a status quo. I will build our business sustainably through passionate and loyal customers. I will never pass up an opportunity to help out a colleague, and I’ll remember the days before I knew everything. I am more motivated by impact than money, and I know that Open Source is one of the most powerful ideas of our generation. I will communicate as much as possible, because it’s the oxygen of a distributed company. I am in a marathon, not a sprint, and no matter how far away the goal is, the only way to get there is by putting one foot in front of another every day. Given time, there is no problem that’s insurmountable.

 

Openness, Supportiveness, and Leadership

In the Fall of 2015 I took the Emerging Leaders Program at the University of Denver. I really enjoyed the class. It gave me a great framework and space for thinking about how to improve my ability to lead a team. Below is my final essay for the class. The goal of the essay was to review yourself and create a plan for becoming the leader you want to be. Since openness is a key part of what I want to do better, what better way to practice than by publishing it to the 100+ people that visit my blog each day?

What Leadership Aspects I Value

A number of the values of great leadership resonated really strongly with me. I especially get convinced by evidence, so the When Teams Work Best readings felt very compelling (still on my list to read the whole book). The most effective teams practice being open, supportive, action oriented, and having positive personal styles (the venerated “no jerks” policy). Of these four aspects, openness and supportiveness are the characteristics most prevalent in the most successful teams.

I work at a completely distributed company where most folks work from home all over the globe. Openness and supportiveness are values that are very strongly encouraged (intentionally) at Automattic, and I believe are an important part of our culture. When you are distributed and growing you need ways to communicate asynchronously, both across time zones, but also to new employees who will join you next year. Email is terrible at this, so we rarely use it. Our primary form of communication, p2s, are all open by default; our financials are open within the company; all of the metrics we track are open.

And of all the places I have worked I have never seen a similar level of support for giving folks autonomy and trying to enable each other. Basics like our systems team defaulting to yes when someone has an idea are a stark difference from many other places. That doesn’t mean debate doesn’t occur, but unlike many places the default never seems like a ‘no’.

I have been working in this open, supportive environment every day for over four years now, and yet I feel like I am terrible at being open and supportive. Some of this is certainly the general self doubt we all carry around with us. I’m ok at it, but for working at an open source company it’s striking how little I’ve contributed to open source projects. My blog is the best example of my contributions to Elasticsearch in the open, and the openness I’ve expressed there has been incredibly rewarding when I make time for it. But wow do I find it hard. When I look around at other Automatticians and how fluidly they seem to practice openness, I don’t come anywhere close.

Looking back at the 10-12 managers I have had in my life, I also realize that the best of them were both really open and really supportive. Those two or three jump out immediately to me, and the rest fade into the background noise of a 15 year career. And yes, one of the top three managers I have worked with is my current team lead. That’s what makes now such a great opportunity to learn.

Reviewing Where I Am At

About 6 months ago I expanded my role on the Data Team to try and help fill some of the gaps that have been developing as the team has expanded to 20 people and our team lead has expanded his role within the company. In some areas I feel like I’ve been fairly successful, others have been less so. The role was announced under the title “parity bit”, as a witty reference to the simplest form of error checking data. In some respects the role has also been about being a co-lead, or more accurately, a backup lead for the team. Like most things at Automattic, the role is continuing to evolve. I think it is becoming more of a mixture between a product/project manager where I try and facilitate organizing specific projects and initiatives rather than being a catch-all for everything. Not a dramatic shift, but it feels like it has more focus than a generic “backup” lead.

From these short six months I feel like openness and supportiveness are the things I most need to work on. They are probably aspects that always need to be worked on, but they don’t come naturally to me, and that is probably because I haven’t practiced them enough. Both of them seem tightly bound to a leader’s Emotional Intelligence (self awareness, self management, social awareness, and relationship management) which has been found to be up to 90% of effectively leading a team. Taking an emotional intelligence test during the class I ranked average to moderately high for self awareness and self management. For both social awareness and relationship management I was average. So I have some things to work on across the Emotional Intelligence (EQ) spectrum.

You Can Only Control Yourself

There is not a perfect mapping or path for which EQ aspects to work on to cultivate an open and supportive environment, but for the short term (six months) I have decided to focus on a few specific behaviors. My own behaviors are the only thing I can control, so that’s what I need to work on:

  1. More regular journaling. I think best when I put words on paper and need to form a coherent thought. I’ve re-picked up my intermittent journal that I started back in March of 2004. Let’s see if I can make it a habit. I should probably practice more openness here, and post some of these on my blog too.
  2. Practice Reframing Problems (at least partly through journaling). This was the first item brought up in class, and it jibes with a number of other things I have read. In the book Your Brain at Work (discussing brain biology and understanding how you and others react), reappraisal was one of the key methods for controlling your emotions, and reframing feels like a similar methodology. Run into a problem or an emotional response, and reconsider it from a different perspective. Find a different benefit or a different way to interpret the event. Or just recognize and treat it as a normal emotion to have. I also want to practice reframing problems so that they are inspiring and sensible for others.
  3. Practice Openness and Supportiveness. This is really broad, but a few ideas really resonated:
    • Post more often. Automattic has embraced chatting through Slack at the expense of p2s. It has mostly been good, but our per capita p2 posting and commenting rates have dropped significantly. So have mine. I think it makes it harder for future Automatticians (and current ones) to stay up to date. Posting takes time, so I should volunteer to do so more often, and through that help to improve project and role clarity.
    • Ask questions rather than providing solutions. I’m terrible at this, I always want to be the one to provide solutions when really helping others to find solutions, enabling them, is far more effective in the long term. It is also more supportive. And yet despite knowing this I catch myself failing at this almost every day.
    • Bring up the uncomfortable issues. We are all smart, we all know there are lingering issues. Be the one to ask about them rather than letting them fester.
  4. Improve my listening. I had a fairly low score on the listening survey I did. 11 points out of a possible 25. So I’m picking a few specific things that I rated myself low on to work on:
    • Don’t think about what I am going to say while someone is speaking.
    • Intentionally learn something from every person I meet.
    • Don’t assume I know what the speaker is going to say before they say it.
    • Be comfortable with allowing silence, allow people to think and react.
  5. Manage my own energy. Again this was a topic that also came from the Your Brain at Work book, and is something I’ve tried working on in the past. Despite not being a muscle, the brain burns 15-20% of your calories every day. At certain times of day my brain is at its most effective. I should intentionally choose to do things that take more mental energy at those times. Control distractions and interruptions, and recognize at any one moment what I can mentally handle working on. Running… meditation… journaling… these are all tools that I know work for me and I should use them more deliberately.

This feels a bit like too many things to really focus on. Choosing is hard and I should consider paring it down, but it feels helpful to write them out. I would also really like to have some metrics to track how well I am doing at these, but I don’t think I can come up with them for everything. Ultimately I think everything is about changing habits (The Power of Habit, another great book), so here is how I am approaching these:

  1. For journaling I have already added that into my regular habits three times a week (I use a great little app called Balanced to remind me of habits I am trying to form).
  2. Reframing is tied to my journaling where I am trying to regularly pick an event from the day before to reframe. I added an automatic prompt every time I open my journal that is: “Event from yesterday to reframe: “.
  3. I can easily track posting more. We have good metrics of how many posts and comments I make internally. On average since I started I have been posting 20 times a month and commenting 100 times a month. But while I am commenting at a faster rate in 2015 than prior years, I think my posting rate has fallen to 12 or 13 per month.
  4. Listening seems hard to have a metric or habit for. The best I have come up with is going off of Julian Treasure’s five methods for listening better. Of them, I think practicing silence as a part of meditation is the one habit that would be worth trying first.
  5. Which brings us to managing energy, where I am positive that meditation is something that I need to build a stronger habit around. My current goal in Balanced is to meditate twice a week, and it’s great when I really make it happen. One related habit that I have been pretty good at building over the past few months is to have a minimum morning exercise routine based off of the Royal Canadian Air Force’s 5BX 12 minute exercise program. Having something that is short and minimal every day is a much easier habit to maintain.

Looking back at the five major areas of improvements I’ve suggested for myself feels fairly daunting and maybe too large of a thing to focus on. Maybe even unrealistic. At the same time, the individual habits that I think get me there don’t feel that onerous. Like everything, it will require some more iterations and more experimentation.

Learning About Modern Neural Networks

I’ve been meaning to learn about modern neural networks and “deep learning” for a while now. Lots to read out there, but I finally found this great survey paper by Lipton and Berkowitz: A Critical Review of Recurrent Neural Networks for Sequence Learning [pdf]. It may be a little hard to follow if you haven’t previously learned basic feed-forward neural networks and backpropagation, but for me it had just the right combination of describing the math while also summarizing how the state of the art is applicable to various tasks (machine translation for instance).


A two-cell Long Short-Term Memory network unrolled across two time steps. Image from Lipton and Berkowitz.

I’ve been pretty fascinated with RNNs since reading “The Unreasonable Effectiveness of RNNs” earlier this year. A character-based model that is good enough to generate pseudocode where the parentheses are correctly closed was really impressive. It was a very inspiring read, but still left me unable to grok what is really different about the state of the art NNs. I finally feel like I’m starting to understand, and I’ve gotten a few of the TensorFlow examples running and started to play with modifying them.

Deep learning seemed to jump onto the scene just after I finished my NLP Masters degree about 5 years ago. I hadn’t really found the time to fully understand it since then, but it feels like I’ve avoided really learning it for too long now. Given the huge investments Google, Facebook, and others are putting into building large scalable software systems or customizing hardware for processing NNs at scale, it no longer just seems like hype with clever naming.

If you’re interested in more reading, Jiwon Kim has a great list.

 

Six Use Cases of Elasticsearch in WordPress

I spoke at WordCamp US two weeks ago about six different use cases I’ve seen for Elasticsearch within the WordPress community.

I also mentioned two projects for WordPress.org that are planned for 2016 and involve Elasticsearch if anyone is looking for opportunities to learn to use Elasticsearch and contribute to WordPress:

The video is here:

 

And here are my slides:

Colemak: 0 to 40 WPM in 40 Hours

On April 1st my first child was born and I started a wonderful month of paternity leave. Holding a sleeping infant leaves you with lots of sleepy hours where it’s (sometimes) possible to do repetitive tasks, so I decided to follow the 10% of my Automattic colleagues that are using either Dvorak or Colemak. My love of natural language processing led me to build word lists based on English word frequency and word/character frequency of my code and command line history. I chose Colemak over Dvorak because only 17 keys change location and most of those only move slightly. A lot of the key combinations that are ingrained from 15 years of using emacs are still pretty much the same. Standard commands like Cmd-Q, Cmd-W, Cmd-Z, Cmd-X, Cmd-C, and Cmd-V are all in the same places.
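For anyone who wants to build similar lists, here is a rough sketch in Python of counting word and character sequence frequencies in some text, such as your own code or shell history. The function names and the sample string are invented for illustration; this is not the exact script I used.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase words (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def char_ngram_frequencies(text, n=3):
    """Count every run of n consecutive characters, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# A made-up snippet of PHP-like code to count character trigrams in.
sample_code = "if ( $a ) { foo( $a ); bar( $b ); }"
top_trigram = char_ngram_frequencies(sample_code).most_common(1)

# Word counts for a tiny made-up sentence.
word_counts = word_frequencies("the cat and the hat")
```

Sorting the counters with `most_common()` gives practice lists ordered from most to least frequent, which is all a typing trainer needs as input.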

Why Would You Do This?

Well, needless to say, a layout designed in 1878 is probably not optimized for computers. Colemak was actually designed to place the most frequent letters right at your fingers. The fluidity is unnerving. There is very sparse evidence that you can type any faster with Colemak if you are already a great QWERTY touch typist. If you want to read more, this StackOverflow thread is interesting. I also know and work with a lot of folks who don’t regret moving to either Colemak or Dvorak.

For myself, I was not a great touch typist. I knew the theory. But practicing typing was never something I did. Before I started Colemak I had a QWERTY typing speed of about 60 words per minute when copying text using TypeRacer. That’s about average. I don’t like being average. And I’ve never practiced typing code for speed. My most common three character sequence when coding is not ‘the’, it is ‘( $’… sigh PHP. I bet I can be faster with some practice.

So, if I’m going to try and get faster why not go all out? I make my living by tapping keys in a precise order. Why not learn a modern layout that has been well designed? I’ve also occasionally had pain in my hands, and my knuckles like to crack in ominous ways sometimes. Altogether, now seemed like a good time to give it a try.

And the most important reason: Never stop learning.

Learning Strategy

My strategy evolved over time, but this is where I ended up and what I would recommend.

  • This article made me think about typing as analogous to learning a musical instrument. Research has shown that learning music requires: “accurate, consistent repetition, while maintaining perfect technique”. In short, strive for accuracy and focus on the parts that you are not doing well at to improve.
  • Your brain needs time to process and learn. I had a habit of practicing Colemak for at least one minute each day. Some days I practiced for an hour, rarely longer.
  • Start out by learning the keyboard layout. I used The Typing Cat for about two hours over the course of a week.
  • Get a software program that can take arbitrary lists of words, and track and analyze where you are slow. I used Amphetype. It’s not a great UI, but it worked well enough. When practicing word lists, type a group of three words three times in a row before moving on to the next group (the, of, and, the, of, and, the, of, and, to, in, a, …). This just felt like a good mix of repetition and variety to me. Your mileage may vary.
  • Then focus on practicing frequent English key sequences (or whatever your preferred language).
  • The top 5 bi-grams (the two-letter sequences ‘th’, ‘he’, ‘in’, ‘er’, and ‘an’) comprise 10% of all bi-grams. You should be extraordinarily fast and accurate at the top 30 bi-grams.
  • Similarly get fast at 3-grams, 4-grams, and 5-grams. I built my lists from Peter Norvig’s analysis of the Google N-Gram Corpus.
  • Learn the most frequent words. Also from the N-Gram Corpus, the top 50 English words are about 40% of all words. Get fast at those, and you are well on your way.
  • When you are typing the above lists at 30+ WPM start practicing the top 500 words.
  • Along the way, focus on your mistakes. With Amphetype you can analyze the words and tri-grams that you make the most mistakes with. Build new lists based on these, slow down, and practice them till you are doing them perfectly. Speed will come. Focus on not needing to make corrections.
  • Rinse and repeat. Take breaks.
  • Go cold turkey and switch over completely. This was a lot easier because I was on leave from work. Even so, it wasn’t until I had practiced for a month that I switched completely. My QWERTY speed is now about as slow as my Colemak speed because my brain is confused.
  • I’ve also moved on beyond simply English words and am practicing the 200 most common terms in my code, the 40 most common Unix command terms, and the most common 3-, 4-, and 5-grams in my code.
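
The “three words, three times” practice order described above is easy to generate from any word list. Here is a minimal sketch in Python; the function name `drill_sequence` and its parameters are hypothetical (they are not part of Amphetype or of my word list project), and the output is just text you could paste into whatever typing program you use.

```python
def drill_sequence(words, group_size=3, repeats=3):
    """Return the words in groups of `group_size`, repeating each group `repeats` times."""
    sequence = []
    for start in range(0, len(words), group_size):
        group = words[start:start + group_size]
        # Repeat the whole group before moving on to the next one.
        sequence.extend(group * repeats)
    return sequence


# Illustrative input: the first six words from the example in the list above.
top_words = ["the", "of", "and", "to", "in", "a"]
print(", ".join(drill_sequence(top_words)))
```

Changing `group_size` or `repeats` is a cheap way to experiment with a different mix of repetition and variety.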

All of my word lists are available in this GitHub project. There are also instructions for building your own lists. Writing this post was my trigger for cleaning up my lists so I can be more efficient at getting from 40 WPM to 80 WPM.
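
If you just want a feel for how such lists get built, counting n-grams in your own text takes only a few lines. The sketch below is not the code from the GitHub project; it is a simplified illustration, and the helper names `char_ngrams` and `coverage` are made up for this example. It only counts letter sequences inside words, so for source code (where something like ‘( $’ matters) you would want to count over the raw text instead.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Count n-letter sequences that occur inside words (letters a-z only)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def coverage(counts, k):
    """Fraction of all counted n-grams covered by the k most common ones."""
    total = sum(counts.values())
    top = sum(count for _, count in counts.most_common(k))
    return top / total if total else 0.0


# Tiny illustrative sample; a real list needs a much larger corpus.
sample = "the other thing in the end"
bigrams = char_ngrams(sample, 2)
print(bigrams.most_common(5))
print(round(coverage(bigrams, 5), 2))
```

The `coverage` helper is how you would check a claim like “the top 5 bi-grams comprise 10% of all bi-grams” against your own corpus.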

Analysis of Time Spent

I use RescueTime to track all of my time on my computer. In April I spent a total of 46 hours on my computer. Looking at only the time I spent where I was typing (rather than editing adorable photos of my daughter):

In the first six days of May, as I slowly ramped back up at work, I spent 26 hours on my computer with the keyboard layout entirely set to Colemak. About an hour of that time was spent practicing in Amphetype (still doing at least a minute of practice per day). Total time spent with Colemak has been about 47 hours, but I’m pretty sure I am undercounting how often I switched back to QWERTY for writing email in April. On May 6th I reached 41 WPM on TypeRacer for the first time.

Forty WPM is not very impressive, but it is noticeably more fluid and continuing to improve steadily. At this point it is good enough that I can return to work and be productive (if a little terse).