Monday 7 April 2014

In praise of Voice Recognition Software

The first time I watched the words appear on the screen as I spoke, I was reminded of Arthur C Clarke's dictum: "any sufficiently advanced technology is indistinguishable from magic". It is 18 months since I put my Dictaphone in my desk drawer, never to be taken out again. If it is possible to have a love affair with a piece of technology, then I am having one with my voice recognition software.

Here are the people who toiled for decades to make it happen, James and Janet Baker. You wouldn’t necessarily have thought it, but they are mathematicians, and the technology is based on hidden Markov modelling (that’s all I’ll say about that). The Bakers sold their baby to a company in Belgium, for stock rather than cash. When the company went into bankruptcy, the stock became valueless. Story here and here. They didn't lose everything though, and they should certainly be remembered for their contribution.



The technology was ultimately bought by a different company called Nuance, and is marketed as Dragon NaturallySpeaking. I like the software so much that I am going to show their logo right here, on my blog, without any financial inducement whatsoever.



We Clinical Geneticists should think about adopting it more, in my view. We write longer letters than the average orthopaedic surgeon, and we take some pride in them. Pills and operations do not form a part of our treatment repertoire; instead, we have the “therapeutic power of knowledge”. Not much, perhaps, but better, by a long way, than nothing.

Why is it better than Dictaphones and magnetic tapes, or digital audio files that are transferred to another country for typing?

Firstly, it is a technological advance which, like the washing machine, the tractor and arrayCGH, is capable of freeing people (and horses) up from tasks (washing clothes, ploughing, microscopic karyotyping) that can be grouped together  under the sub-heading of ‘drudgery’.  

Secondly, it is easier to maintain one's train of thought when the words are appearing on the screen in front of you. This is a big one for people, like Clinical Geneticists, who are given to writing long and complex letters. I always found it quite stressful to hold the entire contents of a complex letter in my head, mentally ticking off sections as they were dictated. And that was before someone came into my office and interrupted me in mid-flow, after which it became downright impossible.

Thirdly, it is very easy to insert standard paragraphs using voice recognition software. These can then be customized and/or edited at the time of dictating, making the whole process of generating letters amazingly quick and efficient.

Fourthly, there is more control over turnaround time. The letter is more nearly the finished article when it has been committed to text than when it is on a tape. And if secretaries are freed up from typing, then they will be available to help with other administrative aspects of patient care: scheduling appointments, arranging tests, keeping track of file reviews. They need not fear that they will be out of a job.

Fifthly, the software is cheap. It can be purchased from Nuance's website for precisely £79.99, which is less than 100 euros or around 130 USD. You can buy 'medical' versions at a much higher price, or you can teach it medical vocabulary yourself, which is easy to do.

I have to be scrupulously honest and admit that, whilst I love it, our lovely administrative/secretarial staff do not. Their main argument is that, because they have not typed the letter, they don’t have a recollection of what is happening with a particular patient, and this is a problem if the patient phones in with a query. I wouldn’t at all dismiss that as a valid objection, but there could be ways to work round it without throwing the baby out with the bathwater.

Second, receiving a block of text that has to be pasted into a standard letterhead, with names, addresses, copy recipients etc. then added by hand, is not ideal for the medical secretary. But again, workarounds for this can be created.

Third, the technology is not perfect: it is still fairly new. It’s a bit slower than talking into a Dictaphone, and it will be a long time before a machine processes words better than an experienced secretary. Names, and alphanumeric strings like gene names, can be painful to transcribe and often require manual editing. I notice that if I’m tired or have a cold, the accuracy falls. It uses up quite a lot of RAM and can run very slowly if memory is tight.

Most of the problems with it should easily be fixable in time.  The amazing thing to me is that it works so well in the first place.   

Will we all be using Voice Recognition Software in time? I think so. It is hard for someone who has used a Dictaphone for 25 years to change to it (though not impossible). But what about a young person who has never used either? I know which I'd choose.