Archive for Data

Digital Pen Technology

Back in April 2011 I went to a Birds of a Feather event sponsored by BrightIdea; see here for my posting on that event. One of the innovations presented at the event was the use of digital pen technology by workers in the field to collect information on forms. I didn’t know what they meant by digital pens but soon learned that it was a combination of a pen and some special paper. The pen and paper let normal handwriting be captured and transferred to a digital format without any scanning.

I was intrigued because this is a particular pain point in many explicit knowledge management/document management projects I’ve worked on over the years. Trying to convince people to change from handwritten forms to online input forms is fraught with hardship and resistance. The arguments pro and con usually go something like this:

– The handwritten form is easy to use, can be signed for authorization and can then be retained as an authenticated final copy.

– The online form is difficult to use, requires a computer, a printer and a scanner and a whole lot of training.

– The handwritten form is difficult to share. We have to scan it and input the data manually and then verify the input. This slows down the workflow.

– We must have a printed and signed form for compliance purposes. We want a true, unchangeable authenticated copy so we will have to print the form, sign it and then scan it.

The core digital pen technology comes from the Swedish company Anoto, which has licensed it to other companies. Examples include PaperIQ’s Digital Pen for BlackBerry solution, and LiveScribe, which produces digital pens combined with digital audio recorders.

In June 2011, I got in touch with Big Prairie Limited, a local Hong Kong company that provides consulting services for Anoto-based digital pen technology. Big Prairie has been working in the digital pen area in Asia for about ten years.

In exchange for helping out at the Big Prairie booth at the Greater China eHealth Forum 2011 on 7-8 October I was given a LiveScribe Echo 4GB digital pen. I didn’t agree to blog about the digital pen technology and no one from Big Prairie has reviewed this blog. Just to be clear, I got 1 pen, 3 notebooks and some software with a retail value of about HKD1,700 (about USD220) in exchange for 2 days of work at the forum. Clearly, I’m really a cheap date.

I prepared some demos and for two days I demonstrated the LiveScribe Echo digital pen. Since then I’ve been using the pen almost every day and showing it around to my network. People are very impressed with the technology. The pen is a normal ball-point pen with a small camera and a digital recorder and player. It is lightweight and a bit big, but still comfortable for writing. It can be used as just a normal ball-point pen or used with the Anoto dot paper to record what you are writing. The notebook is made from normal notebook paper which has been printed with thousands of very small blue-colored dots. These dots are locational markers; the camera reads them, so your writing is recorded precisely and accurately. The digital recorder can be used stand-alone, or its recordings can be synchronized with your handwritten notes. The notebook has printed commands at the bottom of each page that let you start, pause and stop audio recording, jump ahead in 10-second intervals, bookmark, jump to a percentage position, set the playback speed and adjust the playback volume. The small speaker on the pen is good enough for listening, or you can buy headphones that plug into the pen for better listening and recording quality. You can play back an audio recording by simply touching any of the synchronized handwritten text.
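As a rough illustration of how touching written text can start playback at the right moment, here is a minimal sketch. It assumes each stroke is stored with its page, an x/y position decoded from the dot pattern and an offset into the audio recording; the `Stroke` record, the `playback_offset` function and the `max_distance` threshold are hypothetical and are not part of LiveScribe’s software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stroke:
    """One pen stroke. All fields are hypothetical, not LiveScribe's format."""
    page: int                      # which dot-paper page the stroke was on
    x: float                       # position decoded from the dot pattern
    y: float
    audio_offset: Optional[float]  # seconds into the recording, None if unsynced


def playback_offset(strokes: List[Stroke], page: int, x: float, y: float,
                    max_distance: float = 5.0) -> Optional[float]:
    """Return the audio offset of the stroke nearest a tap, or None."""
    on_page = [s for s in strokes if s.page == page]
    if not on_page:
        return None
    nearest = min(on_page, key=lambda s: (s.x - x) ** 2 + (s.y - y) ** 2)
    if (nearest.x - x) ** 2 + (nearest.y - y) ** 2 > max_distance ** 2:
        return None  # the tap did not land on any written stroke
    return nearest.audio_offset


strokes = [
    Stroke(page=1, x=10, y=10, audio_offset=0.0),    # synced with audio
    Stroke(page=1, x=50, y=10, audio_offset=12.5),   # synced with audio
    Stroke(page=1, x=10, y=80, audio_offset=None),   # written without recording
]
print(playback_offset(strokes, page=1, x=48, y=12))  # 12.5
```

Tapping text written while no recording was running returns `None`, which matches the pen simply not playing anything for unsynchronized notes.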

I don’t want to go over all of the features of the LiveScribe pen because that information is easily available and better explained on the LiveScribe website.

The handwritten notes and recording can be easily exported to Adobe PDF format; the result is called a pencast. I’ve made some pencast examples that illustrate what I think are the important points. Download these pencasts and open them with Acrobat Reader version 9.3.2 or higher.

1. A pencast with handwritten notes and an audio recording, made while listening to the BBC One-Minute News broadcast. Pencast no 1 ….
– The black text has no synchronized recording.
– The green text has synchronized recording.
– Click on the green text to listen to the recording. The remaining green text turns light grey. You can jump around by clicking on the green or light grey text.

2. A pencast with handwritten notes, audio and a quick sketch. A few days later I added handwritten notes and an audio recording from a BBC One-Minute News broadcast. Pencast no 2 ….
– It captures the sketch easily and accurately.
– Notes and recordings can be added at different times.

3. A pencast of “Double Ninth: Missing my Shandong Brothers”, with handwritten Chinese characters, Putonghua pinyin and English translation. There is no audio recording. This is a poem that almost all Chinese children learn in school. Pencast no 3 ….
九月九日憶山東兄弟
獨在異鄉為異客,
每逢佳節倍思親.
遙知兄弟登高處,
遍插茱萸少一人.

As a lonely stranger in a strange land,
At every holiday my homesickness increases.
Far away, I know my brothers have reached the peak;
They are planting flowers, but one is not present.

“Double Ninth, Missing My Shandong Brothers”
— Wang Wei (王維), Tang Dynasty

4. A pencast with handwritten Japanese kana and kanji. Pencast no 4 …
– Kana refers to the katakana and hiragana syllabaries. Kanji are Chinese characters.

Handwritten notes can be converted to text using the MyScriptforLiveScribe tool from a company named VisionObjects. It costs about USD30.00 and is very easy to use from within the LiveScribe Desktop.

Here are the results.
1. A pencast with handwritten notes and an audio recording, made while listening to the BBC One-Minute News broadcast, converted to text and then saved as a JPG. Pencast no 5 …
– As a JPG file it retains the layout and format of the handwritten notes.

2. A pencast with handwritten notes and an audio recording, made while listening to the BBC One-Minute News broadcast, converted to text and then saved as a DOCX. Pencast no 6 ….
– It loses the layout and the format.
– The text can be edited.
– The text has 401 handwritten letters with 10 errors marked in red, so the error rate was 2.49%. This is remarkable for conversion of handwritten text to typed text. Last year I did a project that scanned handwriting and then converted it to typed text, and the error rates were between 25% and 50%. See here for information on ICR, intelligent character recognition.

3. A pencast of “Double Ninth: Missing my Shandong Brothers”, with Chinese characters, Putonghua pinyin and an English translation. I’ve converted the Chinese characters, the Putonghua pinyin and the English translation to text and saved the result as a DOCX. Pencast no 7 ….
– There are 37 characters in the original text plus 2 characters of the author’s name, 39 in all. There are 6 errors marked in red, so the error rate is 15.38%. Once again, this is remarkable for converting handwritten Chinese characters to typed text.
– There are 199 handwritten letters and diacritic marks in the Putonghua pinyin. There are 29 errors marked in red, so the error rate is 14.57%. It is high because MyScriptforLiveScribe doesn’t have a dictionary for Putonghua pinyin.
– There are 188 handwritten letters and punctuation marks in the English translation. There are 15 errors marked in red. The error rate is 7.98%.

4. A pencast of Japanese kana and kanji. This text has only hiragana and kanji characters, which were converted to text and then saved as a DOCX. Pencast no 8 ….
– 65 handwritten hiragana and kanji characters.
– 3 errors marked in red. The error rate is 4.62%. Once again, this is remarkable. Projects in Japan I’ve worked on with handwritten Japanese to text conversion are lucky to get the error rate down to 25%.
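Each error rate above is the number of errors marked in red divided by the number of handwritten symbols. A minimal sketch that recomputes those figures from the counts given for the pencasts; the `error_rate` helper is only an illustration and not part of MyScriptforLiveScribe.

```python
def error_rate(errors: int, total: int) -> float:
    """Errors as a percentage of handwritten symbols, rounded to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be a positive count")
    return round(100 * errors / total, 2)


# (errors marked in red, handwritten symbols) as counted in the pencasts above
results = {
    "English notes (BBC broadcast)": (10, 401),
    "Chinese characters (poem plus author)": (6, 37 + 2),
    "English translation of the poem": (15, 188),
    "Japanese hiragana and kanji": (3, 65),
}

for name, (errors, total) in results.items():
    print(f"{name}: {error_rate(errors, total)}%")
```

The same count of errors means very different things depending on the denominator, which is why the Chinese result (6 errors) has a higher rate than the English notes (10 errors).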

The LiveScribe solution is for personal note-taking. It is most useful for students, business people, professionals such as lawyers, doctors and advisors, and anyone who takes notes, which is just about anybody. Being able to record audio along with the notes can significantly increase their value and makes it easier to add to them later. Being able to create a PDF pencast makes the notes easy to share with others. Being able to convert them into typed text makes it much easier to produce formal reports from handwritten notes. I highly recommend it as an effective and inexpensive solution for taking notes and keeping them digitally.

In Hong Kong, information on ordering a LiveScribe digital pen can be found on the Smartpen Asia website. The LiveScribe website has an international store locator. LiveScribe pens and accessories can also be ordered through Amazon.

There are also enterprise solutions based on Anoto digital pen technology that let you create forms and integrate specialized workflows with enterprise back-end systems. You should contact Big Prairie Limited or Anoto for more information.

Chinese Character Normalization – Finding People in Greater China

For most of this past year I’ve been working on a project that involved searching for Chinese people in various online databases using their Romanized names or their Chinese character names. Searching for someone’s Chinese name inside a database raises some quite thorny issues. With the rise of China as the world’s second-largest economy, and Chinese people traveling and spending more and more around the world, identifying Chinese people by their names is going to become part of many knowledge workers’ day-to-day tasks. Here is the definition of Greater China from Wikipedia.

Most of the time, trying to find a Chinese person among many other Chinese people in a database by name is not very successful. Most of the problems involve ‘Romanization’ and ‘Simplification and Traditional Chinese characters’. If you are interested in ‘Romanization’, see this Wikipedia entry. The ‘Romanization’ problem is that there are simply too many methods and no real standard.
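One possible workaround is a variant table that maps each romanized spelling back to candidate Chinese characters. The sketch below assumes a small, hand-built table (`SURNAME_VARIANTS`, hypothetical and far from complete) covering a few surnames in Hanyu Pinyin, Wade-Giles, Cantonese and Hokkien spellings; a real system would need a much larger, curated list.

```python
from typing import Dict, Set

# Hypothetical, illustrative variant table: romanized spellings seen for a few
# surnames (Hanyu Pinyin, Wade-Giles, Cantonese, Hokkien). Far from complete.
SURNAME_VARIANTS: Dict[str, Set[str]] = {
    "陳": {"chen", "ch'en", "chan", "tan"},
    "黃": {"huang", "wong", "ng"},
    "吳": {"wu", "ng"},
    "張": {"zhang", "chang", "cheung"},
    "李": {"li", "lee"},
}


def candidate_surnames(romanized: str) -> Set[str]:
    """Return every surname character whose variant list contains the spelling."""
    key = romanized.strip().lower()
    return {char for char, spellings in SURNAME_VARIANTS.items() if key in spellings}


print(candidate_surnames("Chan"))  # {'陳'}
print(candidate_surnames("Ng"))    # two candidates: 黃 and 吳
```

A single spelling such as "Ng" can point to more than one character, which is one reason a romanized name alone rarely pins down a person.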

In mainland China, people are by law required to use ‘simplified’ characters for their names. This assumes that there is a ‘simplified’ character for that name. In Hong Kong, Macau and Taiwan people use ‘traditional’ characters for their names. If you are interested in the difference refer to this Wikipedia entry. In any event, ‘simplification’ is a master stroke of censorship and knowledge control by the mainland Chinese government. Mainland Chinese have difficulty reading books, pamphlets and newspapers from outside of China. What better way could there be of controlling knowledge than by changing the writing system people use every day? Conversely, people from Hong Kong, Macau and Taiwan have a difficult time reading ‘simplified’ characters. Some claim it is harder to go from ‘Simplified’ to ‘Traditional’ than from ‘Traditional’ to ‘Simplified’ but I’m not sure. Reading Chinese is always hard for me and I’ve learned both character sets, sort of, up to the 1,000 character mark.

However, since there are different character sets, a problem arises when someone from mainland China comes to Hong Kong, Macau or Taiwan and starts to use their written character name to open accounts at banks, shops, hotels and so on. The same happens when people from Hong Kong, Macau and Taiwan go to mainland China. Simply put, people can’t easily read this person’s name. The solution is to ‘transform’ the name into the ‘correct’ character set: ‘Simplified’ to ‘Traditional’ or ‘Traditional’ to ‘Simplified’. This happens all the time when a person opens an account and their details are entered into a database. They write down their name in the character set they are comfortable with, and either the person collecting the names or the data-input person ‘transforms’ it. Interestingly, all Hong Kong and Macau Chinese people may apply for a ‘home return permit’ card that lets them cross the border into China easily, and also lets the Chinese government know they have arrived. Their names are always ‘transformed’ into simplified characters wherever there is a corresponding simplified character for the ‘traditional’ one. I assume these transformations are more accurate than some of the others. I know that some transformations from ‘simplified’ to ‘traditional’ are not always accurate. This is due to imperfect knowledge of the mapping rules between the character sets. Sometimes people are in too much of a hurry, so they simply guess. All Chinese names have at least 2 characters and many, maybe the majority, have 3 characters. Sometimes the transformer will transform 1 or 2 characters and leave the other 1 or 2 characters unchanged.

The end result is that even if you have a Chinese person’s correct name, you may not be able to find it in a database because someone has ‘transformed’ the name. Sometimes you can’t find a Chinese person in a database because you believe their name is written with character ‘X’ when in fact they write it with character ‘Y’. The only way to solve this problem is for the database’s search engine to ‘normalize’ the search. Here is an excellent summary of ‘normalization’ prepared by Michael CY Chan.
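A minimal sketch of one way to normalize the search, assuming the approach of mapping every character to its simplified form before comparing: stored names and the query go through the same `normalize_name` function, so fully transformed, untransformed and half-transformed names all match. The `TRAD_TO_SIMP` table is a tiny hypothetical subset, and this is not necessarily the method described in Michael CY Chan’s summary.

```python
from typing import List

# Hypothetical, tiny subset of a traditional -> simplified mapping.
# A real system would load a complete, curated table.
TRAD_TO_SIMP = {
    "陳": "陈", "張": "张", "黃": "黄", "劉": "刘", "偉": "伟",
    "國": "国", "華": "华", "東": "东", "麗": "丽", "龍": "龙",
}


def normalize_name(name: str) -> str:
    """Map each character to its simplified form; unknown characters pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in name.strip())


def find_people(query: str, records: List[str]) -> List[str]:
    """Return stored names whose normalized form matches the normalized query."""
    target = normalize_name(query)
    return [name for name in records if normalize_name(name) == target]


# Same person entered three ways: traditional, simplified, and half-transformed.
records = ["陳國華", "陈国华", "陈國華", "張偉", "李明"]
print(find_people("陳国華", records))  # ['陳國華', '陈国华', '陈國華']
```

Normalizing towards simplified characters is the easier direction, because several traditional characters can share one simplified form (發 and 髮 both become 发); going from simplified back to traditional requires a guess, which echoes the inaccurate simplified-to-traditional transformations described above.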

Web 2.0 … The Machine is Us/ing Us

It still resonates ….

More on data – Hans Rosling shows the best stats you’ve ever seen

Hans Rosling shows how the gap between the rich and the developing world has narrowed over the last 50 years. This is a great use of data. What I find really interesting is that stereotypes about the developing world stayed fixed in the minds of people in rich countries long after the reality changed.

Data – its new importance

There seems to be, at least to me, a renewed interest in data. This comes after years of listening to people disparage data in favour of information and then knowledge. I personally rather like data, because I have always thought that without solid, accurate data I was likely to come to a bad end: make the wrong choice, or end up with a failed project or process.

I’ve had a few friends mention to me that they are involved in data projects of one sort or another. Collecting source data and sharing it effectively is becoming more important in all sorts of fields. This makes sense to me, because there is now so much raw data available, along with the computing power to make sense of it.

This site, Digging into Data Challenge, is a start.

Here is an excellent TED Talk from Tim Berners-Lee on data and the ‘new web’. He is not sure exactly what this ‘new web’ will be, but it will involve the use of a lot more raw data.
